UW Biostatistics & Medical Informatics
 
UW School of Medicine and Public Health UW Madison






Educational Resources
Computational Biology and Biostatistics Summer Research Program
Research Projects - Summer 2003

The following research topics are from the Summer 2003 Research Program in Biostatistics. Research project descriptions from previous years are also available. Next year's program will probably have different topics, but at the same level as the ones listed below. Click on the title of a project for further information on that project.

Mentor(s)                      Title of Project
Moo Chung                      An Introduction to Statistical Simulation and Random Walk in Brain Imaging Analysis
Mark Craven / Joe Bockhorst    A Machine Learning Approach for the Development of Richer Probabilistic Sequence Models
Ron Gangnon                    Mapping Breast Cancer Incidence Rates in Wisconsin
Michael Kosorok                Testing for time-to-event differences in clinical studies
Michael Newton                 A study of Biodiversity Indices
Rick Nordheim                  Assessing the Performance of Measures of Diversity in Ecology
Margie Rosenberg               Cost of Smoking Cessation Treatment in the WI Medicaid Population




Title of Project: An Introduction to Statistical Simulation and Random Walk in Brain Imaging Analysis
Mentor(s) involved: Moo Chung
Objective of project: To understand how computer simulation of randomness can be used in the estimation of the probability of brain connectivity in diffusion tensor image data.
Knowledge/Skills needed: Basic understanding of the normal distribution and of computer programming. No specific programming language is required in advance (the project involves writing code in MATLAB, which is very easy to learn).
Knowledge/Skills obtained: Understanding of randomness, multivariate normal distribution functions, Markov chains, and Monte-Carlo simulation. Knowledge of MATLAB programming and statistical computing. Basic understanding of medical image analysis and magnetic resonance imaging.
Abstract: Monte-Carlo simulation is a powerful computational technique for numerically solving problems that are analytically intractable. In medical image analysis, some parameters are often difficult to estimate analytically; in such cases, Monte-Carlo simulation can be used to estimate them. In magnetic resonance imaging (MRI), diffusion tensor imaging (DTI) techniques can measure the direction of the white matter fibers that connect one brain region to another. Exact computation of the probability of connection based on DTI is intractable. In this situation, a Monte-Carlo random walk approach can be used to estimate the probability of brain connectivity. We will study the basics of Monte-Carlo simulation and random walks and implement MATLAB programs to map whole-brain white matter fiber connectivity probabilistically.
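
To make the random walk idea concrete, here is a minimal sketch in Python (the project itself uses MATLAB). It is not the project's method: it assumes a simplified two-dimensional setting in which a hypothetical function `direction_at` returns the local fiber angle in radians, each step follows that angle plus Gaussian noise, and the connection probability is estimated as the fraction of walks that reach a hypothetical target region. All names and parameter values are assumptions for illustration.

```python
import math
import random


def connection_probability(direction_at, seed_point, in_target,
                           n_walks=2000, n_steps=50, step_size=1.0,
                           noise_sd=0.3, rng=None):
    """Estimate the probability that a noisy walk from seed_point reaches the target.

    direction_at(x, y) -> preferred step angle in radians (assumed given).
    in_target(x, y)    -> True if the point lies in the target region.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_walks):
        x, y = seed_point
        for _ in range(n_steps):
            # Follow the local direction, perturbed by Gaussian angular noise.
            angle = direction_at(x, y) + rng.gauss(0.0, noise_sd)
            x += step_size * math.cos(angle)
            y += step_size * math.sin(angle)
            if in_target(x, y):
                hits += 1
                break
    # Monte-Carlo estimate: fraction of walks that reached the target.
    return hits / n_walks


if __name__ == "__main__":
    # Toy field: every location prefers to step in the +x direction.
    def field(x, y):
        return 0.0

    def target(x, y):
        return x >= 5 and abs(y) <= 1

    print(connection_probability(field, (0.0, 0.0), target))
```

Increasing `n_walks` reduces the Monte-Carlo error of the estimate, at a proportional cost in computing time.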




Title of Project: A machine learning approach for the development of richer probabilistic sequence models
Mentor(s) involved: Mark Craven and Joe Bockhorst
Objective of project: To explore methods for automatically learning the structure of probabilistic sequence models.
Knowledge/Skills needed: Basic computer programming (Java or a similar language); basic understanding of probability.
Knowledge/Skills obtained: Probabilistic sequence models, machine learning methods, Bayesian reasoning, Bayesian networks
Abstract: Probabilistic sequence models (PSMs) are a popular method for representing classes of signals found in DNA and amino acid sequences. These models represent a probability distribution over sequences in which high-probability sequences are likely to belong to the class being modeled and low-probability sequences are not. Typically, the form of the probability distribution is decided prior to analysis. For example, each position in the model may be assumed to be independent of all other positions. We intend to explore machine learning methods for automatically constructing the structure of PSMs (and thus the form of their probability distributions) for various classes of sequence signals. We will apply our PSMs to the task of recognizing short DNA sequences that are involved in gene regulation.
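
As an illustration of the fixed-structure baseline mentioned above, the following Python sketch fits a PSM in which every position is independent of the others, then scores sequences by log-probability. It does not learn structure, which is the goal of the project. The function names, the pseudocount, and the example sequences are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def fit_independent_model(sequences, pseudocount=1.0):
    """Estimate one distribution over ALPHABET per position from aligned sequences."""
    length = len(sequences[0])
    model = []
    for pos in range(length):
        # The pseudocount keeps unseen bases from getting probability zero.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        model.append({base: counts[base] / total for base in ALPHABET})
    return model


def log_probability(model, seq):
    """Log-probability of seq under the position-independent model."""
    return sum(math.log(model[pos][base]) for pos, base in enumerate(seq))


if __name__ == "__main__":
    # Made-up aligned training sequences, for illustration only.
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
    model = fit_independent_model(sites)
    print(log_probability(model, "TATAAT"))
    print(log_probability(model, "GGCCGG"))
```

A richer model would let the distribution at one position depend on the bases at other positions; choosing which dependencies to include is the structure-learning problem.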




Title of Project: Mapping Breast Cancer Incidence Rates in Wisconsin
Mentor(s) involved: Ron Gangnon
Objective of project: To produce maps that summarize geographic variability in breast cancer incidence across Wisconsin.
Knowledge/Skills needed: None.
Knowledge/Skills obtained: Understanding of Poisson models for event rates. Understanding of Bayes and empirical Bayes methods of parameter estimation. Useful posterior summaries based on exact calculations and simulations. Methods for mapping estimates and displaying uncertainty. Use of statistical and GIS software (R and ArcView).
Abstract: Maps of regional disease rates are useful tools for identifying spatial patterns of disease. Maps based on raw disease rates can be difficult to interpret, because the observed extreme rates, both high and low, will tend to be found, by chance, in areas with small populations (high variance). A number of alternative estimators of the regional disease rates have been proposed to account for this problem. Many of the proposed estimators are based on Bayes or empirical Bayes methods. The goal of this project is to produce a useful map, or more likely a useful series of maps, to summarize breast cancer incidence rates across Wisconsin. The data set consists of the number of incident cases of breast cancer by zip code during the years 1988-1992 and an age-adjusted expected number of incident cases by zip code for the same period.
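
The shrinkage idea behind empirical Bayes rate estimates can be sketched as follows. This Python example is only one simple moment-based possibility, not necessarily an estimator used in the project (which works in R). It assumes observed counts O_i with expected counts E_i, and pulls each raw ratio O_i / E_i toward the overall ratio, with more shrinkage where E_i is small. The function name and the example numbers are made up for illustration.

```python
def smooth_rates(observed, expected):
    """Shrink raw ratios O_i / E_i toward the overall ratio (moment-based sketch)."""
    total_e = sum(expected)
    overall = sum(observed) / total_e                  # overall ratio m
    raw = [o / e for o, e in zip(observed, expected)]
    # Weighted variance of the raw ratios around the overall ratio.
    s2 = sum(e * (r - overall) ** 2 for e, r in zip(expected, raw)) / total_e
    mean_e = total_e / len(expected)
    # Estimated between-area variance, after removing Poisson sampling noise.
    between = max(s2 - overall / mean_e, 0.0)
    smoothed = []
    for e, r in zip(expected, raw):
        # Weight near 1 for large E_i (trust the raw ratio), near 0 for small E_i.
        weight = between / (between + overall / e) if between > 0 else 0.0
        smoothed.append(overall + weight * (r - overall))
    return smoothed


if __name__ == "__main__":
    # Made-up counts: two small areas with extreme ratios, two large areas.
    observed = [0, 3, 50, 120]
    expected = [1.0, 1.0, 50.0, 100.0]
    for raw, est in zip([o / e for o, e in zip(observed, expected)],
                        smooth_rates(observed, expected)):
        print(raw, est)
```

Mapping the smoothed values rather than the raw ratios keeps small-population areas from dominating the extremes of the map.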




Title of Project: Testing for time-to-event differences in clinical studies
Mentor(s) involved: Michael Kosorok
Objective of project: To learn about methods used in clinical studies to detect differences between two groups based on time-to-event data (e.g., time to recovery, time to death, time to cancer, time to cure), and to evaluate several new methods with real clinical trial data.
Knowledge/Skills needed: Willingness to learn, program and use the statistical package R to perform analyses of both real and simulated data. Basic calculus, probability and statistics concepts.
Knowledge/Skills obtained: Knowledge of programming and manipulating data in R. Basic knowledge about survival analysis (time-to-event) statistical methods. Basic properties of weighted log-rank statistics and experience with sample size formulas. Some experience with analyzing and interpreting actual clinical trial data.
Abstract: Comparing time-to-event data between placebo and treatment groups is a common and important way of checking whether a new drug is effective. Weighted log-rank statistics are frequently the tests of choice in these situations. Unfortunately, if the wrong weighted log-rank test is chosen, a significant treatment effect can be missed. Adaptive log-rank tests learn from the data and have been shown to pick up subtle treatment effects that standard weighted log-rank statistics might miss. This project involves learning about and computing several of these adaptive statistics and, if time permits, writing programs that compute sample size formulas for designing flexible clinical trials with time-to-event outcomes.
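
For orientation, a two-group weighted log-rank statistic can be sketched as below. This is a Python illustration (the project uses R), and it covers only a fixed weight, not the adaptive tests the project studies. It assumes weights of the form S(t-)^rho, where S is the pooled Kaplan-Meier estimate just before each event time; `rho = 0` gives the ordinary log-rank statistic. The function name and the toy data are assumptions.

```python
import math


def weighted_logrank(times, events, groups, rho=0.0):
    """Standardized weighted log-rank statistic for two groups.

    times:  observed times
    events: 1 if the event was observed, 0 if censored
    groups: 0 or 1
    rho:    exponent on the pooled Kaplan-Meier estimate used as the weight
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    km = 1.0      # pooled Kaplan-Meier estimate just before the current time
    score = 0.0   # weighted sum of (observed - expected) events in group 1
    var = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)    # events, both groups
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        w = km ** rho
        score += w * (d1 - d * n1 / n)
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        km *= 1 - d / n
    return score / math.sqrt(var) if var > 0 else 0.0


if __name__ == "__main__":
    # Toy data: group 1 fails earlier than group 0, with no censoring.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1] * 8
    groups = [1, 1, 1, 1, 0, 0, 0, 0]
    print(weighted_logrank(times, events, groups, rho=0.0))
    print(weighted_logrank(times, events, groups, rho=1.0))
```

A positive `rho` puts more weight on early differences between the groups, which is why the choice of weight matters when the treatment effect is not constant over time.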




Title of Project: A study of biodiversity indices
Mentor(s) involved: Michael Newton
Objective of project: To construct an index of biodiversity which has useful statistical properties regardless of the level of heterogeneity in the underlying population of length-n binary vectors. If successful, the resulting index will be used to characterize heterogeneity observed in the measurement of genomic lesions in cancerous tumors.
Knowledge/Skills needed: Basic probability and statistics.
Knowledge/Skills obtained: Improved skills with discrete random variables, simulation, joint distributions, and measures of diversity; experience with statistical methods used in molecular data analysis.
Abstract: This problem comes up in a study of DNA damage in cancer cells. An adequate solution will help in characterizing variability in this damage, and this may be useful in understanding basic tumor biology. The goal of the summer project is to construct and study the properties of an improved diversity index which will behave more smoothly as one moves between different underlying levels of diversity. This will provide helpful methodology for the statistical analysis of molecular data arising in cancer. It will also be applicable in other studies of biodiversity.




Title of Project: Assessing the Performance of Measures of Diversity in Ecology
Mentor(s) involved: Rick Nordheim
Objective of project: To compare the performance of various measures of diversity with particular emphasis on the Shannon index. Much of this will be based on simulations. Careful attention will be devoted to different species abundance distributions.
Knowledge/Skills needed: Basic familiarity with statistical ideas; some computer experience; interest in the interplay between statistical thinking and ecology.
Knowledge/Skills obtained: Experience in the interrelationships between biology and statistics; understanding how a meaningful simulation study is constructed; basic understanding of the statistical ideas underlying the analysis of ecological data.
Abstract: An important ecological concept is species diversity. Examples to which this concept can be applied include birds in a forest or bacteria in the stomach. Certain distributions are used to describe the pattern of population size of species and the number of species. Of particular interest are certain indices that are used to summarize diversity. The most common of these is probably the Shannon index, which is related to the idea of entropy. Most ecological studies provide estimates of such diversity indices without considering the uncertainty in their estimation or how sampling strategies, and the underlying species distribution, affect the estimates. We will explore various ways of quantifying the performance of these diversity indices.
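
A small simulation shows the kind of question involved. The Python sketch below computes the Shannon index H = -sum(p_i ln p_i) from species counts and then repeatedly draws samples from an assumed true abundance distribution to see how the plug-in estimate varies with sample size. The function names, the abundance distribution, and the sample sizes are assumptions for illustration, not the project's design.

```python
import math
import random
from collections import Counter


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simulate_estimates(true_props, sample_size, n_sims=1000, seed=0):
    """Plug-in Shannon estimates from repeated random samples of individuals."""
    rng = random.Random(seed)
    species = range(len(true_props))
    estimates = []
    for _ in range(n_sims):
        # Draw individuals independently according to the true abundances.
        sample = rng.choices(species, weights=true_props, k=sample_size)
        estimates.append(shannon_index(list(Counter(sample).values())))
    return estimates


if __name__ == "__main__":
    true_props = [0.2] * 5                      # five equally abundant species
    true_h = shannon_index(true_props)
    for size in (10, 50, 500):
        est = simulate_estimates(true_props, size)
        print(size, true_h, sum(est) / len(est))
```

With equally abundant species, no sample can produce an estimate above the true value, so the plug-in estimate is biased downward, and the bias is largest for small samples. Other abundance distributions can be explored by changing `true_props`.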




Title of Project: Cost of Smoking Cessation Treatment in the WI Medicaid Population
Mentor(s) involved: Margie Rosenberg
Objective of project: To determine the costs of drug treatment for smoking cessation over calendar years 2000 and 2001 using claims data.
Knowledge/Skills needed: Organization skills to work with multiple complex data sets. Computer skills to summarize and analyze the data.
Knowledge/Skills obtained: Knowledge of State of WI Medicaid program. Ability to analyze and summarize claims data.
Abstract: Because a majority of smokers quit only for short intervals, the crucial clinical battle is to prevent relapse of tobacco use among these smokers. Use of prescription drugs such as Zyban, Nicotrol NS or Inhaler, or the nicotine patch has been shown to be effective in preventing relapse among tobacco users. Providing insurance coverage for tobacco dependence treatment is a health policy intervention that has been shown to increase treatment use and to reduce smoking prevalence within defined populations. In this study we examine data from the State of WI Medicaid Program. We have two years of data separated by fee-for-service and HMO coverage. Our task is to determine the patterns of tobacco dependence treatment by type of coverage and examine the cost of providing the drug treatment within these two populations.

CBB Research Projects Index: 2001, 2002, 2003, 2004, 2005, 2006, 2007

 

 


Copyright © 2006 The Board of Regents of the
University of Wisconsin System

 
