UW Madison UW School of Medicine and Public Health

Computational Biology and Biostatistics
Summer Research Projects Index 2003 - 2001




Mentor(s)                       Project Title
Moo Chung                       An Introduction to Statistical Simulation and Random Walk in Brain Imaging Analysis
Mark Craven and Joe Bockhorst   A Machine Learning Approach for the Development of Richer Probabilistic Sequence Models
Ron Gangnon                     Mapping Breast Cancer Incidence Rates in Wisconsin
Michael Kosorok                 Testing for time-to-event differences in clinical studies
Michael Newton                  A study of Biodiversity Indices
Rick Nordheim                   Assessing the Performance of Measures of Diversity in Ecology
Margie Rosenberg                Cost of Smoking Cessation Treatment in the WI Medicaid Population

Project Title: An Introduction to Statistical Simulation and Random Walk in Brain Imaging Analysis
Mentor(s): Moo Chung
Abstract: Monte Carlo simulation is a powerful computational technique for numerically solving problems that are analytically intractable. In medical image analysis, some parameters are often difficult to estimate analytically; in such cases, Monte Carlo simulation can be used to estimate them. In magnetic resonance imaging (MRI), diffusion tensor imaging (DTI) techniques can measure the direction of the white matter fibers that connect one brain region to another. The exact computation of the probability of connection based on DTI is intractable. In this situation, a Monte Carlo random walk approach can be used to estimate the probability of connectivity between brain regions. We will study the basics of Monte Carlo simulation and random walks and implement MATLAB programs to map whole-brain white matter fiber connectivity probabilistically.
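The random walk idea can be illustrated with a toy sketch in Python (not the project's MATLAB code). It assumes a small square grid whose boundary cells are absorbing and a walker that steps to a uniformly chosen neighbour; in a DTI setting the step probabilities would presumably depend on the measured fiber directions, which is not modeled here. The function name, grid size, and target region are all hypothetical.

```python
import random

def connection_probability(start, target, size, n_walks=20000, seed=0):
    """Monte Carlo estimate of the probability that a symmetric random walk
    started at `start` is absorbed at a boundary cell belonging to `target`.

    The grid is size x size; every boundary cell is absorbing.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    hits = 0
    for _ in range(n_walks):
        x, y = start
        # Step to a uniformly chosen neighbour until the boundary is reached.
        while 0 < x < size - 1 and 0 < y < size - 1:
            dx, dy = rng.choice(steps)
            x, y = x + dx, y + dy
        hits += (x, y) in target
    return hits / n_walks

# Hypothetical example: 7 x 7 grid, walker seeded at the centre, target = left edge.
left_edge = {(0, y) for y in range(7)}
estimate = connection_probability((3, 3), left_edge, size=7)
print(estimate)  # by symmetry the true value is 1/4
```

The symmetric case is chosen because its exact answer is known, so the simulated estimate can be checked before the same approach is applied where no closed form exists.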

Project Title: A machine learning approach for the development of richer probabilistic sequence models
Mentor(s): Mark Craven and Joe Bockhorst
Abstract: Probabilistic sequence models (PSMs) are a popular method for representing classes of signals found in DNA and amino acid sequences. These models represent a probability distribution over sequences in which high-probability sequences are likely to belong to the class being modeled and low-probability sequences are not. Typically, the form of the probability distribution is decided prior to analysis. For example, each position in the model may be assumed to be independent of all other positions. We intend to explore machine learning methods for automatically constructing the structure of PSMs (and thus the form of their probability distributions) for various classes of sequence signals. We will apply our PSMs to the task of recognizing short DNA sequences that are involved in gene regulation.
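As a point of reference, the fixed-structure baseline mentioned above (every position independent of the others) can be sketched in a few lines of Python. This is only an illustration of that baseline, not the structure-learning method the project proposes; the training sequences, the pseudocount, and the function names are assumptions made for the example.

```python
import math

ALPHABET = "ACGT"

def fit_independent_model(sequences, pseudocount=1.0):
    """Estimate one base distribution per position from aligned,
    equal-length sequences (positions are treated as independent)."""
    length = len(sequences[0])
    model = []
    for pos in range(length):
        # The pseudocount keeps unseen bases from getting probability zero.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        model.append({base: c / total for base, c in counts.items()})
    return model

def log_probability(model, seq):
    """Log-probability of `seq`: a sum of per-position log-probabilities."""
    return sum(math.log(column[base]) for column, base in zip(model, seq))

# Hypothetical training set of short aligned DNA signals.
training = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
model = fit_independent_model(training)
print(log_probability(model, "TATAAT"))  # resembles the training signals
print(log_probability(model, "GCGCGC"))  # does not
```

A richer model would let the distribution at one position depend on the bases at other positions; choosing which dependencies to include is the structure-learning question the abstract raises.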

Project Title: Mapping Breast Cancer Incidence Rates in Wisconsin
Mentor(s): Ron Gangnon
Abstract: Maps of regional disease rates are useful tools for identifying spatial patterns of disease. Maps based on raw disease rates can be difficult to interpret because the most extreme observed rates, both high and low, tend to be found, by chance, in areas with small populations (high variance). A number of alternative estimators of the regional disease rates have been proposed to account for this problem. Many of the proposed estimators are based on Bayes or empirical Bayes methods. The goal of this project is to produce a useful map, or more likely a useful series of maps, to summarize breast cancer incidence rates across Wisconsin. The data set consists of the number of incident cases of breast cancer by zip code during the years 1988-1992 and an age-adjusted expected number of incident cases by zip code for the same period.
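To make the shrinkage idea concrete, here is a minimal Python sketch of one simple empirical Bayes estimator: a method-of-moments estimator that pulls each area's raw ratio (observed / expected) toward the overall ratio, pulling harder where the expected count is small. It is one of several possible estimators and is not necessarily the one used in the project; the counts below are invented for illustration.

```python
def eb_smoothed_rates(observed, expected):
    """Return (raw, smoothed) incidence ratios for each area.

    Each smoothed ratio is m + w * (raw - m), where m is the overall ratio
    and the weight w is close to 1 for areas with large expected counts
    and close to 0 for areas with small ones.
    """
    n = len(observed)
    raw = [o / e for o, e in zip(observed, expected)]
    total_expected = sum(expected)
    m = sum(observed) / total_expected
    # Weighted variance of the raw ratios around the overall ratio.
    s2 = sum(e * (r - m) ** 2 for e, r in zip(expected, raw)) / total_expected
    # Method-of-moments estimate of the between-area variance (floored at 0).
    a = max(s2 - m / (total_expected / n), 0.0)
    smoothed = []
    for e, r in zip(expected, raw):
        w = a / (a + m / e) if a > 0 else 0.0
        smoothed.append(m + w * (r - m))
    return raw, smoothed

# Hypothetical counts for four zip codes (not the Wisconsin data).
observed = [0, 6, 30, 80]
expected = [1.0, 1.5, 10.0, 100.0]
raw, smoothed = eb_smoothed_rates(observed, expected)
for r, s in zip(raw, smoothed):
    print(round(r, 3), round(s, 3))
```

In this sketch the first area, with an expected count of 1, is moved most of the way toward the overall ratio, while the last area, with an expected count of 100, is left nearly unchanged.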

Project Title: Testing for time-to-event differences in clinical studies
Mentor(s): Michael Kosorok
Abstract: Comparing time-to-event data between placebo and treatment groups is a common and important method of checking whether a new drug is effective. Weighted log-rank statistics are frequently the tests of choice in these situations. Unfortunately, if the wrong weighted log-rank test is chosen, a significant treatment effect can be missed. Adaptive log-rank tests learn from the data and have been shown to pick up subtle treatment effects that standard weighted log-rank statistics might miss. This project involves learning about and computing several of these adaptive statistics and, if time permits, writing programs to compute sample size formulas that can be used to design flexible clinical trials with time-to-event outcomes.
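For orientation, a standard two-group weighted log-rank statistic can be sketched in Python as follows. The weight is taken here to be a function of the pooled Kaplan-Meier survival estimate just before each event time: a constant weight gives the ordinary log-rank test, and other choices emphasize early or late differences. The adaptive tests that the project studies are not implemented; the function name and the toy data are assumptions.

```python
import math

def weighted_logrank_z(times, events, groups, weight=lambda surv: 1.0):
    """Standardized weighted log-rank statistic for two groups.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1
    weight : function of the pooled Kaplan-Meier estimate just before
             each event time (constant 1.0 gives the ordinary test)
    Assumes both groups are at risk at one or more event times.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    numerator = variance = 0.0
    surv = 1.0  # pooled survival estimate just before the current time
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        w = weight(surv)
        # Observed minus expected events in group 1 at this time.
        numerator += w * (d1 - d * n1 / n)
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv *= 1 - d / n
    return numerator / math.sqrt(variance)

# Hypothetical toy data: group 1 has all of the early events.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
groups = [1, 1, 1, 1, 0, 0, 0, 0]
print(weighted_logrank_z(times, events, groups))
print(weighted_logrank_z(times, events, groups, weight=lambda surv: surv))
```

A positive value indicates more events in group 1 than expected under no difference. Passing `weight=lambda surv: surv` puts more weight on early event times, which illustrates how the choice of weight changes what kind of treatment effect the test is sensitive to.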

Project Title: A study of biodiversity indices
Mentor(s): Michael Newton
Abstract: The problem of measuring diversity arises in a study of DNA damage in cancer cells. An adequate solution will help in characterizing variability in this damage, which may be useful in understanding basic tumor biology. The goal of the summer project is to construct and study the properties of an improved diversity index that behaves more smoothly as one moves between different underlying levels of diversity. This will provide helpful methodology for the statistical analysis of molecular data arising in cancer. It will also be applicable in other studies of biodiversity.

Project Title: Assessing the Performance of Measures of Diversity in Ecology
Mentor(s): Rick Nordheim
Abstract: An important ecological concept is species diversity. Examples to which this concept can be applied include birds in a forest or bacteria in the stomach. Certain distributions are used to describe the pattern of population sizes across species and the number of species. Of particular interest are certain indices that are used to summarize diversity. The most common of these is probably the Shannon index, which is related to entropy. Most ecological studies provide estimates of such diversity indices without considering the uncertainty in their estimation or how sampling strategies and the underlying species distribution affect the estimates. We will explore various ways of quantifying the performance of these diversity indices.
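The Shannon index for observed species counts is H = -sum(p_i * ln p_i), where p_i is the proportion of individuals belonging to species i. The Python sketch below computes it and attaches a bootstrap standard error as one possible way of expressing the estimation uncertainty described above. The bootstrap is an assumption made for illustration, not necessarily a method used in the project, and the counts are invented.

```python
import math
import random

def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bootstrap_se(counts, n_boot=1000, seed=0):
    """Bootstrap standard error of the Shannon index, resampling
    individuals with replacement from the observed sample."""
    rng = random.Random(seed)
    individuals = [sp for sp, c in enumerate(counts) for _ in range(c)]
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(individuals, k=len(individuals))
        boot_counts = [0] * len(counts)
        for sp in resample:
            boot_counts[sp] += 1
        estimates.append(shannon_index(boot_counts))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((h - mean) ** 2 for h in estimates) / (n_boot - 1))

# Hypothetical counts of individuals for five species.
counts = [30, 20, 10, 5, 1]
print(shannon_index(counts))
print(bootstrap_se(counts))
```

Resampling only the observed individuals cannot recover species that were missed entirely, so this sketch says nothing about the bias caused by unseen species; that is part of what a study of sampling strategies would need to address.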

Project Title: Cost of Smoking Cessation Treatment in the WI Medicaid Population
Mentor(s): Margie Rosenberg
Abstract: Because a majority of smokers quit only for short intervals, the crucial clinical battle is to prevent relapse among these smokers. Prescription drugs such as Zyban, Nicotrol NS or Inhaler, and the nicotine patch have been shown to be effective in preventing relapse among tobacco users. Providing insurance coverage for tobacco dependence treatment is a health policy intervention that has been shown to increase treatment use and to reduce smoking prevalence within defined populations. In this study we examine data from the State of WI Medicaid Program. We have two years of data, separated into fee-for-service and HMO coverage. Our task is to determine the patterns of tobacco dependence treatment by type of coverage and to examine the cost of providing the drug treatment within these two populations.


Any questions about the program or application procedures?

Please contact: Whitney Sweeney, Student Services Coordinator
Telephone: 608-262-9184 or Email:


Copyright © 2014 Board of Regents - University of Wisconsin System