2003 RESEARCH PROJECTS
Project Title: 
An Introduction to Statistical Simulation and Random Walk in Brain Imaging Analysis 
Mentor(s): 
Moo Chung 
Abstract: 
Monte Carlo simulation is a powerful computational technique for numerically solving problems that are analytically intractable. In medical image analysis, it is often difficult to estimate parameters analytically; in such cases, Monte Carlo simulation can be used to estimate them. In magnetic resonance imaging (MRI), diffusion tensor imaging (DTI) techniques can measure the direction of the white matter fibers that connect one brain region to another. The exact computation of the probability of connection based on DTI is intractable. In this situation, a Monte Carlo random walk approach can be used to estimate the probability of brain connectivity. We will study the basics of Monte Carlo simulation and random walks and implement MATLAB programs to map whole-brain white matter fiber connectivity probabilistically.
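As a toy illustration of estimating a connection probability by random walk, the sketch below uses a symmetric walk on the integers 0..target rather than DTI data. It is written in Python for brevity (the project itself uses MATLAB), and the function name and parameters are assumptions made for this example. The exact answer for this toy problem is start/target, which gives a check on the simulation.

```python
import random

def estimate_hit_probability(start, target, n_walks=20000, seed=0):
    """Estimate P(walk reaches `target` before 0) for a symmetric +/-1 walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_walks):
        pos = start
        # Walk until one of the two absorbing endpoints is reached.
        while 0 < pos < target:
            pos += rng.choice((-1, 1))
        if pos == target:
            hits += 1
    return hits / n_walks

estimate = estimate_hit_probability(start=3, target=10)
print(f"Monte Carlo estimate: {estimate:.3f}; exact value: {3 / 10}")
```

In a real connectivity analysis the step distribution would depend on the local diffusion tensor rather than being a fair coin flip; that part is not sketched here.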
Project Title: 
A machine learning approach for the development of richer probabilistic sequence models 
Mentor(s): 
Mark Craven and Joe Bockhorst 
Abstract: 
Probabilistic sequence models (PSMs) are a popular method for representing classes of signals found in DNA and amino acid sequences. These models represent a probability distribution over sequences, where high-probability sequences are likely to belong to the class being modeled and low-probability sequences are not. Typically, the form of the probability distribution is decided prior to analysis. For example, each position in the model may be assumed to be independent of all other positions. We intend to explore machine learning methods for automatically constructing the structure of PSMs (and thus the form of their probability distributions) for various classes of sequence signals. We will apply our PSMs to the task of recognizing short DNA sequences that are involved in gene regulation.
Project Title: 
Mapping Breast Cancer Incidence Rates in Wisconsin 
Mentor(s): 
Ron Gangnon 
Abstract: 
Maps of regional disease rates are useful tools in identifying spatial patterns of disease. Maps based on raw disease rates can be difficult to interpret, because the observed extreme rates, both high and low, will tend to be found, by chance, in areas with small populations (high variance). A number of alternative estimators of the regional disease rates have been proposed to account for this problem. Many of the proposed estimators are based on Bayes or empirical Bayes methods. The goal of this project is to produce a useful map, or more likely a useful series of maps, to summarize breast cancer incidence rates across Wisconsin. The data set consists of the number of incident cases of breast cancer by zip code during the years 1988-1992 and an age-adjusted expected number of incident cases by zip code for the same period.
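To illustrate why Bayes-type estimators stabilize rates in small areas, here is a minimal Python sketch of Poisson-gamma shrinkage. It is one common form of such an estimator, not necessarily the one used in the project. The counts are made up, and the prior shape and rate are fixed by hand as assumptions; an empirical Bayes analysis would estimate them from the data.

```python
def smoothed_rate_ratio(observed, expected, prior_shape, prior_rate):
    """Posterior mean of the relative risk for Poisson counts with a
    Gamma(prior_shape, prior_rate) prior on the risk."""
    return (observed + prior_shape) / (expected + prior_rate)

# Hypothetical areas given as (observed cases, age-adjusted expected cases).
areas = {"small area": (3, 1.0), "large area": (300, 100.0)}

for name, (obs, exp_cases) in areas.items():
    raw = obs / exp_cases
    smoothed = smoothed_rate_ratio(obs, exp_cases, prior_shape=10.0, prior_rate=10.0)
    print(f"{name}: raw ratio {raw:.2f}, smoothed ratio {smoothed:.2f}")
```

Both areas have the same raw ratio, but the small area is pulled much closer to the prior mean of 1 because its estimate rests on far fewer cases.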
Project Title: 
Testing for time-to-event differences in clinical studies
Mentor(s): 
Michael Kosorok 
Abstract: 
Comparing time-to-event data between placebo and treatment is a common and important method of checking whether a new drug is effective. Weighted logrank statistics are frequently the tests of choice in these situations. Unfortunately, if the wrong weighted logrank test is chosen, a significant treatment effect can be missed. Adaptive logrank tests learn from the data and have been shown to be able to pick up subtle treatment effects which might be missed by standard weighted logrank statistics. This project involves learning about and computing several of these adaptive statistics and, if time permits, writing programs which compute sample size formulas which can be used to design flexible clinical trials with time-to-event outcomes.

Project Title: 
A study of biodiversity indices 
Mentor(s): 
Michael Newton 
Abstract: 
This problem comes up in a study of DNA damage in cancer cells. An adequate solution will help in characterizing variability in this damage, and this may be useful in understanding basic tumor biology. The goal of the summer project is to construct and study the properties of an improved diversity index which will behave more smoothly as one moves between different underlying levels of diversity. This will provide helpful methodology for the statistical analysis of molecular data arising in cancer. It will also be applicable in other studies of biodiversity. 
Project Title: 
Assessing the Performance of Measures of Diversity in Ecology 
Mentor(s): 
Rick Nordheim 
Abstract: 
An important ecological concept is species diversity. Examples to which this concept can be applied include birds in a forest or bacteria in the stomach. Certain distributions are used to describe the pattern of population size of species and the number of species. Of particular interest are certain indices that are used to summarize diversity. The most common of these is probably the Shannon index, which is related to the concept of entropy. Most ecological studies provide estimates of such diversity indices without considering the uncertainty in their estimation or how sampling strategies and the underlying species distribution affect the estimates. We will explore various ways of quantifying the performance of these diversity indices.
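For reference, the Shannon index of a community with species proportions p_i is H = -sum(p_i * ln p_i). The Python sketch below computes it from species counts and attaches a bootstrap standard error as one simple way to express estimation uncertainty. The bootstrap is an illustrative choice rather than the project's stated method, and the counts are hypothetical.

```python
import math
import random

def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bootstrap_se(counts, n_boot=1000, seed=0):
    """Bootstrap standard error of the Shannon index, resampling individuals."""
    rng = random.Random(seed)
    species = list(range(len(counts)))
    n_individuals = sum(counts)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(species, weights=counts, k=n_individuals)
        resampled = [sample.count(s) for s in species]
        estimates.append(shannon_index(resampled))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))

counts = [50, 30, 15, 5]  # hypothetical counts for four species
print(f"H = {shannon_index(counts):.3f}, bootstrap SE = {bootstrap_se(counts):.3f}")
```

For S equally abundant species the index equals ln S, its maximum for that number of species.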
Project Title: 
Cost of Smoking Cessation Treatment in the WI Medicaid Population 
Mentor(s): 
Margie Rosenberg 
Abstract: 
As a majority of smokers quit for short intervals, the crucial clinical battle is to prevent relapse of tobacco use among these smokers. Prescription drugs such as Zyban, Nicotrol NS or Inhaler, and the nicotine patch have been shown to be effective in preventing relapse among tobacco users. Providing insurance coverage for tobacco dependence treatment is a health policy intervention that has been shown to increase treatment use and to reduce smoking prevalence within defined populations. In this study we examine data from the State of WI Medicaid Program. We have two years of data separated by fee-for-service and HMO coverage. Our task is to determine the patterns of tobacco dependence treatment by type of coverage and examine the cost of providing the drug treatment within these two populations.
2002 RESEARCH PROJECTS
Project Title: 
Adult Life Expectancy 
Mentor(s): 
Rick Chappell 
Abstract: 
Life expectancy, the average age at death, is of course a common and important way of summarizing health in a country or region. It has many components. However, breaking mortality rates down into their various causes and contrasting them between regions can be lengthy and difficult. Furthermore, the data may be absent in some parts of the world; even when causes of death are present, diagnoses can differ. One of the simplest and least vague subdivisions of mortality is with respect to age. Infant/child mortality, a frequently studied quantity, is complemented by adult life expectancy (ALE), which is not as well known. The ALE is defined as the expected age of death among those who live to at least 16 years (other cutoffs may be used without greatly affecting results).
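The definition of ALE can be made concrete with a small Python sketch. The table of deaths by age is invented for illustration, and the function name is an assumption; real calculations would typically start from life-table mortality rates rather than raw death counts.

```python
def mean_age_at_death(deaths_by_age, min_age=0):
    """Average age at death among those who die at or after `min_age`."""
    eligible = {age: n for age, n in deaths_by_age.items() if age >= min_age}
    total_deaths = sum(eligible.values())
    if total_deaths == 0:
        raise ValueError("no deaths at or above the cutoff age")
    return sum(age * n for age, n in eligible.items()) / total_deaths

# Hypothetical cohort: age at death -> number of deaths.
deaths = {0: 50, 5: 10, 20: 20, 60: 100, 80: 200}

print(f"Life expectancy at birth: {mean_age_at_death(deaths):.1f}")
print(f"Adult life expectancy (cutoff 16): {mean_age_at_death(deaths, min_age=16):.1f}")
```

In this made-up cohort, heavy infant and child mortality lowers life expectancy at birth well below the ALE, which is the kind of contrast the two summaries are meant to separate.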
Project Title: 
Multiple Endpoints in Group Sequential Clinical Trials 
Mentor(s): 
Tom Cook 
Abstract: 
In many randomized controlled clinical trials, treatment comparisons for efficacy endpoints are performed at several times while the trial is ongoing with the potential for early termination if compelling evidence of efficacy emerges. However, as the number of analyses increases, so does the probability of a type I error (false positive), so stopping criteria must be formulated so that the overall type I error rate is controlled. Such designs are known as group sequential designs. If a common failure-time endpoint is used for all analyses, the stopping boundary can be calculated using standard methods. If the primary endpoint is a composite (say the first occurrence of either death or a nonfatal coronary event) and a subset of the primary endpoint (say, death alone) is used for interim analyses, then the correlation structure between test statistics is more complex and standard methods may yield incorrect results. In most trials, however, standard methods should provide an adequate approximation. This project will use simulation techniques to evaluate the adequacy of this approximation.
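The inflation of the type I error with repeated analyses can be seen in a short simulation. The Python sketch below assumes equally spaced looks, independent normal increments under the null hypothesis, and an unadjusted two-sided critical value of 1.96 at every look. It does not model the composite-endpoint correlation structure that the project studies, and the function name and defaults are assumptions for illustration.

```python
import math
import random

def overall_type1_error(n_looks, critical_value=1.96, n_trials=20000, seed=0):
    """Fraction of null trials that reject at any of `n_looks` interim analyses."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        score = 0.0
        for look in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)   # new information since the last look
            z = score / math.sqrt(look)    # standardized statistic at this look
            if abs(z) > critical_value:
                rejections += 1
                break                      # trial stops early at first rejection
    return rejections / n_trials

for looks in (1, 2, 5):
    print(f"{looks} look(s): overall type I error ~ {overall_type1_error(looks):.3f}")
```

A group sequential design replaces the fixed critical value with a stopping boundary chosen so that the overall rate stays at the nominal level.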
Project Title: 
Nonparametric Tests for Clustered Data 
Mentor(s): 
Ron Gangnon 
Abstract: 
Clustered data occur frequently in studies in ophthalmology, otolaryngology and dentistry, when a single type of measurement is taken from two or more distinct, similar units within a common subject. For example, in ophthalmology, visual acuity could be assessed separately on both eyes of a person. Ignoring the dependence between observations from the same cluster can result in invalid inferences. In this project, we will explore two possible derivations of the nonparametric Wilcoxon-Mann-Whitney test for independent (unclustered) data and how these derivations could be adapted for clustered data. We will then evaluate the performance of several possible Wilcoxon-Mann-Whitney tests for clustered data through a series of simulation studies.
Project Title: 
Markov Chain Monte Carlo Analysis of Gene Expression Data 
Mentor(s): 
Michael Kosorok 
Abstract: 
Bayesian networks have been used to model gene expression data, to determine which genes influence the expression of other genes. A Bayesian network is a directed graph without cycles that represents a probability distribution. Recent approaches have used greedy search algorithms to identify Bayesian network structures that fit the data well. Some extensions can handle hidden variables. These approaches all produce a most plausible structure, from which gene expression patterns can be queried. The purpose of this project is to test whether the Metropolis-Hastings algorithm can be used to obtain more accurate answers.
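For readers unfamiliar with the Metropolis-Hastings algorithm, the Python sketch below shows its accept/reject step on a deliberately simple problem: sampling a one-dimensional standard normal target with a symmetric random-walk proposal. Sampling Bayesian network structures would require a proposal over graphs and a structure score, neither of which is shown; the function name and parameters are assumptions for this example.

```python
import math
import random

def metropolis_hastings(log_target, start, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log density."""
    rng = random.Random(seed)
    x = start
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the acceptance ratio is the target ratio.
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

draws = metropolis_hastings(lambda x: -0.5 * x * x, start=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"sample mean {mean:.2f}, sample variance {variance:.2f}")
```

Because the sampler visits states in proportion to their probability, averages over the draws account for many plausible states rather than a single best one, which is the contrast with greedy search drawn in the abstract.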
Project Title: 
An Introduction to Measures of Similarity as Used in Ecology 
Mentor(s): 
Rick Nordheim 
Abstract: 
Measures of similarity are concepts used by ecologists to describe the relationship between ecological communities in terms of distribution of species. For example, such measures can be used to compare the similarity between two forest stands in terms of the presence/absence of key tree species. Most ecological measures are descriptive; relatively little effort has been made to understand the meaning of the quantities and to conduct any statistical inference with them. We will explore various efforts at quantification of such measures. 
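As one concrete example of a presence/absence similarity measure, the Python sketch below computes the Jaccard index, the number of shared species divided by the number of species found in either community. The abstract does not name specific measures, so the choice of Jaccard and the species lists are assumptions for illustration.

```python
def jaccard_similarity(site_a, site_b):
    """Shared species divided by all species present in either site."""
    a, b = set(site_a), set(site_b)
    union = a | b
    if not union:
        raise ValueError("both sites are empty; similarity is undefined")
    return len(a & b) / len(union)

# Hypothetical forest stands described by the tree species present.
stand_1 = ["oak", "maple", "birch"]
stand_2 = ["oak", "maple", "pine"]
print(f"Jaccard similarity: {jaccard_similarity(stand_1, stand_2)}")
```

The index is purely descriptive: it says nothing about how much it would vary under resampling, which is the inferential gap the abstract points to.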
Project Title: 
Learning Dynamic Bayesian Networks from Time-series Data
Mentor(s): 
David Page 
Abstract: 
Bayesian networks can provide a compact representation of a joint probability distribution over many variables. A popular area of research is the design of algorithms to infer Bayesian networks from data. It is common to interpret an arc in such a learned Bayesian network as denoting "causality," but such an interpretation is not justified. More insight into causality can be gleaned from time-series data. Dynamic Bayesian networks are a form of Bayesian network that is well suited to time-series data. But questions remain about how much data is required to accurately learn a dynamic Bayesian network. This project seeks to answer these questions empirically.
Project Title: 
An Introduction to Experience Analysis and Pricing of Life Insurance 
Mentor(s): 
Margie Rosenberg 
Abstract: 
The University of Wisconsin System requires faculty and staff to participate in a term insurance program that would pay a specified amount in the event of the insured's death. Each year $24 is automatically subtracted from the employee's check to pay for the insurance. Insurance programs periodically conduct "experience studies" to determine whether the premium charged is adequate to cover the benefits and expenses of the program. The objective of this project is to update the last experience study that was conducted in the summer of 1995. We will be analyzing the trends in the mortality rates, causes of death, and other factors that could impact the adequacy of the rate structure. 
2001 RESEARCH PROJECTS
Project Title: 
Usefully interpreting the triple-screen prenatal assay for birth defects
Mentor(s): 
Rick Chappell 
Abstract: 
The triple screen assay is a blood test given to pregnant women in order to determine their risk for fetal defects, in particular chromosomal abnormalities such as Down syndrome. It is midway in complexity between two other types of information. The simplest is the potential mother's age: risk of Down syndrome rises dramatically with maternal age, from nearly nonexistent at age 20 to one in fifteen at age 45. In general, however, age is not an accurate predictor. The most precise and simultaneously most invasive way to predict Down syndrome is through amniocentesis, in which fluid is taken from the amniotic sac via a needle (a related invasive method is also used). The triple screen provides a fairly accurate predictor through a simple blood test. All of these methods are used in order to give parents an advance warning of fetal birth defects and perhaps allow them to terminate the pregnancy.
Project Title: 
Sensitivity of Adjusted Survival Estimates to Errors in Adjudication Rates 
Mentor(s): 
Tom Cook 
Abstract: 
In many clinical trials, it is necessary to estimate and compare event-free survival for the treatment groups involved. The events of interest may be, for example, myocardial infarction, stroke, or other cardiovascular events requiring hospitalization. Furthermore, these events may require a classification committee to determine whether or not events reported by the treating physicians meet the criteria established for these events (adjudication). When events have been reported but have yet to be classified, we may use previously adjudicated events to estimate the rates at which reported events meet criteria, and in turn use these estimates to adjust estimates of event-free survival. Since estimated rates are used, error may be introduced into the survival estimates. We will investigate the extent to which these errors affect the estimates of event-free survival.
Project Title: 
Categorizing Visual Acuity Patterns in the Early Treatment Diabetic Retinopathy Study 
Mentor(s): 
Ron Gangnon 
Abstract: 
The Early Treatment Diabetic Retinopathy Study (ETDRS) was a clinical trial of early photocoagulation vs. deferral of photocoagulation for the treatment of diabetic retinopathy and diabetic macular edema. Data from this study have been used to develop scales for the severity of diabetic retinopathy and diabetic macular edema. During the trial, visual acuity measurements were recorded at baseline and subsequently every 4 months. The goal of this project is to find meaningful clusters of visual acuity scores in eyes assigned to deferral of photocoagulation over the first three years of the ETDRS. A secondary goal is to determine if particular visual acuity patterns are associated with specific morphologic changes (severity of diabetic retinopathy or severity of diabetic macular edema).
Project Title: 
An Introduction to Biomathematical Modeling 
Mentor(s): 
Christina Kendziorski 
Abstract: 
Basic mathematical models of population dynamics have proven useful in areas such as tumor cell modeling. Such models can be used to determine optimal treatment protocols, estimate tumor growth rates, and identify factors affecting changes in tumor cell population number. The simplest models are well understood and characterized. However, conceptually straightforward extensions can result in complexities that make closed form solutions difficult or impossible to obtain. In this project, we will review basic differential equation models of population dynamics along with both analytical and numerical methods of solving DEs. Time permitting, we will extend standard models to address questions in a study of rat mammary tumor cell populations. 
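As an example of comparing analytical and numerical solutions, the Python sketch below uses the logistic growth model dN/dt = r N (1 - N/K), a standard population-dynamics equation chosen here for illustration (the abstract does not name a specific model). It evaluates the closed-form solution N(t) = K / (1 + ((K - N0)/N0) exp(-r t)) and a forward Euler approximation; the parameter values are hypothetical.

```python
import math

def logistic_exact(t, n0, r, k):
    """Closed-form solution of dN/dt = r*N*(1 - N/K) with N(0) = n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))

def logistic_euler(t_end, n0, r, k, dt=0.001):
    """Forward Euler approximation of the same equation at time t_end."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        n += dt * r * n * (1.0 - n / k)
    return n

# Hypothetical values: 10 initial cells, growth rate 0.5, carrying capacity 1000.
exact = logistic_exact(10.0, n0=10.0, r=0.5, k=1000.0)
approx = logistic_euler(10.0, n0=10.0, r=0.5, k=1000.0)
print(f"exact: {exact:.2f}, Euler: {approx:.2f}")
```

When an extension of the model has no closed form, only the numerical route remains, and a check like this on a solvable case helps validate the step size.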
Project Title: 
Markov Chain Monte Carlo Analysis of Gene Expression Data 
Mentor(s): 
Michael Kosorok and David Page
Abstract: 
Bayesian networks have been used to model gene expression data, to determine which genes influence the expression of other genes. A Bayesian network is a directed graph without cycles that represents a probability distribution. Recent approaches have used greedy search algorithms to identify Bayesian network structures that fit the data well. Some extensions can handle hidden variables. These approaches all produce a most plausible structure, from which gene expression patterns can be queried. The purpose of this project is to test whether the Metropolis-Hastings algorithm can be used to obtain more accurate answers.
Project Title: 
An Introduction to the Analysis of Functional Data 
Mentor(s): 
Mary Lindstrom 
Abstract: 
"Functional data" is a relatively new statistical term for a broad class of data. Data are functional when each of the main sampling units (humans, rats, machines, slabs of metal) produces a data curve. A common example is data gathered over time. Each subject receives a preassigned treatment and then a response (say blood level of a particular compound) is recorded at a series of time points. The data from each subject is then a set of (time, response) points that can be plotted as a curve. Typical goals in analyzing functional data include estimating the "typical curve" for the population from which the sample of individuals was drawn, estimating individual curves, and testing for differences between treatment groups.
In this project we will summarize one or more of the methods currently used to analyze functional data and describe their strengths and weaknesses.
Project Title: 
An Introduction to Parametric Distributions Used in Survival Analysis and Loss Modeling 
Mentor(s): 
Margie Rosenberg 
Abstract: 
Survival analysis involves the study of time-until-event data (like time until death) or the study of cost data (cost until death or cost until discharge from a hospital). These analyses are useful in public policy studies to examine whether a new treatment is more cost-effective than a standard treatment, or in insurance studies to investigate the impact of a change in policy deductibles or policy limits. The use of parametric models in analyzing time or cost until event data is advantageous as the models are simple to explain, produce a smooth function, and allow inferences to be made beyond the sample data. We will explore differences between various parametric models used in survival analysis both analytically and in the estimation of parameters for sample data sets.
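A minimal Python sketch of parameter estimation for the simplest parametric survival model, the exponential distribution, is given below. With right-censored data the maximum likelihood estimate of the rate is the number of observed events divided by the total follow-up time. The data are invented, and the exponential model is only one of the distributions such a project might compare.

```python
import math

def exponential_rate_mle(times, observed):
    """MLE of the exponential rate: observed events / total follow-up time."""
    events = sum(1 for flag in observed if flag)
    exposure = sum(times)
    if events == 0 or exposure <= 0:
        raise ValueError("need at least one event and positive follow-up time")
    return events / exposure

def exponential_survival(t, rate):
    """Fitted survival function S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)

# Hypothetical follow-up times; observed=False marks a censored subject.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
observed = [True, True, False, True, False]

rate = exponential_rate_mle(times, observed)
print(f"estimated rate: {rate:.4f}, estimated S(20): {exponential_survival(20.0, rate):.3f}")
```

Evaluating the fitted curve at t = 20, beyond the longest follow-up time of 11, illustrates the kind of extrapolation beyond the sample data that parametric models permit.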
