UW Biostatistics & Medical Informatics
UW School of Medicine and Public Health UW Madison






Educational Resources
Computational Biology and Biostatistics Summer Research Program
Research Projects: Summer 2002

The following research topics were from the Summer 2002 Research Program in Biostatistics. Research project descriptions from 2001, 2003, 2004 and 2005 are also available.

Mentor(s): Title of Project
Rick Chappell: Adult Life Expectancy
Tom Cook: Multiple Endpoints in Group Sequential Clinical Trials
Ron Gangnon: Nonparametric Tests for Clustered Data
Michael Kosorok: Markov Chain Monte Carlo Analysis of Gene Expression Data
Rick Nordheim: An Introduction to Measures of Similarity as Used in Ecology
David Page: Learning Dynamic Bayesian Networks from Time-series Data
Margie Rosenberg: An Introduction to Experience Analysis and Pricing of Life Insurance




Title of Project: Adult Life Expectancy
Mentor(s) involved: Rick Chappell
Duration of project: 4 weeks or 8 weeks
Objective of project: The purpose of this project is to classify countries, using data from the World Health Organization, with respect to infant mortality and adult life expectancy (ALE), and to analyze the patterns induced by these two measures. The student will help gather, interpret, and analyze the data, and will also help write up the results.
Abstract: Life expectancy, the average age at death, is a common and important way of summarizing health in a country or region. It has many components, but breaking mortality rates down by cause and contrasting them between regions can be lengthy and difficult. Furthermore, cause-of-death data may be unavailable in some parts of the world, and even where causes of death are recorded, diagnostic practices can differ. One of the simplest and least ambiguous subdivisions of mortality is by age. Infant/child mortality, a frequently studied quantity, is complemented by adult life expectancy (ALE), which is less well known. The ALE is defined as the expected age at death among those who live to at least 16 years (other cutoffs may be used without greatly affecting results).
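As an illustration of the ALE definition, the sketch below computes the mean age at death restricted to people who reach the cutoff age. It assumes the input is a life-table style distribution of deaths by age (for example the d_x column of a life table, keyed by a representative age such as the midpoint of each age band); raw period death counts would also reflect the population's age structure and are not a substitute. The function name and the toy numbers are hypothetical.

```python
def adult_life_expectancy(deaths_by_age, cutoff=16):
    """Mean age at death among those who survive to `cutoff`.

    deaths_by_age maps a representative age at death (assumed: the midpoint
    of a life-table age band) to the number of deaths at that age.
    """
    # Condition on surviving to the cutoff: drop all deaths before it.
    adult = {age: d for age, d in deaths_by_age.items() if age >= cutoff}
    total = sum(adult.values())
    if total == 0:
        raise ValueError("no deaths at or above the cutoff age")
    # Weighted mean age at death among adults.
    return sum(age * d for age, d in adult.items()) / total


# Toy life table (hypothetical): heavy infant mortality, then adult deaths.
toy_deaths = {0: 10, 20: 1, 60: 2, 80: 1}
print(adult_life_expectancy(toy_deaths))            # ALE with cutoff 16
print(adult_life_expectancy(toy_deaths, cutoff=0))  # ordinary mean age at death
```

In this toy table the infant deaths pull the ordinary mean age at death far below the ALE, which is why the two quantities carry complementary information.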




Title of Project: Multiple Endpoints in Group Sequential Clinical Trials
Mentor(s) involved: Tom Cook
Duration of project: 8 weeks
Objective of project: To perform numerical simulations to determine the adequacy of a simple approximation for maintaining type I error control in group sequential clinical trials with multiple composite failure time endpoints.
Knowledge/Skills needed: Basic understanding of simple parametric distributions and type I error probabilities.
Knowledge/Skills obtained: Basic understanding of type I error control in group sequential clinical trials. Introduction to the log-rank test for differences between survival distributions. Use of statistical programming software (R) to perform simulations that evaluate the performance of proposed procedures.
Abstract: In many randomized controlled clinical trials, treatment comparisons for efficacy endpoints are performed at several times while the trial is ongoing, with the potential for early termination if compelling evidence of efficacy emerges. However, as the number of analyses increases, so does the probability of a type I error (false positive), and stopping criteria must therefore be formulated so that the overall type I error rate is controlled. Such designs are known as group sequential designs. If a common failure-time endpoint is used for all analyses, the stopping boundary can be calculated using standard methods. If the primary endpoint is a composite (say, the first occurrence of either death or a non-fatal coronary event) and a subset of the primary endpoint (say, death alone) is used for interim analyses, then the correlation structure between test statistics is more complex and standard methods may yield incorrect results. In most trials, however, standard methods should provide an adequate approximation. This project will use simulation techniques to evaluate the adequacy of this approximation.
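The inflation of the type I error with repeated looks can be seen in a small Monte Carlo sketch. It assumes the simplest textbook setting, not the composite-endpoint setting of this project: five equally spaced analyses of a single endpoint whose standardized test statistics follow the canonical independent-increments model under the null. It compares an unadjusted 1.96 critical value at every look with the Pocock constant boundary for five looks at two-sided alpha = 0.05 (about 2.413). The function name and simulation sizes are illustrative choices.

```python
import numpy as np


def rejection_rate(boundary, n_looks=5, n_sims=200_000, seed=1):
    """Monte Carlo estimate of the overall two-sided type I error when a trial
    stops the first time |Z_k| exceeds `boundary` at any of `n_looks` equally
    spaced analyses, assuming no true treatment effect."""
    rng = np.random.default_rng(seed)
    # Under the null, each new group of patients adds an independent N(0, 1)
    # increment to the cumulative score statistic S_k.
    increments = rng.standard_normal((n_sims, n_looks))
    s = np.cumsum(increments, axis=1)
    # Var(S_k) = k, so the standardized statistic is Z_k = S_k / sqrt(k).
    z = s / np.sqrt(np.arange(1, n_looks + 1))
    # A simulated trial makes a type I error if any |Z_k| crosses the boundary.
    return float(np.mean(np.any(np.abs(z) > boundary, axis=1)))


naive = rejection_rate(1.96)    # unadjusted 5% critical value at every look
pocock = rejection_rate(2.413)  # Pocock constant boundary, 5 looks, alpha = 0.05
print(f"naive: {naive:.3f}  Pocock: {pocock:.3f}")
```

The naive rule should reject well above 5% of the time (roughly 14% with five looks), while the Pocock boundary should bring the overall rate back to about 5%. A simulation for this project would replace the normal increments with log-rank statistics computed on simulated composite and death-only endpoints.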




Title of Project: Nonparametric Tests for Clustered Data
Mentor(s) involved: Ron Gangnon
Duration of project: 4 weeks or 8 weeks
Objective of project: To evaluate the performance of nonparametric tests for clustered data.
Knowledge/Skills needed: Basic understanding of probability and random sampling. Prior knowledge of nonparametric statistics or statistical software desirable (not necessary).
Knowledge/Skills obtained: Basic concepts of hypothesis testing and nonparametric statistics; basic understanding of statistical issues raised by clustered data; derivation of the Wilcoxon-Mann-Whitney test for independent data; use of statistical software (S-plus or R).
Abstract: Clustered data occur frequently in studies in ophthalmology, otolaryngology and dentistry, when a single type of measurement is taken from two or more distinct, similar units within a common subject. For example, in ophthalmology, visual acuity could be assessed separately on both eyes of a person. Ignoring the dependence between observations from the same cluster can result in invalid inferences. In this project, we will explore two possible derivations of the nonparametric Wilcoxon-Mann-Whitney test for independent (unclustered) data and how these derivations could be adapted for clustered data. We will then evaluate the performance of several possible Wilcoxon-Mann-Whitney tests for clustered data through a series of simulation studies.
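One derivation of the Wilcoxon-Mann-Whitney test counts, over all cross-group pairs, how often an observation from one group exceeds one from the other. The sketch below computes that U statistic and shows one simple (assumed, not necessarily one of the project's) adaptation to clustered data: a permutation test that reassigns whole clusters, such as both eyes of a person, between groups. This is valid only when group membership is a subject-level attribute, so that clusters are exchangeable under the null. All function names are hypothetical.

```python
import random


def mann_whitney_u(x, y):
    """U statistic: number of pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def centred_auc(clusters_a, clusters_b):
    """U / (n_a * n_b) - 1/2 on all observations pooled within each group.

    Dividing by n_a * n_b keeps the statistic comparable when permuting whole
    clusters changes how many observations fall in each group.
    """
    x = [v for cluster in clusters_a for v in cluster]
    y = [v for cluster in clusters_b for v in cluster]
    return mann_whitney_u(x, y) / (len(x) * len(y)) - 0.5


def cluster_permutation_pvalue(clusters_a, clusters_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo p-value that permutes group labels of whole clusters."""
    rng = random.Random(seed)
    observed = abs(centred_auc(clusters_a, clusters_b))
    pooled = list(clusters_a) + list(clusters_b)
    k = len(clusters_a)
    hits = 0
    for _ in range(n_perm):
        # Clusters move as units, so within-subject dependence is preserved.
        rng.shuffle(pooled)
        if abs(centred_auc(pooled[:k], pooled[k:])) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: two measurements (e.g. left and right eye) per person.
group_a = [[10, 11], [12, 13], [14, 15], [16, 17]]
group_b = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(cluster_permutation_pvalue(group_a, group_b))
```

Treating the 16 eyes above as independent would overstate the effective sample size; permuting at the person level uses only the 8 independent units.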




Title of Project: Markov Chain Monte Carlo Analysis of Gene Expression Data
Mentor(s) involved: Michael Kosorok
Duration of project: 8 weeks
Objective of project: Determine whether Markov chain Monte Carlo (MCMC) analysis gives an advantage over learning a single maximum-likelihood Bayes net when modeling gene expression data.
Knowledge/Skills needed: Computer programming, knowledge of data structures and algorithms.
Knowledge/Skills obtained: Knowledge of Bayes Nets, MCMC by Metropolis-Hastings, familiarity with gene expression data.
Abstract: Bayesian networks have been used to model gene expression data, to determine which genes influence the expression of other genes. A Bayesian network is a directed acyclic graph that represents a probability distribution. Recent approaches have used greedy search algorithms to identify Bayesian network structures that fit the data well, and some extensions can handle hidden variables. These approaches all produce a single most plausible structure, from which gene expression patterns can be queried. The purpose of this project is to test whether sampling many plausible structures with the Metropolis-Hastings algorithm, rather than committing to one, gives more accurate answers.
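A minimal sketch of Metropolis-Hastings over network structures, assuming a user-supplied log_score function (hypothetical here; in practice it might be a Bayesian marginal-likelihood score of the expression data plus a log prior). Each step proposes adding or deleting one arc, rejects proposals that create a cycle, and accepts the rest with the usual Metropolis ratio. Averaging over the sampled graphs gives posterior arc probabilities, in contrast to the single structure returned by greedy search.

```python
import math
import random


def is_acyclic(edges, n_nodes):
    """Kahn's algorithm: True if the directed graph given by `edges` has no cycle."""
    indegree = [0] * n_nodes
    for _, child in edges:
        indegree[child] += 1
    ready = [v for v in range(n_nodes) if indegree[v] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return visited == n_nodes


def structure_mcmc(log_score, n_nodes, n_iter=20_000, seed=0):
    """Metropolis-Hastings over DAGs; returns estimated posterior arc probabilities.

    log_score(edges) must return the unnormalized log posterior of a DAG given
    as a frozenset of (parent, child) pairs (assumed, user-supplied).
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    edges = frozenset()  # start from the empty graph
    current = log_score(edges)
    counts = dict.fromkeys(pairs, 0)
    for _ in range(n_iter):
        # Symmetric proposal: toggle (add or delete) one arc chosen uniformly.
        proposal = edges ^ {rng.choice(pairs)}
        # Cyclic graphs have zero target probability, so they are always rejected.
        if is_acyclic(proposal, n_nodes):
            new = log_score(proposal)
            if rng.random() < math.exp(min(0.0, new - current)):
                edges, current = proposal, new
        for arc in edges:
            counts[arc] += 1
    return {arc: c / n_iter for arc, c in counts.items()}


# Toy score (hypothetical): the data strongly support gene 0 -> gene 1.
probs = structure_mcmc(lambda e: 3.0 if (0, 1) in e else 0.0, n_nodes=3)
print(probs[(0, 1)], probs[(1, 0)])
```

Because the proposal is symmetric, the Hastings correction cancels and only the score difference enters the acceptance probability.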




Title of Project: An Introduction to Measures of Similarity as Used in Ecology
Mentor(s) involved: Rick Nordheim
Duration of project: 4 weeks
Objective of project: To compare the performance of different measures of similarity. Part of this will be based on simulation and part on examination of real data sets.
Knowledge/Skills needed: Basic familiarity with statistical ideas; some computer experience.
Knowledge/Skills obtained: Basic understanding of some of the mathematical/statistical ideas used for analyzing ecological data. Understanding of an approach for comparing performance of different measures.
Abstract: Measures of similarity are used by ecologists to describe the relationship between ecological communities in terms of the distribution of species. For example, such measures can be used to compare two forest stands in terms of the presence/absence of key tree species. Most ecological similarity measures are used descriptively; relatively little effort has been made to understand the meaning of these quantities or to conduct statistical inference with them. We will explore various efforts to quantify the behavior of such measures.




Title of Project: Learning Dynamic Bayesian Networks from Time-series Data
Mentor(s) involved: David Page
Duration of project: 4 weeks
Objective of project: To analyze how the accuracy of dynamic Bayes net learning varies with the number of data points, number of variables, and number of time steps.
Knowledge/Skills needed: Basic probability and computer programming.
Knowledge/Skills obtained: Knowledge of basic graph theory, Bayesian networks, dynamic Bayesian networks, machine learning, and expectation-maximization (EM) algorithms.
Abstract: Bayesian networks can provide a compact representation of a joint probability distribution over many variables. A popular area of research is the design of algorithms to infer Bayesian networks from data. It is common to interpret an arc in such a learned Bayesian network as denoting "causality," but such an interpretation is not justified in general. More insight into causality can be gleaned from time-series data. Dynamic Bayesian networks are a form of Bayesian network that is well suited to time-series data. But questions remain about how much data is required to accurately learn a dynamic Bayesian network. This project seeks to answer these questions empirically.
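An empirical study of this kind can be sketched as: simulate time series from a known dynamic Bayesian network, learn the network back from the data, and record how often the true arcs are recovered as the series grows. The sketch below makes strong simplifying assumptions: binary variables, a first-order network in which each variable has exactly one parent in the previous time slice (a hypothetical ring structure), and a learner that chooses at most one parent per variable by BIC. It is not the project's learning algorithm.

```python
import numpy as np


def simulate_dbn(n_vars, n_steps, flip=0.15, seed=0):
    """Binary first-order DBN: X_i(t) copies X_{i-1}(t-1), flipped with prob `flip`."""
    rng = np.random.default_rng(seed)
    parents = (np.arange(n_vars) - 1) % n_vars  # true inter-slice arcs (hypothetical)
    x = np.zeros((n_steps, n_vars), dtype=int)
    x[0] = rng.integers(0, 2, n_vars)
    for t in range(1, n_steps):
        noise = (rng.random(n_vars) < flip).astype(int)
        x[t] = x[t - 1, parents] ^ noise
    return x, parents


def bic(child, parent):
    """BIC of a binary child given one binary parent column, or none (parent=None)."""
    n_obs = len(child)
    groups = [np.ones(n_obs, dtype=bool)] if parent is None else [parent == 0, parent == 1]
    loglik = 0.0
    for mask in groups:
        n = mask.sum()
        ones = child[mask].sum()
        for k in (ones, n - ones):
            if k > 0:
                loglik += k * np.log(k / n)
    # One Bernoulli parameter per parent configuration.
    return loglik - 0.5 * len(groups) * np.log(n_obs)


def learn_parents(x):
    """For each variable at time t, pick the best-scoring parent (or none) at t-1."""
    prev, curr = x[:-1], x[1:]
    candidates = [None] + list(range(x.shape[1]))
    learned = []
    for i in range(x.shape[1]):
        scores = [bic(curr[:, i], None if j is None else prev[:, j]) for j in candidates]
        learned.append(candidates[int(np.argmax(scores))])
    return learned


def accuracy(n_vars, n_steps, seed=0):
    """Fraction of variables whose true parent is recovered."""
    x, true_parents = simulate_dbn(n_vars, n_steps, seed=seed)
    learned = learn_parents(x)
    return float(np.mean([l == p for l, p in zip(learned, true_parents)]))


for n_steps in (10, 30, 100, 1000):
    print(n_steps, accuracy(n_vars=10, n_steps=n_steps))
```

Repeating this over many seeds, numbers of variables, and numbers of time steps gives learning curves of the kind the project describes.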




Title of Project: An Introduction to Experience Analysis and Pricing of Life Insurance
Mentor(s) involved: Margie Rosenberg
Duration of project: 4 weeks or 8 weeks
Objective of project: We will analyze UW System mortality data, which are used to estimate annual mortality rates, and learn how the premiums are related to the benefits and expenses of the life insurance plan.
Knowledge/Skills needed: Basic analytical and computer skills.
Knowledge/Skills obtained: Knowledge of how mortality experience studies are created. Knowledge of how pricing for group life insurance is undertaken. Computer work will likely use a combination of Excel and SAS, depending on the nature of the data analyses.
Abstract: The University of Wisconsin System requires faculty and staff to participate in a term life insurance program that pays a specified amount in the event of the insured's death. Each year $24 is automatically deducted from the employee's paycheck to pay for the insurance. Insurance programs periodically conduct "experience studies" to determine whether the premium charged is adequate to cover the benefits and expenses of the program. The objective of this project is to update the last experience study, which was conducted in the summer of 1995. We will analyze trends in mortality rates, causes of death, and other factors that could affect the adequacy of the rate structure.
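The two core calculations of an experience study can be sketched in a few lines: an actual-to-expected (A/E) ratio, comparing observed deaths with those implied by the assumed mortality rates q_x, and a comparison of premium income with benefits plus expenses. Only the $24 annual premium comes from the description above; the flat death benefit, the expense load, and the exposure and mortality figures are hypothetical placeholders.

```python
def experience_study(exposure, deaths, expected_q, benefit,
                     annual_premium=24.0, expense_load=0.05):
    """Actual-to-expected mortality and a simple premium-adequacy check.

    exposure[age]   : life-years of exposure at each age
    deaths[age]     : actual deaths observed at each age
    expected_q[age] : assumed annual mortality rate q_x at each age
    benefit         : death benefit per claim (hypothetical flat amount)
    expense_load    : expenses as a fraction of premium (hypothetical)
    """
    expected_deaths = sum(exposure[x] * expected_q[x] for x in exposure)
    actual_deaths = sum(deaths.get(x, 0) for x in exposure)
    premium_income = annual_premium * sum(exposure.values())
    claims = actual_deaths * benefit
    expenses = expense_load * premium_income
    return {
        "actual_to_expected": actual_deaths / expected_deaths,
        # Above 1.0, benefits plus expenses exceed the premiums collected.
        "cost_ratio": (claims + expenses) / premium_income,
    }


summary = experience_study(
    exposure={30: 1000, 50: 1000},
    deaths={30: 2, 50: 3},
    expected_q={30: 0.001, 50: 0.004},
    benefit=4000.0,
)
print(summary)
```

An A/E ratio persistently above 1 suggests the mortality basis is too light, and a cost ratio above 1 suggests the premium is inadequate; a real study would also split results by age, sex, and cause of death.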

Copyright © 2006 The Board of Regents of the
University of Wisconsin System
