UW Biostatistics & Medical Informatics, UW School of Medicine and Public Health, UW Madison

Educational Resources
Computational Biology and Biostatistics Summer Research Program
Research Projects, 2004

The following research topics were offered in the Summer 2004 Research Program in Biostatistics. Research project descriptions from previous years are also available. Next year's program will probably have different topics, but at the same level as those listed below. A full description of each project follows the table.

Mentor(s) | Title of Project
Tom Cook | Assessing the effect of stratification on balance and efficiency in studies with stratified, blocked randomization
Jens Eickhoff | Measuring inter-rater agreement between pathologists in oncology studies
Christina Kendziorski and Timothy Grant | Empirical Bayes methods for matched microarray data
Rebecca Koscik | Cognitive outcomes in a randomized trial of raloxifene
Hui-Chuan Lai | Analysis of nutrient intake data from 3-day food records
David Page | Machine learning for drug design
Brian Yandell | Statistical genomics: microarray gene mapping in mice



Title of Project: Assessing the effect of stratification on balance and efficiency in studies with stratified, blocked randomization
Mentor(s) involved: Tom Cook
Objective of project: To assess the effect that the number and size of strata have on the balance and efficiency in studies with stratified, blocked randomization.
Knowledge/Skills needed: Basic probability.
Knowledge/Skills obtained: Understanding of the mechanics of blocked randomization, some elementary hypothesis tests (Pearson chi-square, t-test, Wilcoxon test), and how their performance is related to the randomization scheme.
Abstract: Randomization is essential for assessing treatment effects in controlled experiments. It achieves several goals: it ensures that there are no systematic differences between treatment groups, keeps the sample sizes in the treatment groups approximately equal, and provides a theoretically rigorous basis for statistical inference. Blocked randomization can be used to improve balance; however, when randomization is also stratified, balance decreases as the number of strata increases. We will explore the impact of the number of strata on balance and on the operating characteristics of the corresponding statistical tests.
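A small simulation makes this trade-off concrete. The sketch below is an illustration under assumptions not stated in the project description: two arms allocated 1:1, a fixed block size, and patients whose strata are equally likely. Each stratum draws from its own sequence of permuted blocks, so any partially filled block left at the end of the trial can contribute imbalance, and more strata mean more such blocks. All function names are hypothetical.

```python
import random

def permuted_block(block_size, rng):
    """One randomly permuted block with equal numbers of arms A and B."""
    block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
    rng.shuffle(block)
    return block

def stratified_blocked_assignments(strata, block_size, rng):
    """Assign arms to patients arriving in the given stratum order.

    Each stratum keeps its own queue of permuted blocks; a fresh block is
    generated whenever that stratum's current block is used up.
    """
    queues = {}
    arms = []
    for s in strata:
        if not queues.get(s):
            queues[s] = permuted_block(block_size, rng)
        arms.append(queues[s].pop())
    return arms

def mean_imbalance(n_patients, n_strata, block_size, n_sims=1000, seed=1):
    """Average overall |#A - #B| across simulated trials (equally likely strata)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        strata = [rng.randrange(n_strata) for _ in range(n_patients)]
        arms = stratified_blocked_assignments(strata, block_size, rng)
        total += abs(arms.count("A") - arms.count("B"))
    return total / n_sims
```

For example, `mean_imbalance(100, 1, 4)` is exactly 0, because 100 patients in a single stratum fill 25 complete blocks of 4, whereas raising `n_strata` to 16 leaves many incomplete blocks and a positive average imbalance.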

Title of Project: Measuring inter-rater agreement between pathologists in oncology studies
Mentor(s) involved: Jens Eickhoff
Objective of project: To explore latent variable modeling as a tool for the analysis of rater agreement for ordinal category ratings.
Knowledge/Skills needed: Basic concepts of calculus, probability, and statistics.
Knowledge/Skills obtained: Understanding of latent variable modeling; computational methods in statistics, e.g., computations associated with maximum likelihood estimation problems: the EM algorithm, evaluation of multiple integrals, and numerical optimization.
Abstract: In many oncology studies, tumor tissues are examined and categorized (e.g., benign, malignant) by at least two pathologists. Consequently, it is often of interest to quantify the magnitude of agreement between the pathologists' ratings. A common way to measure rater agreement is the kappa coefficient. However, the use of this measure has become increasingly controversial, in part because kappa is unnecessarily stringent in crediting so much of the observed agreement to chance: if certain categories predominate, seemingly good agreement can still yield a low kappa value.
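The kappa behavior described above can be reproduced in a few lines. The sketch computes Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. The two rating tables are hypothetical (100 samples each) and were chosen so that both show exactly 90% observed agreement; only the balance between categories differs.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' category labels on the same samples."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_exp = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings: 90 agreements and 10 disagreements in both tables.
balanced_r1 = ["benign"] * 45 + ["malignant"] * 45 + ["malignant"] * 5 + ["benign"] * 5
balanced_r2 = ["benign"] * 45 + ["malignant"] * 45 + ["benign"] * 5 + ["malignant"] * 5
skewed_r1 = ["benign"] * 88 + ["malignant"] * 2 + ["malignant"] * 5 + ["benign"] * 5
skewed_r2 = ["benign"] * 88 + ["malignant"] * 2 + ["benign"] * 5 + ["malignant"] * 5
```

With balanced categories kappa is 0.80; when "benign" predominates (93% of each rater's calls), the same 90% agreement gives a kappa of about 0.23.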

In this project, latent variable models will be used to measure inter-rater agreement. Latent variable models are intuitively appealing in this setting and provide meaningful interpretation and statistical inference. Specifically, their parameter estimates can be used to quantify inter-rater variability and each rater's classification thresholds. The project will involve simulation studies as well as analysis of real data.
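One simple latent variable formulation, assumed here for illustration rather than taken from the project, treats each tissue sample as having an unobserved continuous severity z ~ N(0, 1). Each pathologist perceives z with their own Gaussian noise and reports an ordinal category by comparing that perception with rater-specific thresholds. The sketch only generates data from such a model, the kind of data a simulation study might start from; fitting the model (for example by maximum likelihood) is not shown. All names are hypothetical.

```python
import random

def rate(latent, noise_sd, thresholds, rng):
    """Ordinal category = number of the rater's thresholds below their noisy perception."""
    perceived = latent + rng.gauss(0.0, noise_sd)
    return sum(perceived > t for t in thresholds)

def simulate_two_raters(n, sd1, sd2, thr1, thr2, seed=0):
    """Simulate n samples rated by two raters who share the same latent severities."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # unobserved true severity of the sample
        pairs.append((rate(z, sd1, thr1, rng), rate(z, sd2, thr2, rng)))
    return pairs

def exact_agreement(pairs):
    """Proportion of samples on which the two raters give the same category."""
    return sum(a == b for a, b in pairs) / len(pairs)
```

Raising a rater's thresholds (for example `[0.0, 1.0]` instead of `[-0.5, 0.5]`) makes that rater systematically more conservative, while increasing `noise_sd` lowers agreement; these are the two kinds of rater differences the model's parameters are meant to capture.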

Title of Project: Empirical Bayes methods for matched microarray data
Mentor(s) involved: Christina Kendziorski and Timothy Grant
Objective of project: Extend empirical Bayes methods for identifying differentially expressed genes to matched data.
Knowledge/Skills needed: A working knowledge of R and introductory statistics; any experience with Bayesian methods would be useful.
Knowledge/Skills obtained: The student should develop a basic understanding of microarray data and of statistical methods for identifying differentially expressed genes. The general statistical methods include those used in two-group (or multi-group) comparisons (t-tests, ANOVA, mixture models) and methods that account for multiple testing (Bonferroni, permutation-based corrections, Bayesian methods).
Abstract: DNA microarrays allow large-scale, coordinated monitoring of gene expression. In the late 1990s, they were referred to as the "first great hope" for providing global views of biological processes (Nature Genetics Chipping Forecast, 1999). The enthusiasm was not misguided: microarrays are now the most widely used tool for efficiently measuring an organism's gene expression levels.

Microarray experiments produce tens of thousands of measurements for a single individual, and most often groups of individuals are studied across multiple biological conditions. The complexity of this data structure poses new challenges for the statistician. A number of statistical methods have been developed to identify genes differentially expressed between two conditions; however, no methods explicitly address the situation in which the data are matched. Accounting for the matched structure could lead to much-needed increases in sensitivity.
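Why matching can increase sensitivity is easiest to see for a single gene. In the sketch below, with made-up expression values for five matched pairs, large pair-to-pair differences dominate the variance of the unpaired (Welch) t statistic but cancel in the within-pair differences used by the paired t statistic. This is a standard textbook illustration, not the empirical Bayes method the project will extend; the function names are hypothetical.

```python
import math
import statistics

def paired_t(control, treated):
    """Paired t statistic: mean within-pair difference over its standard error."""
    diffs = [t - c for c, t in zip(control, treated)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def unpaired_t(control, treated):
    """Welch two-sample t statistic, which ignores the matching."""
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(treated) / len(treated))
    return (statistics.mean(treated) - statistics.mean(control)) / se

# Hypothetical expression of one gene in five matched pairs (e.g., pair-fed rats).
control = [10.0, 20.0, 30.0, 40.0, 50.0]
treated = [11.1, 20.9, 31.2, 40.8, 51.0]
```

Both statistics estimate the same shift of about 1 unit, but the paired statistic is roughly 14 while the unpaired one is roughly 0.1. In a real analysis this comparison is repeated across tens of thousands of genes, so a multiple-testing adjustment is still needed.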

In the summer of 2004, we will extend our empirical Bayes methods for identifying differentially expressed genes to account for matched data. We will also consider simple extensions of other approaches that allow for matching. The operating characteristics of the approaches will be assessed using simulations and data from the laboratories of Drs. Gould and Ahlquist. The Gould lab studies mammary cancer and has collected microarray data from control and RXR-treated rats that were pair-fed to control for treatment-induced weight loss. The Ahlquist lab studies nasal cancer and has microarray data from both healthy and cancerous tissues of individual patients.



Title of Project: Cognitive outcomes in a randomized trial of raloxifene
Mentor(s) involved: Rebecca Koscik
Objective of project: The objectives of this project are to:
  1. become familiar with various measures of cognitive functioning;
  2. evaluate the appropriateness of various analysis approaches for this data set, such as the 2-sample t-test, 2-sample Wilcoxon test, Generalized Estimating Equations, and Hotelling’s T² test;
  3. evaluate the effects of raloxifene relative to placebo on cognitive functioning in a sample of women with probable Alzheimer’s disease; and
  4. depict key outcomes graphically.
Knowledge/Skills needed: Introductory statistics course plus knowledge of a statistical software package. Interest in cognitive function data and/or the treatment of Alzheimer’s disease.
Knowledge/Skills obtained: Understanding of types of cognitive outcomes and relevance in Alzheimer’s research. Ability to select appropriate methods for analyzing longitudinal data and multiple outcomes. Ability to use statistical software to graph data.
Abstract: Background: Recent findings from the Women’s Health Initiative (WHI) raise serious concerns about the safety and feasibility of prolonged therapy with opposed conjugated estrogen. Consequently, there is a critical need to identify alternatives to traditional hormone therapy. One such alternative, raloxifene, is a selective estrogen receptor modulator (SERM) that is commonly used to treat osteoporosis and is known to exhibit neuromodulatory and neurotrophic actions in the brain. Further, raloxifene has the potential to enhance cognitive function in healthy older women. However, no clinical study has evaluated whether administration of raloxifene could improve memory or other cognitive symptoms in postmenopausal women with Alzheimer’s disease (AD).

Objective: To determine whether short-term administration of raloxifene improves performance on a comprehensive battery of tests of cognition and mood in post-menopausal women with AD.

Methods: This ongoing randomized, placebo-controlled, double-blind, parallel-group clinical study evaluates the cognition-enhancing efficacy of raloxifene, 120 mg/day, in post-menopausal women with probable AD. The study involves three months of therapy with raloxifene, with comprehensive evaluation of cognition and mood at baseline, at weeks 4, 8, and 12 on treatment, and again at week 20, two months after treatment ends.
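As a starting point for the second objective, the sketch below computes each participant's change from baseline at week 12 and compares the two arms with the Mann-Whitney U statistic, which is equivalent to the 2-sample Wilcoxon rank-sum test. The data layout (a dictionary mapping study week to a test score) and the function names are assumptions. The normal approximation shown omits the tie correction, and this summary ignores the repeated visits that a GEE model would use.

```python
import math

def change_from_baseline(scores_by_week, follow_up_week=12, baseline_week=0):
    """Follow-up score minus baseline score for one participant."""
    return scores_by_week[follow_up_week] - scores_by_week[baseline_week]

def mann_whitney_u(x, y):
    """U for sample x: each pair with x_i > y_j counts 1, each tie counts 1/2."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def mann_whitney_z(x, y):
    """Normal approximation to U under no treatment difference (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (mann_whitney_u(x, y) - mean_u) / sd_u
```

In use, `change_from_baseline` would be applied to every participant, and the resulting lists for the raloxifene and placebo arms passed to `mann_whitney_u` or `mann_whitney_z`; setting `follow_up_week=20` gives the post-treatment comparison.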



Title of Project: Analysis of nutrient intake data from 3-day food records
Mentor(s) involved: Hui-Chuan Lai
Objective of project: The purpose of this project is to become familiar with analyzing dietary intake data and to understand the sources and types of error commonly encountered when using dietary data to summarize a population’s nutrient intake.
Knowledge/Skills needed: Introductory statistics course plus knowledge of a statistical software package.
Knowledge/Skills obtained: Ability to analyze dietary data, identify important sources of error, and critically evaluate studies of diet/disease associations. Knowledge of Dietary Reference Intakes (DRI) and how they are used to evaluate population nutrient intake.
Abstract: Epidemiologic investigations of diet and various health-related outcomes are common in the scientific literature. However, obtaining accurate and reliable dietary data that are representative of usual intake over the time period of interest is challenging. Conducting or reading about dietary analyses requires an understanding of the potential sources of error involved in using dietary data. Knowledge of the Dietary Reference Intakes (US nutrient intake standards) is also essential. Dietary assessment tools vary; the dietary data in this project come from 3-day food records. Potential sources of error may include (1) under- or over-reporting of food intake by study participants (intentional or unintentional), (2) inadequate information in the nutrient database for unusual food items, (3) inconsistent data entry of similar food items by study personnel, (4) variation in intake over the 3 days, and (5) inter-individual differences in intake that may vary with other factors (e.g., gender, age, body weight). Another important consideration in analyzing dietary data is whether nutrient intake should be adjusted for total calorie intake. These issues will be explored for selected nutrients in a sample of healthy, college-aged subjects.


Title of Project: Machine learning for drug design
Mentor(s) involved: David Page
Objective of project: The objective is to identify structural properties of molecules that are responsible for their pharmacological activities, in order to help researchers design new drugs.
Knowledge/Skills needed: Knowledge of computer programming is necessary. Familiarity with chemistry, logic, and/or the Prolog programming language is helpful but not required.
Knowledge/Skills obtained: Familiarity with chemistry, logic, and the Prolog programming language will be gained through the project, as will some knowledge of machine learning, data mining, and the drug design process.
Abstract: This project will involve applying a machine learning algorithm to determine structural properties common to molecules that work as anti-cancer agents. The student(s) will use a molecular modeling software package to draw the molecules and estimate their three-dimensional structures. They will then label the molecules as "active" or "inactive" according to chemical tests performed robotically (though the student(s) will not be involved in running these tests). Finally, the student(s) will run the machine learning algorithm to produce a model that predicts anti-cancer activity from molecular structure. Cross-validation will be used to estimate the accuracy of the model(s) produced by this approach.
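Cross-validation can be sketched independently of the learner. The project's learner works with logic (Prolog) representations of molecules; purely as a stand-in, the sketch below uses a 1-nearest-neighbor classifier on hypothetical numeric molecular descriptors, so only the k-fold cross-validation loop reflects the text. Each molecule is held out exactly once, predicted by a model built from the other folds, and accuracy is the fraction of held-out molecules labeled correctly.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in learner: label of the closest training molecule (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(tx, x)) for tx in train_X]
    return train_y[dists.index(min(dists))]

def cross_validated_accuracy(X, y, k=5, seed=0):
    """Fraction of molecules whose held-out prediction matches their label."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            if nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]:
                correct += 1
    return correct / len(X)
```

Replacing `nearest_neighbor_predict` with a call to the real learner (trained on `train_X`, `train_y`) leaves the cross-validation loop unchanged, which is the point of keeping the two separate.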


Title of Project: Statistical genomics: microarray gene mapping in mice
Mentor(s) involved: Brian Yandell
Objective of project: Interval mapping of multiple quantitative trait loci (QTL) for massive microarray data in experimental crosses.
Knowledge/Skills needed: Some ability with R.
Knowledge/Skills obtained: Further training with R, R libraries, massive data, and graphical diagnostics.
Abstract: I have been working closely with the laboratory of Prof. Alan Attie in Biochemistry on a mouse model for diabetes and obesity. Recently Christina Kendziorski joined our team. We now have an F2 genetic cross with 60 mice and over 40,000 measurements per mouse. These measurements are mRNA gene expression abundances obtained with Affymetrix chips; they reflect chemical signals inside liver tissue at the time the mice were sacrificed. Differences among mice may shed light on the biochemical basis of diabetes and obesity. The key question: what are the primary genetic factors, or genomic regions, that influence these mRNA expression measurements? Our studies and those of others suggest that the story is not simple, but we are beginning to get some clues. There are interesting statistical questions at a variety of levels, depending on a student’s abilities and inclination.

 


Copyright © 2006 The Board of Regents of the University of Wisconsin System

 
