Computational Biology and Biostatistics Summer Research Program
- Research Projects - 2006-
The following research topics were from the Summer 2007 Research Program in Biostatistics. Research project descriptions from previous years are also available. Next year's program will probably have different topics, but at a similar level to those listed below. The abstract for each project follows the table.
Student | Mentor(s) | Title of Project
Samantha Bromfield | Moo Chung and Houri Vorperian | Functional Principal Component Analysis (PCA) in Vocal Tract Development
David Gasca | Marjorie Rosenberg | Smoking Cessation Benefit
Linda Liu | Rick Nordheim | Statistical Analysis of the Shannon Diversity Index
Corina Prieto | Julie Mitchell and Steve Darnell | Computational Analysis of Protein Interfaces
Nicholas Stong | George Phillips and Roman Aranda | 3D Printing of Protein Models
Aline Thomas | Mark Craven and Keith Noto | Finding CRMs with Background Modeling
Title of Project:
Functional Principal Component Analysis (PCA) in Vocal Tract Development
Student:
Samantha Bromfield
Mentor(s):
Moo Chung and Houri Vorperian
Abstract:
One objective of the Vocal Tract Development Lab is to characterize the anatomic growth of the vocal tract and surrounding structures as a first step toward understanding the biological basis of speech development. The vocal tract (VT) has been described as a two-tube model, or resonator, with an anterior oral portion in the horizontal plane (VT-H) and a posterior pharyngeal portion in the vertical plane (VT-V). Vocal tract growth differs between the sexes, implying that the two portions may develop differently in males and females. The purpose of this statistical analysis was to assess the growth of the anterior portion of the vocal tract relative to the posterior portion as a function of sex. Sex differences were compared using polynomial regression models. Three linear measurements were used in this analysis: 1) vocal tract length (VTL); 2) anterior vocal tract length (VT-H); and 3) posterior vocal tract length (VT-V). These measurements were obtained from imaging studies (MRI and CT) of 229 cases (83 female and 146 male) between birth and 20 years of age. Results indicate significant sex differences in the growth rate and growth pattern of all three vocal tract measurements.
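The abstract states that sex differences were tested with polynomial regression but does not give the model form. Below is a minimal sketch under stated assumptions: a cubic polynomial in age, sex coded 0/1, and a nested-model F test of whether one shared growth curve suffices or each sex needs its own. The function names, the cubic degree, and the 0/1 coding are illustrative choices, not the study's method.

```python
import numpy as np

def poly_design(age, sex, degree, by_sex):
    """Polynomial design matrix in age (assumed model form).

    sex is coded 0/1 (hypothetical coding). With by_sex=True every power of
    age, including the intercept, also gets a sex * age**k column, so each
    sex is fitted with its own curve.
    """
    base = [age ** k for k in range(degree + 1)]  # 1, age, age^2, ...
    cols = base + ([sex * c for c in base] if by_sex else [])
    return np.column_stack(cols)

def residual_ss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def sex_difference_f(age, sex, y, degree=3):
    """F statistic for H0: one common growth curve for both sexes."""
    age, sex, y = (np.asarray(a, dtype=float) for a in (age, sex, y))
    X0 = poly_design(age, sex, degree, by_sex=False)  # shared curve
    X1 = poly_design(age, sex, degree, by_sex=True)   # sex-specific curves
    rss0, rss1 = residual_ss(X0, y), residual_ss(X1, y)
    df1 = X1.shape[1] - X0.shape[1]  # extra sex-specific parameters
    df2 = len(y) - X1.shape[1]       # residual df of the full model
    f = ((rss0 - rss1) / df1) / (rss1 / df2)
    return f, df1, df2
```

A p-value can be obtained from the F distribution, for example with `scipy.stats.f.sf(f, df1, df2)`. Adding the sex-specific terms to the design lets growth rate (slope terms) and growth pattern (curvature terms) differ by sex at the same time.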
Title of Project:
Smoking Cessation Benefit
Student:
David Gasca
Mentor(s):
Marjorie Rosenberg
Abstract:
Data were collected after the State of Wisconsin added a new insurance benefit for smoking cessation, in order to give employers information on the pharmaceutical cost of such a benefit. Pharmacotherapy claims include use of one of the four FDA-approved medications for smoking cessation. The subjects of the three-year study (2001-2003), for whom individual-level claims were collected, were State of Wisconsin employees, retirees, and adult dependents. The study concentrated on single-medication users, who made up 91% of all smoking cessation claimants, and compared their individual activity within the pharmacotherapy plan with the clinical guidelines provided by the Agency for Healthcare Research and Quality (AHRQ).
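The abstract reports that single-medication users made up 91% of claimants. Below is a minimal sketch of how such a share can be computed from claims data. It assumes a hypothetical record format of (claimant ID, medication) pairs, and the medication labels in the example are placeholders rather than actual drug names.

```python
from collections import defaultdict

def single_medication_share(claims):
    """Fraction of claimants whose claims all involve exactly one medication.

    claims: iterable of (claimant_id, medication) pairs (hypothetical format).
    Repeated claims for the same medication still count as a single medication.
    """
    meds_by_claimant = defaultdict(set)
    for claimant, medication in claims:
        meds_by_claimant[claimant].add(medication)
    if not meds_by_claimant:
        return 0.0
    single = sum(1 for meds in meds_by_claimant.values() if len(meds) == 1)
    return single / len(meds_by_claimant)
```

Because medications are grouped per claimant as a set, a refill does not turn a single-medication user into a multi-medication user.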
Title of Project:
Statistical Analysis of the Shannon Diversity Index
Student:
Linda Liu
Mentor(s):
Rick Nordheim
Abstract:
The purpose of this project was to assess the sampling properties of a widely used ecological measure, the Shannon diversity index. Results are presented from a study of the effects of sample size on the index, especially when it is applied to theoretical communities with different underlying species abundance distributions. The two distributions used in the study were the geometric and the lognormal, each with a range of parameters. We found that the response of the index to sample size depends on the shape of the distribution, but that for each distribution the bias decreased as sample size increased. We also analyzed the reliability of the t test used for comparing the Shannon indices of two samples. The realized performance of the t test agreed more closely with its nominal performance (the significance level of the test) for larger samples from each simulated community.
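Below is a minimal sketch of the quantities involved, under stated assumptions. It uses the Shannon index with natural logarithms, H' = -Σ p_i ln p_i, and a geometric-series community with p_i proportional to (1-k)^(i-1). It estimates bias by averaging H' over repeated multinomial samples. The abstract does not name the two-sample t test, so Hutcheson's t test, a common choice for comparing Shannon indices, is assumed here. All function names and parameter values are hypothetical, and the lognormal case is omitted.

```python
import numpy as np

def shannon(counts):
    """Shannon index H' = -sum p_i ln p_i over observed species."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]  # unobserved species contribute nothing
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def geometric_community(n_species, k):
    """Relative abundances of a geometric series: p_i proportional to (1-k)^(i-1)."""
    p = (1 - k) ** np.arange(n_species)
    return p / p.sum()

def mean_sample_h(p, sample_size, reps, rng):
    """Average H' over `reps` multinomial samples of `sample_size` individuals."""
    draws = rng.multinomial(sample_size, p, size=reps)
    return float(np.mean([shannon(d) for d in draws]))

def hutcheson_t(counts1, counts2):
    """Hutcheson's t test for equal Shannon indices (assumed, not named in the source).

    Returns (t, df) using Hutcheson's variance approximation for H'.
    """
    def h_var_n(counts):
        c = np.asarray(counts, dtype=float)
        c = c[c > 0]
        n, s = c.sum(), len(c)
        p = c / n
        lp = np.log(p)
        h = -(p * lp).sum()
        var = ((p * lp ** 2).sum() - (p * lp).sum() ** 2) / n + (s - 1) / (2 * n ** 2)
        return h, var, n

    h1, v1, n1 = h_var_n(counts1)
    h2, v2, n2 = h_var_n(counts2)
    t = (h1 - h2) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / n1 + v2 ** 2 / n2)
    return float(t), float(df)
```

Comparing `mean_sample_h` with `shannon(p)` for the true proportions shows the plug-in estimate's downward bias shrinking as `sample_size` grows. To check the test's realized significance level, repeat `hutcheson_t` on pairs of samples drawn from the same community.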
Title of Project:
Computational Analysis of Protein Interfaces
Student:
Corina Prieto
Mentor(s):
Julie Mitchell and Steve Darnell
Abstract:
Protein-protein interactions are important for understanding biological pathways. The physical properties of a protein-protein interface are among the factors that determine its interaction behavior. Studying protein-protein interfaces, specifically the "hot spot" residues that account for the majority of the binding free energy, may reveal more about the properties associated with binding. Predicting hot spots is important because it can help us design novel protein interfaces through a better understanding of their interaction behavior. We used machine learning to generate knowledge-based decision-tree models that explore protein interfaces with respect to flexibility, and determined whether this feature would improve the prediction of hot spots. Important trends were observed in the flexibility of hot spots depending on whether they were hydrophobic or polar.
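The abstract does not describe the features or the tree-learning software used. As an illustration, the sketch below shows the basic step that a decision-tree learner repeats. It finds the single threshold on one feature that minimizes weighted Gini impurity, which amounts to a depth-1 tree. The feature is assumed to be a hypothetical per-residue flexibility score, with hot spots labeled 1 and other interface residues labeled 0.

```python
def gini(labels):
    """Gini impurity of binary 0/1 labels: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Best threshold on one feature by weighted Gini impurity.

    values: hypothetical per-residue flexibility scores.
    labels: 1 for hot spot, 0 otherwise.
    Returns (impurity, threshold); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (gini(labels), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[0]:
            best = (impurity, threshold)
    return best
```

A full tree applies `best_split` to each candidate feature, keeps the best one, and recurses on the two resulting subsets. Fitting separate trees for hydrophobic and polar residues would be one way to look for the kind of trend the abstract mentions.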
Title of Project:
3D Printing of Protein Models
Student:
Nicholas Stong
Mentor(s):
George Phillips and Roman Aranda
Abstract:
The Protein Data Bank (PDB) was established in 1971 at Brookhaven National Laboratory; in 1998 it was transferred to the Research Collaboratory for Structural Bioinformatics, which is composed of Rutgers University, the University of Wisconsin-Madison, NIST, and the San Diego Supercomputer Center. The collaboratory maintains a variety of information on 37,136 structures, all of which are publicly available. Scientists around the world determine this information experimentally, using techniques that include X-ray crystallography and NMR spectroscopy. Most important is the 3-D model information available in .PDB files (Wikipedia). With PyMOL, an open-source molecular visualization system, these data can be viewed and manipulated, allowing proteins to be explored and better understood.
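The 3-D model information in a .PDB file is stored as fixed-column ATOM and HETATM records. Below is a minimal sketch that reads atom coordinates from those records and computes the model's centroid, which is one simple step toward centering a structure before viewing or printing it. The column positions follow the PDB file format. The function names, and the idea of centering before printing, are illustrative assumptions rather than the project's workflow.

```python
def parse_atoms(lines):
    """Return (atom_name, res_name, chain, res_seq, x, y, z) tuples.

    Reads the fixed columns of ATOM/HETATM records (1-based columns:
    13-16 atom name, 18-20 residue name, 22 chain ID, 23-26 residue
    number, 31-38 x, 39-46 y, 47-54 z). Other record types are skipped.
    """
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append((
            line[12:16].strip(),   # atom name
            line[17:20].strip(),   # residue name
            line[21],              # chain identifier
            int(line[22:26]),      # residue sequence number
            float(line[30:38]),    # x (angstroms)
            float(line[38:46]),    # y
            float(line[46:54]),    # z
        ))
    return atoms

def centroid(atoms):
    """Mean (x, y, z) of the parsed atoms."""
    if not atoms:
        raise ValueError("no ATOM/HETATM records found")
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in (4, 5, 6))
```

Usage: `with open("structure.pdb") as fh: atoms = parse_atoms(fh)`, where the file name is hypothetical. Subtracting `centroid(atoms)` from every coordinate places the model at the origin.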
Title of Project:
Finding CRMs with Background Modeling
Student:
Aline Thomas
Mentor(s):
Mark Craven and Keith Noto
Abstract:
Promoter regions of co-regulated genes often contain sets of protein binding sites called cis-regulatory modules (CRMs). Algorithms that find CRMs often rely on probabilistic models that represent both binding-site DNA and background DNA. In this project, an algorithm using a two-state 5th-order hidden Markov model is compared with one using a 5th-order Markov chain model. The aim is to determine whether a more sophisticated model represents background DNA more accurately and therefore improves the ability of machines to find CRMs. The hidden Markov model represented DNA sequences significantly better. However, the Markov chain model and the hidden Markov model found CRMs equally well when used with the motif finder.
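Below is a minimal sketch of the simpler of the two background models: an order-k Markov chain over DNA, with k = 5 as in the abstract. It is trained on background sequences, and a sequence is scored by its log-likelihood conditioned on its first k bases. The add-one pseudocounts, the class name, and the method names are assumptions, since the abstract does not give them. Roughly speaking, a two-state HMM background would keep one such chain per hidden state and sum over state paths with the forward algorithm.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

class MarkovBackground:
    """Order-k Markov chain for background DNA: P(x_i | x_{i-k} .. x_{i-1})."""

    def __init__(self, order=5, pseudocount=1.0):
        self.order = order
        self.pseudocount = pseudocount  # assumed smoothing for unseen contexts
        self.counts = defaultdict(lambda: defaultdict(float))

    def fit(self, sequences):
        """Count each base following each length-k context."""
        k = self.order
        for seq in sequences:
            seq = seq.upper()
            for i in range(k, len(seq)):
                context, base = seq[i - k:i], seq[i]
                if base in ALPHABET and all(c in ALPHABET for c in context):
                    self.counts[context][base] += 1
        return self

    def prob(self, context, base):
        """Smoothed conditional probability of `base` after `context`."""
        row = self.counts.get(context, {})
        total = sum(row.values()) + self.pseudocount * len(ALPHABET)
        return (row.get(base, 0.0) + self.pseudocount) / total

    def log_likelihood(self, seq):
        """Natural-log probability of seq[k:] given its first k bases."""
        k = self.order
        seq = seq.upper()
        return sum(math.log(self.prob(seq[i - k:i], seq[i]))
                   for i in range(k, len(seq)))
```

A CRM finder can compare a window's likelihood under a binding-site model against `log_likelihood` under this background model. The better the background model fits ordinary DNA, the fewer background windows are mistaken for binding sites.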