Bayesian Hidden Markov Modeling of Array-CGH Data

Subharup Guha
Harvard School of Public Health
Biostatistics/Statistics Joint Faculty Hire Candidate

Monday, March 5, 2007

5275 MSC



Genomic alterations have been linked to the development and progression of cancer. The microarray-based technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. These intensity ratios can provide information about changes in DNA copy number. As increasing amounts of array-CGH data become available, automated algorithms for characterizing genomic profiles are needed. In particular, statistical algorithms should identify gains and losses in DNA copy number, rather than merely detect trends in the data.

We adopt a Bayesian approach, relying on a hidden Markov model to account for the dependence among the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized changes such as amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressor genes) are identified based on posterior probabilities. Global changes such as extended regions of altered copy number are also detected by this method. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic cancer and glioblastoma multiforme are analyzed, and the results are compared with those of some widely used array-CGH algorithms to illustrate the reliability of the technique.
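
As a rough illustration of how a hidden Markov model turns intensity ratios into posterior probabilities of copy-number states, the following Python sketch runs the forward-backward algorithm on a Gaussian-emission HMM with fixed parameters. It is not the method of the talk, which treats the model parameters as unknown and relies on Metropolis-within-Gibbs sampling. The three states, their means and standard deviations, the transition matrix, and the log2 ratio values are all made-up assumptions for illustration.

```python
import numpy as np

def state_posteriors(y, means, sds, trans, init):
    """Forward-backward posteriors P(state_t = k | y) for a Gaussian-emission HMM.

    All parameters are treated as known here (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    n, k = len(y), len(means)

    # Gaussian emission densities, shape (n, k).
    z = (y[:, None] - means) / sds
    dens = np.exp(-0.5 * z ** 2) / (sds * np.sqrt(2.0 * np.pi))

    # Forward pass, normalized at each step to avoid underflow.
    alpha = np.zeros((n, k))
    alpha[0] = init * dens[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * dens[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also normalized at each step.
    beta = np.ones((n, k))
    for t in range(n - 2, -1, -1):
        beta[t] = trans @ (dens[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical states: 0 = loss, 1 = copy-neutral, 2 = gain.
means = [-0.6, 0.0, 0.58]          # assumed mean log2 ratios per state
sds = [0.15, 0.15, 0.15]           # assumed noise levels
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])  # assumed "sticky" transitions
init = np.array([1 / 3, 1 / 3, 1 / 3])

# Made-up log2 intensity ratios ordered along a chromosome.
y = [0.02, -0.05, 0.01, 0.60, 0.55, 0.62, 0.00, -0.03, -0.58, -0.65, -0.60, 0.04]

post = state_posteriors(y, means, sds, trans, init)
calls = post.argmax(axis=1)        # most probable state at each position
print(np.round(post, 3))
print(calls)
```

Each row of `post` gives the posterior probabilities of loss, neutral, and gain at one position, so a gain or loss can be called when the corresponding probability exceeds a chosen threshold. In a fully Bayesian treatment the state means, variances, and transition probabilities would themselves be sampled rather than fixed as they are here.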

