Systems Biology Faculty Candidate Seminar
Probabilistic Graphical Models and Algorithms for Genomic Analysis
Eric Xing, PhD
Computer Science Division, UC Berkeley
Monday, April 19, 2004, 10:00 a.m.
Biotechnology Auditorium (Room 1111), 420 Henry Mall
I discuss two probabilistic modeling problems arising in metazoan genomic analysis: identifying motifs and cis-regulatory modules (CRMs) from transcriptional regulatory DNA sequences, and inferring haplotypes from genotypes of single nucleotide polymorphisms. Motif and CRM identification is important for understanding the gene regulatory network underlying metazoan development and functioning. I discuss a modular Bayesian model that captures rich structural characteristics of the transcriptional regulatory sequences and supports a variety of tasks such as learning motif representations, model-based motif and CRM prediction, and de novo motif detection. Haplotype inference is essential for understanding genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. I discuss a Bayesian model based on a prior constructed from a Chinese restaurant process (a nonparametric prior that provides control over the size of the unknown pool of population haplotypes) and on a likelihood that allows for statistical errors in the haplotype/genotype relationship. Our models use the "probabilistic graphical model" formalism, which exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces. I discuss the mathematical underpinnings of the models, how they formally incorporate biological prior knowledge about the data, and the related computational issues.
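For readers unfamiliar with the Chinese restaurant process named in the abstract, the following is a minimal illustrative sketch of how such a prior generates a partition of unknown size. It is not the speaker's model: the function name `sample_crp_partition`, the concentration parameter `alpha`, and the reading of "tables" as distinct population haplotypes and "customers" as sampled chromosomes are assumptions made here for illustration only.

```python
import random


def sample_crp_partition(n_customers, alpha, seed=None):
    """Draw one partition from a Chinese restaurant process.

    Hypothetical illustration: each "customer" stands for a sampled
    chromosome and each "table" for a distinct population haplotype.
    alpha (> 0) is the concentration parameter.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # An existing table is chosen with weight equal to its size;
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


if __name__ == "__main__":
    assignments, table_sizes = sample_crp_partition(20, alpha=1.0, seed=0)
    print("tables used:", len(table_sizes))
    print("table sizes:", table_sizes)
```

In this sketch the number of tables is not fixed in advance. Larger values of `alpha` tend to open more tables, which is the sense in which such a prior gives control over the size of an unknown pool.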
Biosketch: Eric Xing received his B.S. with honors in Physics and Biology from Tsinghua University and his Ph.D. in Molecular Biology and Biochemistry from Rutgers University, and will soon complete his Ph.D. in Computer Science at UC Berkeley. His early work in molecular biology focused on the genetic mechanisms of human carcinogenesis and the mutational spectrum of tumor suppressor genes. He then moved into machine learning and has worked on probabilistic graphical models, approximate inference, and pattern recognition. He is interested in studying biological problems (in particular, systems biology, genetic inference, and evolution) using statistical learning approaches, the theory and application of graphical models, nonparametric Bayesian analysis, and semi-unsupervised learning.