Mixture modeling for genome-wide localization of transcription factors
Sunduz Keles, PhD
Department of Statistics
Department of Biostatistics and Medical Informatics
Friday, February 3, 2006, 12:00pm
Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass DNA-protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or a genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating the inherent spatial structure of the data. In this model, each genomic region belongs to one of two general groups: regions with a local protein-DNA interaction (peak) and regions lacking this interaction.
Individual probes within a genomic region are allowed to have different localization rates, accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This leads to the relaxation of the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. I will present several novel applications from ChIP-chip data of Drosophila melanogaster obtained using Affymetrix tiling arrays.
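
To make the peak versus no-peak idea in the abstract concrete, the following is a minimal sketch and not the speaker's method. It assumes two gamma components with fixed, made-up (shape, scale) parameters, a hypothetical discrete distribution over peak widths, and a peak placed uniformly at random within the region. It omits the hierarchical, probe-specific localization rates and any parameter estimation. All function and argument names are illustrative.

```python
import math


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def peak_posterior(intensities, prior_peak, bg, bound, size_probs):
    """Posterior probability that a region contains a peak.

    intensities: positive probe intensities, in genomic order
    prior_peak:  assumed prior probability that the region has a peak
    bg, bound:   assumed (shape, scale) of the background / bound components
    size_probs:  assumed {peak width in probes: probability}; widths must
                 not exceed the number of probes in the region
    """
    n = len(intensities)
    log_bg = [gamma_logpdf(x, *bg) for x in intensities]
    log_bd = [gamma_logpdf(x, *bound) for x in intensities]
    log_null = sum(log_bg)  # every probe drawn from the background

    # Marginalize over peak width and start position instead of fixing
    # a single peak size.
    terms = []
    for width, p in size_probs.items():
        if p <= 0.0 or width > n:
            continue
        n_starts = n - width + 1
        for start in range(n_starts):
            gain = sum(log_bd[i] - log_bg[i]
                       for i in range(start, start + width))
            terms.append(math.log(p) - math.log(n_starts) + log_null + gain)
    if not terms:
        raise ValueError("no peak width fits inside the region")
    log_peak = logsumexp(terms)

    a = math.log(prior_peak) + log_peak
    b = math.log1p(-prior_peak) + log_null
    return math.exp(a - logsumexp([a, b]))


# Illustrative values only; none of these numbers come from the talk.
background = (2.0, 1.0)
bound = (8.0, 1.0)
widths = {2: 0.3, 3: 0.5, 4: 0.2}
p_flat = peak_posterior([1.5, 2.0, 1.8, 2.2, 1.7], 0.1, background, bound, widths)
p_peak = peak_posterior([1.5, 7.5, 8.2, 7.9, 1.7], 0.1, background, bound, widths)
```

Summing over the width distribution, rather than scoring a window of one fixed length, is the part of the sketch that mirrors the relaxation of the fixed peak size assumption described above.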