Bayesian Gaussian Mixture Models for High-Density Genotyping Arrays
Human Genetics and Statistics, UCLA
Wednesday, April 13
Affymetrix's SNP (single nucleotide polymorphism) genotyping chips have broadened the scope and lowered the cost of gene mapping studies. Because each SNP is queried by multiple DNA probes, the chips pose interesting challenges for genotype calling. Traditional clustering methods distinguish the three genotypes of a SNP fairly well, provided there is a large enough sample of unrelated individuals or a training sample of known genotypes. We attempt to improve genotype calling by constructing Gaussian mixture models with empirically derived priors. The priors stabilize parameter estimation and borrow information gathered collectively across tens of thousands of SNPs. When data from related family members are available, our models also capture the correlations in signals between relatives. With these advantages in mind, we apply the models to Affymetrix probe intensity data on 10,000 SNPs gathered on 63 genotyped individuals spread over eight pedigrees. We integrate the genotype calling model with pedigree analysis and examine a sequence of symmetry hypotheses involving the correlated probe signals. These symmetry hypotheses raise novel mathematical issues of parameterization. Using the Bayesian information criterion (BIC), we select the best combination of symmetry assumptions. Compared with Affymetrix's software, our model produces fewer no calls with little sacrifice in overall calling accuracy.
This is joint work with Kenneth Lange.
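The abstract gives no implementation details. As a rough illustration only, the Python sketch below fits a three-cluster (AA, AB, BB) one-dimensional Gaussian mixture by maximum a posteriori (MAP) EM. Priors on the cluster means and variances stand in for the empirically derived priors. It makes a no call whenever no genotype's posterior probability reaches a threshold, and it computes a BIC score. Everything here is an assumption made for illustration: the single summary statistic per individual (for example, a log ratio of allele A to allele B intensity), the prior forms and hyperparameters (prior_means, prior_mean_var, a0, b0), the 0.95 calling threshold, and all function names are hypothetical. The sketch ignores the multiple correlated probe signals, the pedigree correlations, and the symmetry constraints the talk describes.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def log_normal_pdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def log_component_terms(xi, mu, var, w):
    # log(w_k) + log N(x_i | mu_k, var_k) for each genotype cluster k
    return [math.log(max(w[k], 1e-300)) + log_normal_pdf(xi, mu[k], var[k])
            for k in range(len(mu))]


def responsibilities(xi, mu, var, w):
    # Posterior genotype probabilities for one individual (log-sum-exp for stability)
    logs = log_component_terms(xi, mu, var, w)
    m = max(logs)
    expd = [math.exp(l - m) for l in logs]
    s = sum(expd)
    return [e / s for e in expd]


def map_em(x, prior_means, prior_mean_var=0.25, a0=2.0, b0=0.05, n_iter=100):
    """MAP estimation (ECM-style updates) for a 1-D Gaussian mixture.

    Hypothetical priors, for illustration only:
      mu_k  ~ Normal(prior_means[k], prior_mean_var)
      var_k ~ InverseGamma(a0, b0)
      weights ~ flat Dirichlet, so the weight update is the plain proportion.
    """
    k = len(prior_means)
    mu, var, w = list(prior_means), [0.1] * k, [1.0 / k] * k
    n = len(x)
    for _ in range(n_iter):
        resp = [responsibilities(xi, mu, var, w) for xi in x]  # E-step
        for j in range(k):  # conditional M-steps, one cluster at a time
            nj = sum(r[j] for r in resp)
            sx = sum(r[j] * xi for r, xi in zip(resp, x))
            # Posterior mode of mu_j given var_j: precision-weighted average
            mu[j] = ((sx / var[j] + prior_means[j] / prior_mean_var)
                     / (nj / var[j] + 1.0 / prior_mean_var))
            ss = sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x))
            # Posterior mode of var_j given mu_j under the inverse-gamma prior
            var[j] = (ss + 2 * b0) / (nj + 2 * (a0 + 1))
            w[j] = nj / n
    return mu, var, w


def log_likelihood(x, mu, var, w):
    total = 0.0
    for xi in x:
        logs = log_component_terms(xi, mu, var, w)
        m = max(logs)
        total += m + math.log(sum(math.exp(l - m) for l in logs))
    return total


def bic(x, mu, var, w, n_params):
    # BIC = -2 log L + p log n (smaller is better); here L is evaluated at the MAP fit
    return -2.0 * log_likelihood(x, mu, var, w) + n_params * math.log(len(x))


def call_genotypes(x, mu, var, w, threshold=0.95):
    # Call the most probable genotype, or "NoCall" if its posterior is below threshold
    calls = []
    for xi in x:
        post = responsibilities(xi, mu, var, w)
        best = max(range(len(post)), key=post.__getitem__)
        calls.append(GENOTYPES[best] if post[best] >= threshold else "NoCall")
    return calls


if __name__ == "__main__":
    # Hypothetical summary statistics for 15 individuals (not data from the talk)
    x = [-1.05, -0.98, -1.02, -0.95, -1.00,
         0.02, -0.03, 0.00, 0.05, -0.04,
         0.97, 1.03, 1.00, 0.96, 1.04]
    mu, var, w = map_em(x, prior_means=[-1.0, 0.0, 1.0])
    print(call_genotypes(x, mu, var, w))
    print(call_genotypes([0.5], mu, var, w))  # midway between AB and BB
    print(bic(x, mu, var, w, n_params=8))
```

In this toy run, a new point midway between two fitted clusters, such as 0.5, stays below the threshold and becomes a no call. The unconstrained model has 8 free parameters (three means, three variances, two free weights). A constrained variant, for instance one forcing AA and BB to share a variance, would have fewer parameters, and the two could be compared by BIC. Plugging MAP estimates into BIC is itself an approximation.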