Syllabus, Readings and Lecture Notes
Motif and cis-Regulatory Module (CRM) Modeling
- topics: learning motif models, learning models of cis-regulatory modules, Gibbs sampling, Dirichlet priors, parameter tying, sequence entropy, mutual information
- required reading
- T. Bailey and C. Elkan. The value of prior knowledge in discovering motifs with MEME. In Proceedings of the 3rd International Conference on Intelligent Systems for Molecular Biology, pp. 21-29, 1995.
- C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, and J. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-214, 1993.
- O. Elemento, N. Slonim, and S. Tavazoie. A universal framework for regulatory element discovery across all genomes and data types. Molecular Cell 28(2):337-350, 2007. (Supplemental materials containing key methodological details)
- optional reading
- optional viewing
- lecture notes
- Learning Sequence Motif Models using EM (PDF, PPTX) (1/25, 1/30)
- Learning Sequence Motif Models using Gibbs Sampling (PDF, PPTX, Gamma example, Dirichlet example) (2/1, 2/6)
- Inferring Models of cis-Regulatory Modules using Information Theory (PDF, PPTX) (2/8, 2/13)
Genotype Analysis
- topics: haplotype inference, genome-wide association studies (GWAS), quantitative trait loci (QTL) mapping, multiple hypothesis testing
- required reading
- optional reading
- lecture notes
- Linking Genetic Variation to Important Phenotypes (PDF, PPTX) (2/15)
- GWAS and Multiple Testing Correction (PDF, PPTX) (2/20, 2/22, 2/27)
Epigenomics
- topics: epigenomic data types, DNase I hypersensitivity, Gaussian processes, convolutional neural networks, interpreting noncoding genetic variants
- required reading
- R.I. Sherwood, T. Hashimoto, C.W. O'Donnell, S. Lewis, A.A. Barkal, J.P. van Hoff, V. Karun, T. Jaakkola, and D.K. Gifford. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat Biotechnol 32(2): 171–178, 2014.
- J. Lever, M. Krzywinski, and N. Altman. Points of Significance: Classification evaluation. Nat Methods 13(8):603-604, 2016.
- C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle. Deep learning for computational biology. Mol Syst Biol 12(7):878, 2016.
- optional reading
- lecture notes
RNA-Seq and Mass Spectrometry
- topics: RNA-Seq technology, transcript quantification, peptide and protein identification with mass spectrometry
- required reading
- optional reading
- Z. Wang, M. Gerstein, and M. Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1): 57-63, 2009.
- A. Conesa, P. Madrigal, S. Tarazona, D. Gomez-Cabrero, A. Cervera, A. McPherson, M.W. Szczesniak, D.J. Gaffney, L.L. Elo, X. Zhang, and A. Mortazavi. A survey of best practices for RNA-seq data analysis. Genome Biology 17:13, 2016.
- A.I. Nesvizhskii. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 73(11): 2092-2123, 2010.
- J.K. Eng, B. Fischer, J. Grossmann, and M.J. MacCoss. A Fast SEQUEST Cross Correlation Algorithm. J Proteome Res 7(10): 4598-4602, 2008.
- lecture notes
- Transcript quantification with RNA-Seq (PDF, PPTX) (3/22, 4/3, 4/5)
- Mass spectrometry (PDF, PPTX) (4/5, 4/10)
Biological Network Analysis
- topics: protein interactions, pathway identification, linear programming, minimum-cost flow
- required reading
- E. Yeger-Lotem, L. Riva, L.J. Su, A.D. Gitler, A.G. Cashikar, O.D. King, P.K. Auluck, M.L. Geddie, J.S. Valastyan, D.R. Karger, S. Lindquist, and E. Fraenkel. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet 41(3):316-323, 2009.
- optional reading
- lecture notes
- Identifying signaling pathways (PDF, PPTX) (4/10, 4/12, 4/17)
Gene Finding
- topics: gene finding, interpolated Markov models, generalized HMMs, pair HMMs
- required reading
- optional reading
- lecture notes
- Interpolated Markov Models for Gene Finding (PDF, PPTX) (4/17, 4/19)
- Eukaryotic Gene Finding (PDF, PPTX) (4/19)
Large-Scale and Whole-Genome Sequence Alignment
- topics: large-scale alignment, whole-genome alignment, suffix trees, k-mer tries, longest increasing subsequence problem, MUMmer
- required reading
- A. Delcher, S. Kasif, R. Fleischmann, J. Peterson, O. White, and S. Salzberg. Alignment of Whole Genomes. Nucleic Acids Research 27(11):2369-2376, 1999.
- optional reading
- E. Ukkonen. On-line Construction of Suffix Trees. Algorithmica 14(3):249-260, 1995.
- M. Brudno, C. Do, G. Cooper, M. Kim, E. Davydov, NISC Comparative Sequencing Program, E. Green, A. Sidow, and S. Batzoglou. LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA. Genome Research 13:721-731, 2003.
- Chapter 3 of C. Dewey. Whole-genome alignments and polytopes for comparative genomics. PhD thesis, University of California, Berkeley, 2006.
- lecture notes
- Alignment of Long Sequences (PDF, PPTX) (4/24, 4/26)
RNA Structure Analysis
- topics: predicting RNA secondary structure, Nussinov/energy-minimization algorithms, stochastic context-free grammars
- required reading
- Chapter 9 in Durbin et al.
- Sections 10.1, 10.2 in Durbin et al.
- optional reading
- lecture notes
Lecture Notes
Thank you to Professors Mark Craven and Colin Dewey for providing lecture material. These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter.