Motif and cis-Regulatory Module (CRM) Modeling
concepts to know
the motif learning task
position weight (profile) matrices
EM algorithms
Gibbs sampling
the OOPS (one occurrence per sequence) and ZOOPS (zero or one occurrence per sequence) models of MEME
tying parameters
the CRM learning task
the structure search problem
beam search
sensitivity/recall, specificity, precision
duration modeling
semi-Markov models (generalized HMMs)
dynamic programming with semi-Markov models
entropy and mutual information
using MI to identify interesting motifs (the FIRE approach)
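Entropy and mutual information can be checked numerically. The sketch below is a plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) in bits for two discrete variables. It assumes the FIRE-style setting where X is motif presence/absence per gene and Y is the gene's expression cluster. The function names and data are hypothetical, and it omits the randomization tests a full motif-discovery pipeline would use to judge significance.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in Shannon entropy H(X), in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical data: motif present (1) or absent (0) in each gene's promoter,
# and the expression cluster each gene was assigned to.
motif_present = [1, 1, 1, 0, 0, 0, 1, 0]
cluster = ["up", "up", "up", "down", "down", "down", "up", "down"]
print(mutual_information(motif_present, cluster))  # 1.0: presence determines cluster
```

A motif whose presence is independent of the cluster labels scores near 0 bits. Here presence determines the cluster exactly, so MI equals H(cluster) = 1 bit.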
be able to do
MEME algorithm
Gibbs sampling for motif finding
calculate sensitivity/recall, specificity, precision
design HMMs with specified duration models
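For the sensitivity/recall, specificity and precision item above, here is a minimal sketch computing the three quantities from parallel lists of actual and predicted labels (for example, whether each position is a true motif site). The function names and example data are illustrative only. Some gene-finding papers use "specificity" to mean TP / (TP + FP), which is what is called precision here, so check which definition a given source uses.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN from parallel lists (truthy = positive)."""
    pairs = [(bool(a), bool(p)) for a, p in zip(actual, predicted)]
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, fp, tn, fn


def metrics(actual, predicted):
    tp, fp, tn, fn = confusion_counts(actual, predicted)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when nothing to count

    return {
        "sensitivity": ratio(tp, tp + fn),  # a.k.a. recall: TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "precision": ratio(tp, tp + fp),    # TP / (TP + FP)
    }


# Hypothetical labels for 10 positions: 4 true sites, 5 predicted sites.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(metrics(actual, predicted))
```

Here TP = 3, FN = 1, FP = 1 and TN = 5, so sensitivity = 3/4, specificity = 5/6 and precision = 3/4.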
Gene Finding
concepts to know
the gene finding task
interpolated Markov models
the MDD (maximal dependence decomposition) representation
TWINSCAN paired sequence representation
pair HMMs
the relationship between pair HMMs and sequence alignment
generalized pair HMMs
be able to do
interpolated Markov models
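To practice interpolated Markov model calculations, here is a simplified sketch. The probability of the next base given its context mixes the order-j maximum-likelihood estimate with the IMM estimate for the next-shorter context, recursively down to order 0. The weighting rule is an assumption: lambda = min(1, count(context) / threshold) is a plain linear ramp standing in for GLIMMER's count-plus-chi-square rule. The class name `SimpleIMM` and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGT"


class SimpleIMM:
    """Interpolated Markov model of maximum order k (simplified sketch).

    Assumption: interpolation weight lambda = min(1, count(context) / threshold),
    a stand-in for GLIMMER's chi-square-based weighting.
    """

    def __init__(self, training_seqs, k=2, threshold=400):
        self.k = k
        self.threshold = threshold
        # counts[context][x] = times symbol x followed context (context length 0..k)
        self.counts = defaultdict(lambda: defaultdict(int))
        for seq in training_seqs:
            for i, x in enumerate(seq):
                for order in range(min(i, k) + 1):
                    self.counts[seq[i - order:i]][x] += 1

    def prob(self, x, history):
        """P_IMM(x | history), using at most the last k symbols of history."""
        context = history[-self.k:] if self.k else ""
        return self._interpolate(x, context)

    def _interpolate(self, x, context):
        row = self.counts.get(context, {})
        total = sum(row.values())
        ml = row.get(x, 0) / total if total else 0.0  # maximum-likelihood estimate
        if context == "":
            # Order 0: fall back to uniform only if there is no training data at all.
            return ml if total else 1.0 / len(ALPHABET)
        lam = min(1.0, total / self.threshold)
        # Mix this order's estimate with the IMM estimate for the shorter context
        # (drop the oldest symbol, keep the most recent ones).
        return lam * ml + (1.0 - lam) * self._interpolate(x, context[1:])


model = SimpleIMM(["ACGTACGTAC"], k=2, threshold=4)
print(round(model.prob("G", "TTAC"), 3))  # 0.8
```

In the example, context "AC" was seen twice (lambda = 0.5, always followed by G), "C" likewise, and the order-0 estimate of G is 2/10. The result is 0.5 + 0.5 × (0.5 + 0.5 × 0.2) = 0.8.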
RNA-Seq
concepts to know
how RNA-Seq data are generated
the transcript quantification task given RNA-Seq data
how the EM algorithm applies to the transcript quantification task
alternative splicing
probabilistic splice graphs
identifiability of statistical models
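The way EM applies to transcript quantification can be illustrated with a minimal sketch. It makes a strong simplifying assumption: all transcripts have the same effective length, so a read is equally likely under every transcript it aligns to. Real quantifiers also model transcript length and fragment position. The E-step splits each read among its compatible transcripts in proportion to the current abundances, and the M-step renormalizes. The function name and data are hypothetical.

```python
def em_abundances(read_compat, transcripts, n_iter=200):
    """EM estimate of relative transcript abundances theta_t.

    read_compat: list of sets; read_compat[r] = transcripts read r aligns to.
    Assumption: equal effective transcript lengths.
    """
    if any(len(c) == 0 for c in read_compat):
        raise ValueError("every read must be compatible with at least one transcript")
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n = len(read_compat)
    for _ in range(n_iter):
        # E-step: expected number of reads from each transcript.
        expected = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            z = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / z
        # M-step: abundance = expected fraction of all reads.
        theta = {t: expected[t] / n for t in transcripts}
    return theta


# Hypothetical reads: 6 map only to A, 2 only to B, 4 are shared by A and B.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
print(em_abundances(reads, ["A", "B"]))  # theta_A approaches 0.75 = 6 / (6 + 2)
```

The shared reads are ambiguous, so EM apportions them in the same 3:1 ratio implied by the uniquely mapping reads. A transcript that no read maps to gets abundance 0 after one iteration.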
RNA Analysis
concepts to know
RNA secondary structure
the secondary structure prediction task
using dynamic programming to predict RNA secondary structure
how the Nussinov algorithm can be generalized to do energy minimization
pseudoknots and why they are a problem
transformational grammars
probabilistic grammars
the Chomsky hierarchy
ambiguity in a grammar
why CFGs are appropriate for RNA modeling
what the Inside, CYK and Inside/Outside algorithms do
space and time complexity of Inside, CYK, Inside/Outside
RIBOSUM matrices
using a SCFG to find a sequence matching a given structure
be able to do
the Nussinov algorithm
show parse trees for a sequence with a given grammar
Inside algorithm
Inside/Outside algorithm
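As a companion to the Nussinov item above, here is a minimal sketch of the base-pair-maximization recursion with traceback. Each cell dp[i][j] takes the best of: i unpaired, j unpaired, i paired with j, or a bifurcation at k. It assumes Watson-Crick plus G-U wobble pairs and a minimum hairpin loop of `min_loop` unpaired bases; both are illustrative choices rather than fixed parts of the algorithm.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximize the number of base pairs; return (score, sorted list of pairs)."""
    n = len(seq)
    if n == 0:
        return 0, []
    # dp[i][j] = max number of pairs in seq[i..j] (inclusive); 0 for short spans
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i unpaired, or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into [i..k] and [k+1..j]
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback with an explicit stack to recover one optimal structure.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return dp[0][n - 1], sorted(pairs)


print(nussinov("GGGAAACCC"))  # (3, [(0, 8), (1, 7), (2, 6)])
```

Filling cells by increasing span guarantees that every subproblem is ready before it is needed. The table gives O(n^2) space, and the bifurcation loop makes the fill O(n^3) time.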
Large-Scale and Whole-Genome Sequence Alignment
concepts to know
the large-scale alignment task
the general strategy of large-scale aligners
suffix trees
tries
threaded tries
maximal unique matches (MUMs)
multi-MUMs
longest increasing subsequence problem
constrained dynamic programming
genome rearrangements
progressive alignment
recursive anchoring
overview of MUMmer/LAGAN/MLAGAN/Mauve algorithms
breakpoint graph in Mercator
using undirected graphical models in Mercator to identify breakpoints
be able to do
show suffix trees for a given (set of) string(s)
show trie/threaded trie for strings
calculate MUMs and multi-MUMs
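Hand-computed MUMs for small examples can be checked with a brute-force sketch. A MUM is a match that occurs exactly once in each string and cannot be extended left or right. The code starts only at left-maximal matching positions, extends each match to the right as far as possible, and keeps it if the matched substring is unique in both strings. This is not how MUMmer works (it uses a suffix tree to find MUMs in linear time). The `min_len` parameter and function names are hypothetical, and multi-MUMs are not covered.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=1):
    """Return (start_in_a, start_in_b, length) for each MUM of a and b."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip mismatches and matches that could be extended to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            length = 0  # extend to the right as far as the strings agree
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            s = a[i:i + length]
            if length >= min_len and count_occurrences(a, s) == 1 and count_occurrences(b, s) == 1:
                mums.append((i, j, length))
    return mums


print(find_mums("GATCCAT", "CATGATC"))  # [(0, 3, 4), (4, 0, 3)]
```

In the example, "GATC" and "CAT" are MUMs. "AT" is a maximal match at two position pairs but occurs twice in the first string, so it is not unique and is rejected.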
Comparative Network Analysis
concepts to know
classes of biological networks
statistical properties of biological networks
techniques for randomizing graphs
the conserved module detection task
Sharan et al. probabilistic model for conserved module detection
Sharan et al. algorithm for conserved module detection
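One common technique for randomizing graphs is repeated degree-preserving edge swaps. Two edges {a,b} and {c,d} are replaced by {a,d} and {c,b}, so every node keeps its degree while the wiring is scrambled. The sketch assumes a simple undirected graph and rejects swaps that would create self-loops or duplicate edges. The function names and the choice of `n_swaps` are assumptions; how many swaps suffice to mix the graph is a separate question.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Randomize a simple undirected graph by repeated double-edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    if len(edge_list) < 2:
        return edge_list
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings at random
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue  # reject swaps creating a self-loop or a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degrees(edges):
    """Degree of every node appearing in the edge list."""
    return Counter(v for e in edges for v in e)


graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5), (5, 6), (6, 4)]
shuffled = degree_preserving_shuffle(graph, n_swaps=100, seed=1)
print(degrees(shuffled) == degrees(graph))  # True: every node keeps its degree
```

Comparing a statistic (for example, the number of conserved modules found) on the real network against many such shuffled networks gives an empirical null distribution.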
Protein Structure Prediction
concepts to know
the protein structure prediction problem
levels of protein structure (primary, secondary, etc.)
the homology modeling approach
the threading approach
how predictions are made
how threadings are calculated
branch-and-bound search
be able to do
branch-and-bound threading
Genotype Analysis
concepts to know
SNPs and CNVs
genome-wide association study
QTL mapping
single-SNP statistical tests for association
multiple testing correction
Bonferroni correction
false discovery rate
Benjamini-Hochberg FDR procedure
be able to do
Bonferroni correction on a set of p-values
Benjamini-Hochberg FDR calculation for a set of p-values
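The two corrections can be practiced on a small set of p-values. In this sketch, Bonferroni rejects when p_i <= alpha / m. Benjamini-Hochberg sorts the p-values, finds the largest rank k with p_(k) <= k * alpha / m, and rejects the k smallest. BH-adjusted p-values are also computed, so that thresholding them at alpha gives the same rejections. The function names and p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Reject the k smallest p-values, k = largest rank with p_(k) <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def bh_adjusted(pvalues):
    """BH-adjusted p-values: q_(k) = min over j >= k of min(1, m * p_(j) / j)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, m * pvalues[idx] / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from 10 tests.
p = [0.205, 0.001, 0.039, 0.008, 0.041, 0.212, 0.042, 0.06, 0.074, 0.216]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

With alpha = 0.05 and m = 10, the Bonferroni threshold is 0.005, so only p = 0.001 is rejected. BH also rejects p = 0.008 because 0.008 <= 2 × 0.05 / 10 = 0.01, while the third-smallest p-value, 0.039, exceeds 3 × 0.05 / 10 = 0.015 and no larger rank passes its threshold either.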