Some Project Ideas
- Implement a version of BLAST. Empirically evaluate the value
of some aspect of the algorithm (e.g. the two-hit method).
- Implement and empirically compare several heuristic methods for
multiple sequence alignment.
- Investigate a novel application of an optimization method to the
task of multiple sequence alignment. Empirically compare the
novel method to a standard one.
- Implement an interpolated Markov model for gene finding.
Compare it to fixed-order Markov chain models.
- Implement profile HMMs and experiment with using them
to model protein families.
- Implement and empirically compare several clustering algorithms
on a gene expression data set.
- Implement and empirically compare several classification algorithms
on a gene expression data set.
- Implement and evaluate an algorithm for inferring regulatory
networks from gene expression data.
- Implement and experiment with an algorithm for protein threading.
- Implement and evaluate an algorithm for phylogenetic
tree construction.
- Implement and experiment with using stochastic context-free
grammars to model some class of RNAs.
- Extend MUMmer's LIS algorithm to multiple sequence alignments.
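As a possible starting point for the last idea, here is a minimal sketch of the longest increasing subsequence (LIS) computation on a single list of integers. It assumes the matches between two sequences have already been sorted by their position in the first sequence, so that `values` holds their positions in the second. The function name and the input are hypothetical, and this is not MUMmer's own code.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence of `values`.

    Runs in O(n log n) time using binary search over the smallest
    known tail value for each subsequence length.
    """
    tail_index = []   # tail_index[k]: index of the smallest tail of a length-(k+1) subsequence
    tail_value = []   # values at those indices, kept sorted so bisect can be used
    prev = [None] * len(values)  # back-pointers for reconstructing the answer

    for i, v in enumerate(values):
        k = bisect_left(tail_value, v)
        if k == len(tail_value):
            tail_index.append(i)
            tail_value.append(v)
        else:
            tail_index[k] = i
            tail_value[k] = v
        prev[i] = tail_index[k - 1] if k > 0 else None

    # Walk the back-pointers from the tail of the longest subsequence.
    result = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        result.append(values[i])
        i = prev[i]
    return result[::-1]


# Hypothetical example: positions in the second sequence of six matches,
# listed in the order the matches occur in the first sequence.
example = longest_increasing_subsequence([3, 1, 2, 5, 4, 6])
```

Extending this to multiple alignments would mean finding matches that are consistently ordered across more than two sequences, which this pairwise sketch does not attempt.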
Some Available Data Sets
- gene expression data sets from the Whitehead/MIT Center for Genome Research
- gene expression data sets from the Stanford Microarray Database
- Paul Horton's protein localization data sets
- Pfam database of protein families
- human gene data set
- TRANSFAC database of transcription factor binding sites
- SCPD database of transcription factor binding sites
- extensive list of molecular biology databases
Last modified: Fri Apr 5 11:03:19 CST 2002