BMI/CS 776 Course Project
Project Milestones
- March 16: project proposal due
- April 20: progress report due
- May 11: final report due
The Project Proposal
Your proposal should address three aspects of the project:
- What is the goal of the project? What questions are you
investigating? What are the hypotheses?
- What algorithms/methods will you implement?
- What experiments will you run? What data sets will you use?
What will you vary/compare in your experiments? What will you measure?
Incomplete List of Project Suggestions
- Create a model of sequence evolution that explicitly accounts for
insertions and deletions caused by replication slippage. Develop an
alignment algorithm based on this model.
- Design an ab initio motif-finding method (i.e., one that
discovers new classes of motifs) that takes into account dependencies
between non-adjacent positions.
- Design, implement and evaluate an algorithm for identifying
cis-regulatory modules (arrangements of binding site motifs that
regulate a set of genes under certain conditions).
- Compare the time to convergence and the resulting accuracy when
EM and Gibbs sampling are used to learn a model with hidden state. The
model could be a MEME-style motif model, another type of hidden Markov
model, a stochastic context-free grammar, a mixture model, etc.
- Implement a method for gene finding that employs multiple
genomes. Investigate how the accuracy of the predictions is affected
by how closely related the informant genome is to the target genome
(e.g., you might use mouse, zebrafish, and fruit fly as the informant
genomes).
- Convert a gene-finding algorithm based on a generative model into
one based on a similar discriminative model, or vice versa. Compare
the performance of the two models.
- Attempt to improve upon AMAP, a state-of-the-art multiple
alignment program, by adopting a non-greedy search or by estimating
parameters between pairs of sequences before alignment.
- Randomize a traditional filter for finding highly similar local
regions in sequences. Use an algorithm based on this
randomization to efficiently find all high-scoring local alignments in
a set of sequences.
- Implement and experiment with an SCFG-based approach for
identifying RNA genes via cross-genome comparisons.
- Extend the method of Bockhorst
and Craven for refining the structure of a context-free grammar.
Devise a new operator and an appropriate heuristic for applying it.
Evaluate the method using a terminator data set.
- Design an algorithm for the alignment of protein networks.
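For the randomized-filter suggestion, one possible starting point (an assumption, not a prescribed approach) is to replace a single fixed exact-match seed with several randomly chosen spaced seeds, in the spirit of locality-sensitive hashing: two windows become a candidate pair if they agree at every sampled position of at least one random pattern. The function names and the parameters window, k, and n_patterns are hypothetical. A minimal Python sketch:

```python
import random
from collections import defaultdict


def random_seed_patterns(window, k, n_patterns, rng):
    """Draw n_patterns random subsets of k positions within a window (hypothetical helper)."""
    return [sorted(rng.sample(range(window), k)) for _ in range(n_patterns)]


def candidate_pairs(seqs, window, k, n_patterns, seed=0):
    """Return cross-sequence window pairs ((seq_i, start_i), (seq_j, start_j))
    that match at every sampled position for at least one random pattern.

    Windows with few mismatches are likely to avoid all mismatched positions
    under at least one pattern; dissimilar windows rarely collide.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    patterns = random_seed_patterns(window, k, n_patterns, rng)
    pairs = set()
    for pattern in patterns:
        # Bucket every window by the characters at the sampled positions.
        buckets = defaultdict(list)
        for si, s in enumerate(seqs):
            for start in range(len(s) - window + 1):
                key = "".join(s[start + p] for p in pattern)
                buckets[key].append((si, start))
        # Any two windows in the same bucket form a candidate pair.
        for hits in buckets.values():
            for a in range(len(hits)):
                for b in range(a + 1, len(hits)):
                    if hits[a][0] != hits[b][0]:  # keep only cross-sequence pairs
                        pairs.add((hits[a], hits[b]))
    return pairs
```

For example, candidate_pairs(["TTTTACGTACGAAAA", "CCCCACGTACGGGGG"], window=7, k=4, n_patterns=5) includes ((0, 4), (1, 4)), because identical windows collide under every pattern. Candidates would still need to be extended and scored (e.g., with Smith-Waterman) to recover high-scoring local alignments, and the all-pairs loop within each bucket is quadratic, so this sketch suits only small inputs.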
Some biomedical text mining ideas (not my expertise, but Mark Craven
may be able to help):
- Design, implement and evaluate a method that clusters genes using
multiple sources of evidence, such as gene-expression data
and text associated with the genes.
- Design, implement and evaluate an algorithm for discovering
themes in a set of scientific articles.
- Devise a grammar for some type of biological named entity
(e.g., gene or protein names). Implement and evaluate a model, based on
the grammar, for recognizing entities of this type in scientific
articles.
- Design, implement and evaluate a method for classifying scientific
articles according to their relevance to specific categories
(e.g., whether or not they discuss tumor biology).