Building an accurate DNA-binding model for a transcription factor (TF) is essential for distinguishing its true binding targets from spurious ones, and it is an important step toward understanding gene regulation. In this talk, I will present a boosting approach for constructing TF-DNA binding classifiers. Unlike the widely used weight matrix model, which predicts TF-DNA binding from a linear combination of position-specific contributions, our approach builds a binding classifier by combining a set of weight-matrix-based classifiers, which yields a non-linear decision boundary for binding. In addition, our training procedure makes better use of negative information, which traditional methods for building weight matrices do not exploit well. We applied the proposed approach to ChIP-chip data from Saccharomyces cerevisiae. Compared with the weight matrix method, the boosting approach significantly improved specificity in a majority of cases. I will also discuss how the boosting approach can support our current research in systems biology. At the end of the talk, I will demonstrate GeneNotes, a novel information management application that helps biologists collect and manage multimedia biological information.
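The abstract does not specify the boosting variant or how each weight-matrix classifier is built, so the following is only a minimal sketch under stated assumptions. It assumes discrete AdaBoost, and each weak classifier is assumed to be a log-odds weight matrix estimated from reweighted positive and negative sequences, then thresholded at the score with the lowest weighted error. Every function name (`weighted_freqs`, `fit_weak`, `train_boosted_pwm`, `predict`) is hypothetical, and the toy sequences are invented for illustration, not taken from ChIP-chip data.

```python
import math

# Hypothetical sketch: discrete AdaBoost whose weak learners are thresholded
# log-odds weight matrices. This is an assumption, not the talk's actual method.

BASES = "ACGT"


def weighted_freqs(seqs, weights, pseudo=0.5):
    """Position-specific base frequencies from weighted, equal-length sequences."""
    scale = len(seqs) / sum(weights)  # rescale weights to sum to the sequence count
    counts = [{b: pseudo for b in BASES} for _ in range(len(seqs[0]))]
    for seq, w in zip(seqs, weights):
        for i, base in enumerate(seq):
            counts[i][base] += w * scale
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def score(matrix, seq):
    """Linear weight-matrix score: sum of position-specific contributions."""
    return sum(matrix[i][base] for i, base in enumerate(seq))


def predict_weak(matrix, threshold, seq):
    return 1 if score(matrix, seq) >= threshold else -1


def fit_weak(seqs, labels, weights):
    """Build a log-odds matrix from reweighted positives vs. negatives, then pick
    the score threshold with the lowest weighted training error."""
    pos = [(s, w) for s, y, w in zip(seqs, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(seqs, labels, weights) if y == -1]
    fp = weighted_freqs([s for s, _ in pos], [w for _, w in pos])
    fn = weighted_freqs([s for s, _ in neg], [w for _, w in neg])
    matrix = [{b: math.log(fp[i][b] / fn[i][b]) for b in BASES} for i in range(len(fp))]
    scores = [score(matrix, s) for s in seqs]
    best_err, best_t = float("inf"), None
    for t in sorted(set(scores)):
        err = sum(w for sc, y, w in zip(scores, labels, weights)
                  if (1 if sc >= t else -1) != y)
        if err < best_err:
            best_err, best_t = err, t
    return matrix, best_t, best_err


def train_boosted_pwm(seqs, labels, rounds=5):
    """Discrete AdaBoost: each round refits the weight matrix on reweighted data."""
    n = len(seqs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        matrix, threshold, err = fit_weak(seqs, labels, weights)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, matrix, threshold))
        if err == 0:  # training set separated perfectly
            break
        preds = [predict_weak(matrix, threshold, s) for s in seqs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, seq):
    """Sign of an alpha-weighted vote of thresholded linear scores; the boundary
    becomes non-linear once two or more weak learners are combined."""
    vote = sum(alpha * predict_weak(m, t, seq) for alpha, m, t in ensemble)
    return 1 if vote >= 0 else -1


# Toy, made-up 4-bp sequences (+1 = bound, -1 = not bound), for illustration only.
seqs = ["TGAC", "TGAT", "TGAA", "TGAG", "ACGT", "CCGA", "GTTC", "AACC"]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
model = train_boosted_pwm(seqs, labels, rounds=5)
print([predict(model, s) for s in seqs])
```

Refitting the matrix on reweighted data means that negative sequences scored too high in one round gain weight in the next, which is one plausible way negative examples can shape each new matrix. The weighting scheme actually used in the talk may differ.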