UW Biostatistics & Medical Informatics
UW School of Medicine and Public Health UW Madison


 

 



Seminars

General Departmental Seminar Series

Learning Parameters for Sequence Alignment from Examples with Missing Data

Eagu Kim
Computer Science Department,
University of Arizona

Monday, May 12th, 2008
11:00 pm
3265 MSC

ABSTRACT
For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. In practice, substitution scores are usually chosen by convention, and gap parameters are often found by trial and error. In contrast, a rigorous way to determine parameter values that are appropriate for aligning biological sequences is by solving the problem of Inverse Parametric Sequence Alignment: given examples of correct alignments, find parameter values that make the examples score as close as possible to optimal alignments of their sequences. The examples that are currently available contain regions where the alignment is not specified, which leads to a version with missing data.
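To make the objective above concrete, the following is a minimal sketch of how one might measure how far an example alignment falls from optimal under a given parameter choice. It is not the speakers' algorithm: it only evaluates a fixed set of parameters, it assumes a simplified model with a single linear gap score rather than whatever gap model the talk uses, and it treats higher scores as better. The names `alignment_score`, `optimal_score`, and `example_error` are hypothetical.

```python
from typing import Callable

Sub = Callable[[str, str], int]  # substitution score for a pair of letters


def alignment_score(row_a: str, row_b: str, sub: Sub, gap: int) -> int:
    """Score a given pairwise alignment (two equal-length rows, '-' marks a gap).

    Assumption: every gap column costs the same linear score `gap`.
    """
    assert len(row_a) == len(row_b), "alignment rows must have equal length"
    total = 0
    for x, y in zip(row_a, row_b):
        total += gap if x == "-" or y == "-" else sub(x, y)
    return total


def optimal_score(a: str, b: str, sub: Sub, gap: int) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, maximizing)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                          # a[i-1] against a gap
                cur[j - 1] + gap,                       # b[j-1] against a gap
            ))
        prev = cur
    return prev[-1]


def example_error(row_a: str, row_b: str, sub: Sub, gap: int) -> int:
    """How far the example alignment scores below the optimum (0 means optimal)."""
    a, b = row_a.replace("-", ""), row_b.replace("-", "")
    return optimal_score(a, b, sub, gap) - alignment_score(row_a, row_b, sub, gap)


if __name__ == "__main__":
    # Illustrative parameters only: match +1, mismatch -1, gap -2.
    match_mismatch = lambda x, y: 1 if x == y else -1
    print(example_error("AC-GT", "ACGGT", match_mismatch, -2))
    print(example_error("ACGT-", "ACGGT", match_mismatch, -2))
```

In this framing, the inverse problem searches for the substitution and gap values that drive this error toward zero across all examples at once, rather than evaluating one hand-picked setting as the sketch does.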

In this talk, we present a new polynomial-time algorithm for Inverse Parametric Sequence Alignment that is simple to implement, fast in practice, and can learn hundreds of parameters simultaneously from hundreds of examples with missing data. Computational results show that best-possible values for all 212 parameters of the standard protein sequence alignment model can be computed from 200 examples in 4 hours of computation. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and boost the accuracy of multiple sequence alignment by as much as 25%. If time permits, we will also discuss how to use predicted secondary structure to improve the accuracy of protein sequence alignment even further.
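The abstract does not itemize the 212 parameters. One accounting that is consistent with that figure, offered here as an assumption rather than as the speakers' stated breakdown, is a symmetric substitution matrix over the 20 standard amino acids plus two gap parameters (for example, gap open and gap extension). The helper name `count_parameters` is hypothetical.

```python
def count_parameters(alphabet_size: int, gap_params: int = 2) -> int:
    """Count parameters of a symmetric substitution matrix plus gap parameters.

    Assumption: the matrix is symmetric, so only unordered letter pairs
    (including identical pairs) carry independent scores.
    """
    substitution_params = alphabet_size * (alphabet_size + 1) // 2
    return substitution_params + gap_params


if __name__ == "__main__":
    # 20 amino acids: 210 unordered pairs, plus 2 assumed gap parameters.
    print(count_parameters(20))
```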

This is joint research with John Kececioglu.

Biography
Eagu Kim is currently a Ph.D. candidate in the Computer Science Department at the University of Arizona, writing a dissertation on local similarity and inverse parametric sequence alignment. His research interests include combinatorial optimization and the design and analysis of algorithms for multiple sequence alignment and whole-genome alignment.


 


Copyright © 2006 The Board of Regents of the
University of Wisconsin System

 
