CS 838 - Machine Learning for Text Analysis (Fall 2000)

General Course Information

Course Overview and Requirements

This will be a special topics course that focuses on the application of machine learning to problems in text analysis. After a few introductory lectures by the instructors, the course will consist primarily of student presentations of recent publications. The last few meetings may involve presentations of class projects.

There are four parts to this course's requirements:

  1. Students will be expected to present 2-3 papers.

  2. Each paper will also be assigned two "commentators." Commentators will write a one-page critique of the paper and will lead the discussion on it.

  3. Students will be expected to read each week's papers and participate in the discussions.

  4. Students will do substantial class projects of their choice. Project proposals will be due early in the semester, and a progress report will be due mid-semester. Team projects are acceptable.

Topics and Possible Readings

Background Reading

Document Classification

Document Clustering

Text Segmentation

Information Extraction and Wrapper Induction

Keyphrase Extraction and Summarization

Information Finding

Methods for Exploiting Document Relationships

Methods for Reducing the Need for Labeled Data

LSI, POS Tagging, and Other General Techniques



Date   Paper, presenter, and commentator

9/13   Training Algorithms for Linear Text Classifiers
       Presenter: Soumya Ray. Commentator: Andy Pohl.

9/20   An Evaluation of Statistical Approaches to Text Categorization
       Presenter: Tina Eliassi-Rad. Commentator: Milt Luoma.

9/27   PageRank: Bringing Order to the Web
       The Anatomy of a Large-Scale Hypertextual Web Search Engine
       Presenter: Jonathan Broad. Commentator: none.

10/4   Authoritative sources in a hyperlinked environment
       Presenter: Gang Luo. Commentator: none.

10/4   Statistical Models for Text Segmentation
       Presenter: Milt Luoma. Commentator: none.

10/11  Learning Information Extraction Rules for Semi-structured and Free Text
       Presenter: Yu-Shan Fung. Commentator: Gang Luo.

10/18  Combining Labeled and Unlabeled Data with Co-Training
       Presenter: Joe Bockhorst. Commentator: Gang Luo.

10/25  Domain-Specific Keyphrase Extraction
       Presenter: Maleeha Qazi. Commentator: none.

11/1   Learning Hidden Markov Model Structure for Information Extraction
       Presenter: Patrick Gaffney. Commentator: Maleeha Qazi.

11/1   Introduction To Latent Semantic Analysis
       Presenter: Mina Johnson-Glenberg. Commentator: none.

11/8   Applying the Multiple Cause Mixture Model to Text Categorization
       Presenter: Andy Pohl. Commentator: none.

11/15  Wrapper induction: Efficiency and expressiveness
       Presenter: Gang Luo. Commentator: none.

11/15  Web Document Clustering: A Feasibility Demonstration
       Presenter: Matt Zeidenberg. Commentator: none.

11/22  No class.

11/29  Statistics-Based Summarization, Step One: Sentence Compression
       Presenter: Milt Luoma. Commentator: Maleeha Qazi.

12/6   Relational Learning of Pattern-Match Rules for Information Extraction
       Presenter: Maleeha Qazi. Commentator: Milt Luoma.

12/6   An Instructable, Adaptive Interface for Discovering and Monitoring Information on the World-Wide Web
       Learning Users' Interests by Unobtrusively Observing Their Normal Behavior
       Presenter: Jude Shavlik. Commentator: none.

12/13  Brief project presentations; Matt Zeidenberg's research
       Presenters: Milt Luoma, Gang Luo, Maleeha Qazi, Matt Zeidenberg.


Last modified: Thu Sep 5 14:47:44 CDT 2002