Project Information

CS 760 Project Overview

We recommend that your project focus on a supervised learning task and use a fairly large data set. The data set can be one that has already been assembled (see some possible resources below) or one that your team assembles specifically for the project.

The project will entail doing whatever pre-processing is necessary to set up the learning task (constructing features, etc.) and conducting a set of experiments to answer well-defined questions such as:

- Which learning algorithms provide the best predictive accuracy on the task?
- Do more sophisticated learning methods provide better predictive accuracy than simple ones (e.g., is a deep network more accurate than a network with a single layer of hidden units, or a network with no hidden units)?
- How does predictive accuracy vary as a function of training-set size?
- Do ensembles lead to better accuracy?
- What types of features have the most predictive value?
- Etc.

You should not feel constrained by this list of questions; these are just some examples.
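
As one illustration, the training-set-size question can be studied with a learning curve: train on random subsets of increasing size and measure accuracy on a fixed held-out test set. The sketch below is only a minimal example under stated assumptions. The synthetic two-blob data, the nearest-centroid classifier, and all function names are hypothetical stand-ins for your own data set and learning methods.

```python
import random

def make_data(n, rng):
    """Synthetic two-class data: two overlapping 2-D Gaussian blobs (illustrative only)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        x = (rng.gauss(center, 1.5), rng.gauss(center, 1.5))
        data.append((x, label))
    return data

def fit_centroids(train):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))

def accuracy(centroids, test):
    """Fraction of test examples classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)

def learning_curve(train, test, sizes, repeats, rng):
    """Mean test accuracy for each training-set size, averaged over random subsets."""
    curve = []
    for n in sizes:
        accs = []
        for _ in range(repeats):
            subset = rng.sample(train, n)  # random subset without replacement
            accs.append(accuracy(fit_centroids(subset), test))
        curve.append((n, sum(accs) / len(accs)))
    return curve

rng = random.Random(0)  # fixed seed so the experiment is repeatable
train = make_data(500, rng)
test = make_data(500, rng)  # held out; never used for training
curve = learning_curve(train, test, sizes=[5, 20, 100, 500], repeats=20, rng=rng)
for n, acc in curve:
    print(f"train size {n:4d}: mean test accuracy {acc:.3f}")
```

Averaging over several random subsets at each size reduces the noise in the curve, and keeping the test set fixed makes the points comparable across sizes.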

Each project team should consist of 4-5 students.

Project proposals are due on Friday, April 13

The project proposal should be no more than one page. It should cover the following items:

- Who are the members of the team?
- What is the task the project is addressing? What is the data set to be used?
- What experiments will you run? What will you measure in your experiments? What methods will you use as baselines for comparison?
- What will be the division of labor within the project? Who will do what?
- What is your tentative schedule for major milestones in the project (data collection and formatting, experiment running, result analysis, etc.)?

Some Data Set Resources

Some Software Resources

Final project reports are due on Monday, May 7

Your report should have a title, a one-paragraph abstract, titled sections, and bibliographic references. It should be formatted using the Association for the Advancement of Artificial Intelligence (AAAI) author kit, and be no longer than 8 pages including references, figures, and tables. The AAAI author kit provides templates for both LaTeX and Word.
Something important to keep in mind, no matter what you are writing, is the question "Who is my intended audience?" It's surprising how often even professional researchers lose track of this. For your project report, pretend that your audience is your fellow students in the class. That is, you should assume that your readers have about the same general background in machine learning that you do, but that they don't know as much about the topic of your project as you do. Therefore, you should describe your algorithms, data sets, etc. accordingly.
Your paper should probably have sections along the lines of:
- Introduction: what you attempted to do, and what the motivation is.
- Approach: what you did. If you developed your own approach, you should describe your work in sufficient detail that someone else could replicate your work. If you are using previously developed algorithms, describe them briefly, and provide references to complete descriptions. Don't describe your code organization or implementation details. For the intended audience, you should assume that interested readers could figure out how to implement the code as long as the methods are described in sufficient detail.
- Empirical Evaluation: describe your experiments and results. Describe your data sets in adequate detail. If you selected a subset of a larger data set, how did you make this selection? Describe how you chose settings for the parameters of the algorithms. Clearly state what you are trying to test or demonstrate in your experiments. Your experiments should be motivated by one or more explicitly stated hypotheses or questions.
- Discussion: discuss your results. What are the lessons of your experiments? What are the limitations of your approach? What would you suggest for future work in this direction?
You don't have to strictly adhere to this format if you think it is not the best organization for your particular project.
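
On the question of how parameter settings were chosen, one common approach (not the only acceptable one) is to select them by cross-validation within the training set and to touch the test set only once, for the final measurement. The sketch below illustrates this for the number of neighbors in a k-nearest-neighbor classifier. The synthetic data, the candidate values, and all function names are assumptions made for illustration.

```python
import random

def make_data(n, rng):
    """Synthetic two-class data: two overlapping 2-D Gaussian blobs (illustrative only)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 1.5), rng.gauss(center, 1.5)), label))
    return data

def knn_predict(train, x, k):
    """Majority vote (labels are 0/1) among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    ones = sum(y for _, y in nearest)
    return 1 if 2 * ones > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test examples that k-NN, fit on train, classifies correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def cv_accuracy(data, k, n_folds):
    """Mean held-out accuracy over n_folds folds (data is assumed already in random order)."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        fit = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(fit, held_out, k))
    return sum(scores) / n_folds

rng = random.Random(1)  # fixed seed so the experiment is repeatable
train = make_data(200, rng)
test = make_data(200, rng)  # used once, after k has been chosen

candidates = [1, 5, 15]  # odd values avoid tied votes with two classes
cv_scores = {k: cv_accuracy(train, k, n_folds=5) for k in candidates}
best_k = max(candidates, key=lambda k: cv_scores[k])
test_acc = accuracy(train, test, best_k)

for k in candidates:
    print(f"k={k:2d}: cross-validated accuracy {cv_scores[k]:.3f}")
print(f"selected k={best_k}, test accuracy {test_acc:.3f}")
```

A report would then state the candidate values, the selection procedure, and the chosen setting, so that a reader could repeat the experiment.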
Every paper should have some figures or tables. All figures and tables should have informative captions. If you include graphs, make sure that the axes are labeled. Figures and tables should be referenced and described in the text, not just dropped into the document.
Don't worry if your experiments don't turn out as you predicted. That's how science often goes. Data have a way of frequently humiliating hypotheses. The important thing is how well you carried out the process. That is, you will be graded on such things as (i) clearly defining your objectives/hypotheses, (ii) selecting appropriate experiments, (iii) clearly reporting relevant results, and (iv) carefully discussing the significance/lessons of your results.