CS 760 Project Overview
We recommend that your project focus on a supervised learning task and
use a fairly large data set. The data set can be one that has already
been assembled (see some possible resources below) or one that your
team assembles specifically for the project.
The project will entail doing whatever
pre-processing is necessary to set up the learning task (constructing
features, etc.) and conducting a set of experiments to answer
well-defined questions such as:
- Which learning algorithms provide the best predictive accuracy on the task?
- Do more sophisticated learning methods provide better predictive accuracy than simple ones (e.g. is a deep network more accurate than a network with a single layer of hidden units, or a network with no hidden units)?
- How does predictive accuracy vary as a function of training-set size?
- Do ensembles lead to better accuracy?
- What types of features have the most predictive value?
- Etc.
You should not feel constrained by this list of questions; these are just some examples.
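As a rough illustration of the training-set-size question above, the following sketch estimates a learning curve: it trains on random subsets of increasing size and reports mean accuracy on a fixed test set. Everything in it is an assumption made for illustration, not a required method. The synthetic two-blob data, the nearest-centroid classifier, and the function names (`make_data`, `train_centroids`, `learning_curve`) are hypothetical stand-ins for your own data set and learning algorithms.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data: two 2-D Gaussian blobs (illustrative only)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((x, label))
    return data


def train_centroids(train):
    """Nearest-centroid classifier: store the mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids, test):
    """Fraction of test examples classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def learning_curve(sizes, n_repeats=20, seed=0):
    """Mean test accuracy for each training-set size, averaged over repeats."""
    rng = random.Random(seed)
    pool = make_data(2000, rng)   # examples available for training
    test = make_data(1000, rng)   # fixed test set, never used for training
    curve = []
    for n in sizes:
        accs = []
        for _ in range(n_repeats):
            train = rng.sample(pool, n)  # random training subset of size n
            accs.append(accuracy(train_centroids(train), test))
        curve.append((n, sum(accs) / len(accs)))
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([5, 20, 80, 320]):
        print(f"train size {n:4d}: mean test accuracy {acc:.3f}")
```

Averaging over several random subsets at each size is a design choice to reduce the noise of any single draw; in a report, the resulting points would typically be plotted with labeled axes.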
Each project team should consist of 4-5 students.
Project proposals are due on Friday, April 13
The project proposal should be no more than one page. It should cover the following items:
- Who are the members of the team?
- What is the task the project is addressing? What is the data set to be used?
- What experiments will you run? What will you measure in your experiments? What methods will you use as baselines for comparison?
- What will be the division of labor within the project? Who will do what?
- What is your tentative schedule for major milestones in the project (data collection and formatting, experiment running, result analysis, etc.)?
Some Data Set Resources
Some Software Resources
Final project reports are due on Monday, May 7
- Your report should have a title, a one-paragraph abstract, titled
sections, and bibliographic references. It should be formatted using
the Association for the Advancement of Artificial Intelligence (AAAI)
author kit and be no longer than 8 pages, including references,
figures, and tables. The AAAI author kit provides templates for both
LaTeX and Word.
- Something important to keep in mind, no matter what you are writing,
is "Who is my intended audience?" It's surprising how often even
professional researchers lose track of this. For your project report,
pretend that your audience is your fellow students in the class.
That is, you should assume that your readers have about the same general
background in machine learning that you do, but that they don't know
as much about the topic of your project as you do. You should
describe your algorithms, data sets, etc. accordingly.
- Your paper should probably have sections along the lines of:
  - Introduction: what you attempted to do, and what the motivation is.
  - Approach: what you did. If you developed your own approach, you
    should describe your work in sufficient detail that someone else
    could replicate it. If you are using previously developed
    algorithms, describe them briefly, and provide references to
    complete descriptions. Don't describe your code organization or
    implementation details. For the intended audience, you should
    assume that interested readers could figure out how to implement
    the code as long as the methods are described in sufficient detail.
  - Empirical Evaluation: describe your experiments and results.
    Describe your data sets in adequate detail. If you selected a
    subset of a larger data set, how did you make this selection?
    Describe how you chose settings for the parameters of the
    algorithms. Clearly state what you are trying to test/demonstrate
    in your experiments. Your experiments should be motivated by one
    or more explicitly stated hypotheses or questions.
  - Discussion: discuss your results. What are the lessons of your
    experiments? What are the limitations of your approach? What would
    you suggest for future work in this direction?
  You don't have to strictly adhere to this format if you think it is
  not the best organization for your particular project.
- Every paper should have some figures or tables. All figures and
tables should have informative captions. If you include graphs, make
sure that the axes are labeled. Figures and tables should be
referenced and described in the text, not just dropped into the
document.
- Don't worry if your experiments don't turn out as you predicted.
That's how science often goes. Data have a way of frequently
humiliating hypotheses. The important thing is how well you carried
out the process. That is, you will be graded on such things as
(i) clearly defining your objectives/hypotheses, (ii) selecting
appropriate experiments, (iii) clearly reporting relevant results, and
(iv) carefully discussing the significance/lessons of your results.
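
On the point above about describing how parameter settings were chosen, one commonly used protocol is to select settings on a held-out validation set and to evaluate on the test set only once, after the choice is fixed. The sketch below shows that protocol for the number of neighbors `k` in a k-nearest-neighbor classifier. It is a minimal example under stated assumptions: the synthetic data, the split sizes, the candidate values of `k`, and the function names (`make_data`, `knn_predict`, `select_k`) are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Synthetic two-class data: two 2-D Gaussian blobs (illustrative only)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training examples closest to x."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], x))
    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, eval_set, k):
    """Fraction of eval_set examples that k-NN (fit on train) gets right."""
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)


def select_k(train, validation, candidates):
    """Pick the candidate k with the highest validation accuracy."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(candidates, key=lambda k: scores[k])
    return best_k, scores


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(600, rng)
    # Disjoint splits: the test set plays no role in choosing k.
    train, validation, test = data[:300], data[300:450], data[450:]
    best_k, scores = select_k(train, validation, [1, 3, 9, 27])
    for k, score in scores.items():
        print(f"k={k:2d}: validation accuracy {score:.3f}")
    print(f"selected k={best_k}; test accuracy {accuracy(train, test, best_k):.3f}")
```

Keeping the three splits disjoint is the key design choice here: if the test set were used to pick `k`, the reported test accuracy would be an optimistic estimate.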