Homework Assignment #3

Due Sunday, 11/13

Part 1

For this part of the homework, you are to implement a program that trains a neural network using stochastic gradient descent (on-line training). Your program should be callable from the command line. It should be named nnet and should accept five command-line arguments as follows:

nnet l h e <train-set-file> <test-set-file>

where l specifies the learning rate, h the number of hidden units and e the number of training epochs. After training for e epochs on the training set, you should use the learned neural net to predict a classification for every instance in the test set.
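If you are working in Python, the argument order above could be handled roughly as follows. This is only a sketch: the function name parse_args and the attribute names train_file and test_file are hypothetical choices, not something the assignment requires.

```python
import argparse


def parse_args(argv=None):
    # Positional arguments in the order required by the assignment:
    # nnet l h e <train-set-file> <test-set-file>
    parser = argparse.ArgumentParser(prog="nnet")
    parser.add_argument("l", type=float, help="learning rate")
    parser.add_argument("h", type=int, help="number of hidden units (0 for none)")
    parser.add_argument("e", type=int, help="number of training epochs")
    parser.add_argument("train_file", metavar="train-set-file")
    parser.add_argument("test_file", metavar="test-set-file")
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of sys.argv.
args = parse_args(["0.1", "20", "100", "heart_train.arff", "heart_test.arff"])
```

Passing argv=None makes argparse read the real command line, so the same function works both in the program and in quick experiments.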

If you are using a language that is not compiled to machine code (e.g. Java), then you should make a small script called nnet that accepts the command-line arguments and invokes the appropriate source-code program and interpreter, as you did in the previous homeworks.

You can assume:

Your network is intended for binary classification problems, and therefore it has one output unit with a sigmoid function. The sigmoid should be trained to predict 0 for the first class listed in the given ARFF files, and 1 for the second class.
Stochastic gradient descent is used to minimize cross-entropy error.
If h = 0, the network should have no hidden units, and the input units should be directly connected to the output unit. Otherwise, if h > 0, the network should have a single layer of h hidden units with each fully connected to the input units and the output unit.
For each numeric feature, you should use one input unit. For each discrete feature, you should use a one-of-k encoding. (Optionally, you can use a thermometer encoding for discrete numeric features. There are several in the lymph data set referenced below.)
To ensure that hidden unit activations don't get saturated, you should standardize numeric features as described in this document.
A momentum term is not used during training.
Each epoch is one complete pass through the training instances. You should randomize the order of the training instances before starting training, but each epoch can go through the instances in the same order.
All weights and bias parameters are initialized to random values in [-0.01, 0.01].
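The assumptions above can be put together in a short pure-Python sketch of on-line training. It is one possible reading, not a reference solution: the function names are hypothetical, each instance is assumed to be already encoded as a list of floats with a 0/1 label, the hidden units are assumed to be sigmoid units, and the cross-entropy is accumulated during the pass rather than recomputed after it (the assignment does not say which is intended).

```python
import math
import random


def sigmoid(z):
    # Logistic function, split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def init_weights(n_in, h, rng):
    # Every weight and bias starts uniformly in [-0.01, 0.01]; index 0 is the bias.
    def vec(n):
        return [rng.uniform(-0.01, 0.01) for _ in range(n + 1)]
    hidden = [vec(n_in) for _ in range(h)]
    output = vec(h if h > 0 else n_in)
    return hidden, output


def forward(x, hidden, output):
    # Returns (hidden activations, output activation); with h = 0 the
    # inputs feed the output unit directly.
    acts = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in hidden]
    src = acts if hidden else x
    out = sigmoid(output[0] + sum(wi * si for wi, si in zip(output[1:], src)))
    return acts, out


def train_epoch(data, hidden, output, lr):
    # One pass of per-instance updates (no momentum); returns summed cross-entropy.
    total = 0.0
    for x, y in data:
        acts, out = forward(x, hidden, output)
        p = min(max(out, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        delta_out = out - y  # error gradient at the output unit's net input
        src = acts if hidden else x
        # Hidden deltas use the output weights as they were before this update.
        deltas = [delta_out * output[j + 1] * a * (1.0 - a) for j, a in enumerate(acts)]
        output[0] -= lr * delta_out
        for j, s in enumerate(src):
            output[j + 1] -= lr * delta_out * s
        for w, d in zip(hidden, deltas):
            w[0] -= lr * d
            for i, xi in enumerate(x):
                w[i + 1] -= lr * d * xi
    return total


# Toy usage on made-up data (logical OR), with no hidden units.
rng = random.Random(0)
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
rng.shuffle(data)  # randomize once; every epoch then reuses this order
hidden, output = init_weights(2, 0, rng)
errors = [train_epoch(data, hidden, output, 0.1) for _ in range(200)]
```

With a sigmoid output and cross-entropy error, the gradient with respect to the output unit's net input reduces to (activation - target), which is why delta_out has such a simple form.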

Your program should read files that are in the ARFF format. In this format, each instance is described on a single line. The feature values are separated by commas, and the last value on each line is the class label of the instance. Each ARFF file starts with a header section describing the features and the class labels. Lines starting with '%' are comments. See the link above for a brief, but more detailed description of the ARFF format. Your program should handle numeric and nominal attributes, and simple ARFF files (i.e. don't worry about sparse ARFF files and instance weights). Example ARFF files are provided below.
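As a rough illustration, a simple ARFF file could be read and turned into input vectors along the following lines. This is a minimal sketch with hypothetical function names. It assumes attribute names contain no spaces, treats every attribute that is not declared with a {...} value list as numeric, and takes "standardize" to mean subtracting the training-set mean and dividing by the training-set (population) standard deviation, which may differ in detail from the referenced document.

```python
import math


def parse_arff(lines):
    # Returns (attributes, rows). Each attribute is (name, kind), where kind
    # is the string "numeric" or the list of nominal values in header order.
    attributes, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([v.strip().strip("'\"") for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                values = spec.strip("{} \t").split(",")
                kind = [v.strip().strip("'\"") for v in values]
            else:
                kind = "numeric"
            attributes.append((name.strip("'\""), kind))
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, rows


def fit_stats(attributes, rows):
    # Mean and standard deviation of each numeric feature (training rows only).
    stats = {}
    for i, (_, kind) in enumerate(attributes[:-1]):
        if kind == "numeric":
            vals = [float(r[i]) for r in rows]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            stats[i] = (mean, std or 1.0)  # avoid dividing by zero
    return stats


def encode(attributes, row, stats):
    # Returns (input vector, label): one standardized input per numeric
    # feature, one-of-k inputs per nominal feature, label 0 for the first
    # listed class and 1 for the second.
    x = []
    for i, (_, kind) in enumerate(attributes[:-1]):
        if kind == "numeric":
            mean, std = stats[i]
            x.append((float(row[i]) - mean) / std)
        else:
            x.extend(1.0 if row[i] == v else 0.0 for v in kind)
    return x, attributes[-1][1].index(row[-1])


# Made-up example file, not one of the provided data sets.
sample = """% toy file
@relation toy
@attribute age numeric
@attribute sex {female, male}
@attribute class {negative, positive}
@data
63,male,positive
41,female,negative
""".splitlines()
attributes, rows = parse_arff(sample)
stats = fit_stats(attributes, rows)
encoded = [encode(attributes, r, stats) for r in rows]
```

The statistics are computed once from the training set and then reused when encoding the test set, so both sets are scaled the same way.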

Your program should produce as output the following:

After each training epoch, print the epoch number (starting from 1), the cross-entropy error, the number of training instances that are correctly classified, and the number of instances that are misclassified. Print these four values on one line separated by tabs. To determine a classification, use a threshold of 0.5 on the activation of the output unit (i.e. the value computed by the sigmoid).
After training, print for each test instance the activation of the output unit, the predicted class, and the correct class. Print these values tab-separated with one line per test instance.
Finally, print the number of correctly classified and the number of incorrectly classified test instances when a threshold of 0.5 is used on the activation of the output unit.

Part 2

For this part, you will explore the effect of hidden units and the number of training epochs. Using heart_train.arff and heart_test.arff, you should make two graphs showing error rates versus the number of training epochs. For the first graph, plot training and testing error rates for a single-layer network (no hidden units) trained for 1, 10, 100 and 500 epochs, using a learning rate of 0.1. For the second graph, plot similar curves for a network with 20 hidden units. Be sure to label the axes of your plots.

Part 3

For this part, you should produce ROC curves for two data sets. Use the activation of the output unit as the measure of confidence that a given test instance is positive, and plot ROC curves for both the heart data set indicated above, and the lymphography data set: lymph_train.arff, lymph_test.arff. Be sure to label the axes of your plots.
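One common way to obtain the points of an ROC curve is to sort the test instances by decreasing confidence and sweep the threshold down through them. The sketch below follows that approach. The function name roc_points is hypothetical, labels are assumed to be 0/1 with 1 as the positive class, and the test set is assumed to contain at least one instance of each class.

```python
def roc_points(scores, labels):
    # Returns a list of (false positive rate, true positive rate) pairs,
    # starting at (0, 0) and ending at (1, 1).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Instances with identical scores share a threshold, so a point is
        # emitted only after the last instance of each group of ties.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / num_neg, tp / num_pos))
    return points


# Made-up scores and labels, only to show the shape of the result.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

The resulting pairs can be passed to any plotting tool, with the false positive rate on the x-axis and the true positive rate on the y-axis.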

Submitting Your Work

You should turn in your work electronically using the Canvas course management system. Turn in all source files and your runnable program as well as a file called hw3.pdf that shows your work for Parts 2 and 3. All files should be compressed as one zip file named <Wisc username>_hw3.zip. Upload this zip file as Homework #3 at the course Canvas site.