Description
Goal
The goal of this assignment is to build a back propagation net that processes the handprinted digits we worked with for assignment 2. You should be able to re-use the code you wrote to read in the digits and create a data structure.
Data Set
The data set consists of handprinted digits, originally provided by Yann Le Cun. Each digit is described by a 14×14 pixel array. Each pixel has a grey level with value ranging from 0 to 1. The data is split between two files, a training set that contains the examples used for training your neural network, and a test set that contains examples you’ll use to evaluate the trained network. Both training and test sets are organized the same way. Each file begins with 250 examples of the digit “0”, followed by 250 examples of the digit “1”, and so forth up to the digit “9”. There are thus 2500 examples in the training set and another 2500 examples in the test set.
Each digit begins with a label on a line by itself, e.g., “train4-17”, where the “4” indicates the target digit, and the “17” indicates the example number. The next 14 lines contain 14 real values specifying pixel intensities for pixels in the corresponding row. Finally, there is a line with 10 integer values indicating the target. The vector “1 0 0 0 0 0 0 0 0 0” indicates the target 0; the vector “0 0 0 0 0 0 0 0 0 1” indicates the target 9.
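If you no longer have your reader from assignment 2, the record layout described above can be parsed along these lines. This is only a sketch: the function name `parse_digits` is made up for illustration, and it assumes values are whitespace-separated and that blank lines between records, if any, can be ignored.

```python
def parse_digits(lines):
    """Parse records of: one label line, 14 rows of 14 pixel values, one 10-value target line.

    Returns a list of (label, pixels, target) tuples, where pixels is a flat
    list of 196 floats and target is a list of 10 ints.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]  # assumption: blank lines carry no meaning
    record_len = 1 + 14 + 1
    examples = []
    for start in range(0, len(rows) - record_len + 1, record_len):
        label = rows[start]
        pixels = []
        for row in rows[start + 1:start + 15]:
            values = [float(v) for v in row.split()]
            if len(values) != 14:
                raise ValueError(f"{label}: expected 14 pixels per row, got {len(values)}")
            pixels.extend(values)
        target = [int(float(v)) for v in rows[start + 15].split()]
        if len(target) != 10:
            raise ValueError(f"{label}: expected 10 target values, got {len(target)}")
        examples.append((label, pixels, target))
    return examples
```

Because the function only iterates over lines, you can pass it an open file object directly; the target digit is then `target.index(1)`.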
Part 1 (suggestion: do this in week 1)
Implement a network with 196 inputs (for the 14×14 digit pattern), 10 logistic output neurons (for the digit classes 0-9), and direct connections from the inputs to the outputs. Train with a squared error cost function. This is equivalent to performing 10 logistic regressions in parallel (using a squared error cost function for each).
(a) Report how you set learning rates and decided when to stop training the network.
(b) Make a plot of error as a function of epoch during training.
(c) Compute the accuracy on the test set. A response on the test set counts as correct if the output neuron corresponding to the target digit is the most active.
(d) [OPTIONAL] In addition to plotting the training error as training proceeds, superimpose a plot of error on the test set. (I.e., run the test set through the network with weights frozen at the end of each epoch of training). How does training error compare to test error? If the training and test sets have similar statistics and the network isn’t overfitting the training set, then the errors should match pretty well.
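To make the pieces of Part 1 concrete, here is one possible shape for the training loop and the accuracy check. It is a sketch, not the required design: the names `init_weights`, `train_epoch`, and `accuracy` are invented for illustration, updates are applied after every example (online learning, which the assignment does not mandate), and the initial weight range is an arbitrary choice.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_weights(n_in, n_out, seed=0):
    """Small random weights (one row per output neuron) and zero biases."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(weights, biases, x):
    """Activity of each logistic output neuron for input vector x."""
    return [logistic(b + sum(w_i * x_i for w_i, x_i in zip(w, x)))
            for w, b in zip(weights, biases)]

def train_epoch(weights, biases, data, lr):
    """One online pass over data; returns the summed squared error for the epoch."""
    total = 0.0
    for x, t in data:
        y = forward(weights, biases, x)
        for k, w in enumerate(weights):
            err = y[k] - t[k]
            total += 0.5 * err * err
            # Gradient of 0.5 * err^2 w.r.t. the net input of a logistic neuron.
            delta = err * y[k] * (1.0 - y[k])
            biases[k] -= lr * delta
            for i, x_i in enumerate(x):
                w[i] -= lr * delta * x_i
    return total

def accuracy(weights, biases, data):
    """Fraction of examples whose most active output matches the target digit."""
    correct = 0
    for x, t in data:
        y = forward(weights, biases, x)
        if y.index(max(y)) == t.index(max(t)):
            correct += 1
    return correct / len(data)
```

For the digits you would call `init_weights(196, 10)`, collect the value returned by `train_epoch` once per epoch for the plot in (b), and call `accuracy` on the test set for (c).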
Part 2
Implement a network with 196 inputs and 10 normalized exponential output neurons. Train this with a cross entropy error measure. Both the normalized exponential output and the cross entropy error are explained in this Hinton video.
(a) Report how you set learning rates and decided when to stop training the network.
(b) Make a plot of error as a function of epoch during training.
(c) Compute the accuracy on the test set. A response on the test set counts as correct if the output neuron corresponding to the target digit is the most active.
(d) Do you see much difference between the networks in parts 1 and 2?
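For Part 2, the main changes relative to Part 1 are the output function and the error signal. The sketch below shows a normalized exponential (softmax) output, the cross entropy error, and the resulting error signal at the output units; the helper names are illustrative, and the small floor inside the logarithm is an added safeguard rather than part of the definition.

```python
import math

def softmax(z):
    """Normalized exponential; subtracting the max avoids overflow in exp."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(y, t):
    """Cross entropy between output y and a 1-of-10 target t."""
    return -sum(t_k * math.log(max(y_k, 1e-12)) for y_k, t_k in zip(y, t))

def output_delta(y, t):
    """Gradient of the cross entropy w.r.t. the net inputs of softmax units: y - t."""
    return [y_k - t_k for y_k, t_k in zip(y, t)]
```

The weight update then has the same form as in Part 1, with `output_delta` replacing the `(y - t) * y * (1 - y)` term of the logistic and squared error case.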
Part 3
Implement three-layered back propagation on the digits data set. Make the architecture strictly layered: input-to-hidden, hidden-to-output. You can choose an activation function for the hidden neurons (logistic, or symmetric sigmoid) and for the output neurons (logistic, symmetric sigmoid, or normalized exponential) and choose an error function (squared error or cross entropy). Note that these alternatives can be mixed and matched as you will, leading to 12 different possibilities. If you want a default suggestion, I’d say: use symmetric sigmoid for the hidden, normalized exponential for the output, and cross entropy for the error function.
(a) Train the net with 2, 5, 10, and 15 hidden units.
(b) Plot training error as a function of epoch.
(c) After training is complete, plot classification accuracy on the training set (i.e., number correctly classified, not training error) for the 4 networks.
(d) After training is complete, plot classification accuracy on the testing set.
(a) Train a perceptron to discriminate 8 from 0. You will have 500 training examples and 500 test examples.
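As a starting point for Part 3, here is a sketch of one back propagation step for the default suggestion (symmetric sigmoid hidden units, normalized exponential outputs, cross entropy error). Several details are assumptions made for the example: `tanh` stands in for the symmetric sigmoid, each weight row stores its bias as the last entry, weights start uniform in ±1/sqrt(n_in), and updates are applied after every example. The function names are illustrative.

```python
import math
import random

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def init_layer(n_in, n_out, rng):
    """One row per unit: n_in weights followed by a bias, uniform in +/- 1/sqrt(n_in)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)] for _ in range(n_out)]

def affine(layer, x):
    """Net input of each unit: bias (last entry of the row) plus weighted sum of x."""
    return [row[-1] + sum(w * v for w, v in zip(row, x)) for row in layer]

def forward(hidden, output, x):
    h = [math.tanh(a) for a in affine(hidden, x)]
    y = softmax(affine(output, h))
    return h, y

def train_example(hidden, output, x, t, lr):
    """One online back propagation step; returns the cross entropy before the update."""
    h, y = forward(hidden, output, x)
    loss = -sum(t_k * math.log(max(y_k, 1e-12)) for y_k, t_k in zip(y, t))
    # Error signal at the outputs (softmax + cross entropy).
    d_out = [y_k - t_k for y_k, t_k in zip(y, t)]
    # Propagate back through the output weights, then through tanh (derivative 1 - h^2).
    d_hid = [(1.0 - h_j * h_j) * sum(d_out[k] * output[k][j] for k in range(len(output)))
             for j, h_j in enumerate(h)]
    for row, d in zip(output, d_out):
        for j, h_j in enumerate(h):
            row[j] -= lr * d * h_j
        row[-1] -= lr * d
    for row, d in zip(hidden, d_hid):
        for i, x_i in enumerate(x):
            row[i] -= lr * d * x_i
        row[-1] -= lr * d
    return loss
```

For the digits, the layers would be built as `init_layer(196, n_hidden, rng)` and `init_layer(n_hidden, 10, rng)` with `rng = random.Random(seed)`, once for each hidden layer size you train. Note that the hidden error signal is computed before the output weights are changed.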
Part 4 (Optional)
Train a perceptron with 10 outputs to classify the digits 0-9 into distinct classes. Using the test set, construct a confusion matrix. The 10×10 confusion matrix will specify the frequency by which each input digit is assigned to each output class. The diagonal of this matrix will indicate correct classifications.
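The confusion matrix itself takes only a few lines once you have, for each test example, the target digit and the digit the network chose. A minimal sketch, with illustrative function names and with rows indexed by the input digit and columns by the assigned class:

```python
def confusion_matrix(targets, predictions, n_classes=10):
    """matrix[i][j] counts how often input digit i was assigned to output class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(targets, predictions):
        matrix[t][p] += 1
    return matrix

def accuracy_from_confusion(matrix):
    """Fraction of examples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total

# Toy example with four test cases: one "0" is mistaken for a "1".
m = confusion_matrix([0, 0, 1, 9], [0, 1, 1, 9])
print(accuracy_from_confusion(m))  # 0.75
```

With 250 test examples per digit, each row of the matrix for the full test set should sum to 250.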