Description
1. Softmax regression (aka multinomial logistic regression) [35 points]: In this problem you will
train a softmax regressor to classify images of hand-written digits from the MNIST dataset. The input
to the machine will be a 28 × 28-pixel image (converted into a 784-dimensional vector); the output
will be a vector of 10 probabilities (one for each digit). Specifically, the machine you create should
implement a function g : R
785 → R
10, where the kth component of g(x˜) (i.e., the probability that input
x˜ belongs to class k) is given by
exp(x˜
>w˜ k)
P9
k0=0 exp(x˜>w˜ k0 )
where x˜ = [x
>, 1]>.
The weights should be trained to minimize the cross-entropy (CE) loss:
fCE(w˜ 0, . . . , w˜ 9) = −
1
n
Xn
j=1
X
9
k=0
y
(j)
k
log ˆy
(j)
k
where n is the number of training examples. Note that each ˆyk implicitly depends on all the weights
w˜ 0, . . . , w˜ 9, where each w˜ k = [w>
k
, 1]>.
To get started, first download the MNIST dataset (including both the training and testing subsets)
from the following web links:
• https://s3.amazonaws.com/jrwprojects/small_mnist_train_images.npy
• https://s3.amazonaws.com/jrwprojects/small_mnist_train_labels.npy
• https://s3.amazonaws.com/jrwprojects/small_mnist_test_images.npy
• https://s3.amazonaws.com/jrwprojects/small_mnist_test_labels.npy
These files can be loaded into numpy using np.load.
Then implement stochastic gradient descent (SGD) as described in the lecture notes. I recommend
setting ˜n = 100 for this project.
Note that, since there are 785 inputs (including the constant 1 term) and 10 outputs, there will be 10
separate weight vectors, each with 785 components. Alternatively, you can conceptualize the weights
as a 10 × 785 matrix.
After optimizing the weights on the training set, compute both (1) the loss and (2) the accuracy
(percent correctly classified images) on the test set. Include both the cross-entropy loss values
and the “percent-correct” accuracy in the screenshot that you submit.
Finally, create an image to visualize each of the trained weight vectors w0, . . . , w9 (similar to what
you did in homework 2).
2. Data augmentation [25 points]: It is often useful to enlarge your training set by synthesizing new
examples from ones you already have. The simplest way to do this is to apply label-preserving
transformations, i.e., create new “copies” of some original training examples by altering them in
subtle ways such that the label of the copy is always the same as the original. For images, this can be
achieved through operations such as rotation, scaling, translating, as well as adding random noise to
the value of each image pixel (e.g., from a Gaussian or Laplacian distribution). (For symmetric classes
(e.g., 8, 0), you could use mirroring/flipping, though this is not required for this assignment.)
You are required to implement all of the following transformations: translation, rotation, scaling,
random noise. (For rotation, feel free to use the skimage.transform.rotate method in the skimage
1
package.) From each example i of the original training set (x
(i)
, y(i)
), randomly pick one of the
augmentation methods above, and generate a new image whose label is also y
(i)
. Put all n of these
new examples into a new Python array called Xaug and its associated label into a new array yaug. We
will manually inspect your code to verify that you completed this correctly. Then, show 1 example (in
the PDF file) of an original and augmented training example for each of these transformations (i.e., 4
original and 4 augmented images in total).
Note: augmenting the training set in this assignment will likely not help much to improve generalization accuracy because of how the softmax regression classifier works (it is a generalized linear model).
However, it can make a substantial improvement for other models we will explore later in this course,
e.g., non-linear support vector machines and neural networks.
In addition to submitting your Python code in a file called homework3 WPIUSERNAME1.py
(or homework3 WPIUSERNAME1 WPIUSERNAME2.py for teams), please submit a PDF file containing a screenshot
of (1) the last 20 iterations of your gradient descent on the training data. Name the file homework3 WPIUSERNAME1.pdf
(or homework3 WPIUSERNAME1 WPIUSERNAME2.pdf for teams).
2