## Description

## 2 Overview

Figure 1: You will implement (1) a multi-layer perceptron (neural network) and (2) a convolutional neural network to recognize hand-written digits using the MNIST dataset.

The goal of this assignment is to implement neural networks that recognize hand-written digits in the MNIST data.

**MNIST Data** You will use the MNIST hand-written digit dataset to perform the first task (neural network). We reduce the image size (28 × 28 → 14 × 14) and subsample the data. You can download the training and testing data from here:

http://www.cs.umn.edu/~hspark/csci5561/ReducedMNIST.zip

Description: The zip file includes two MAT files (mnist_train.mat and mnist_test.mat). Each file includes im_* and label_* variables:

• im_* is a 196 × n matrix storing vectorized image data (196 = 14 × 14).

• label_* is an n × 1 vector storing the label of each image.

Here n is the number of images. You can visualize the i-th image with, e.g., imshow(uint8(reshape(im_train(:,i), [14,14]))).
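For concreteness, a minimal sketch of loading and inspecting the data; the variable names follow the MAT files above, and the choice of i = 1 is arbitrary:

```matlab
% Load the reduced MNIST training data and display one image with its label.
load('mnist_train.mat');                            % im_train (196 x n), label_train (n x 1)
i = 1;                                              % index of the image to inspect
imshow(uint8(reshape(im_train(:, i), [14, 14])));   % un-vectorize to 14 x 14
title(sprintf('label: %d', label_train(i)));
```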


## 3 Single-layer Linear Perceptron

Figure 2: (a) Single linear perceptron. (b) Training and testing loss over iterations. (c) Confusion matrix (testing accuracy: 0.297905). You will implement a single linear perceptron that produces accuracy near 30% on testing data; random chance is 10%.

You will implement a single-layer linear perceptron (Figure 2(a)) with a stochastic gradient descent method. We provide main_slp_linear where you will implement GetMiniBatch and TrainSLP_linear.

function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)

Input: im_train and label_train are a set of images and labels, and batch_size is the size of the mini-batch for stochastic gradient descent.

Output: mini_batch_x and mini_batch_y are cells that contain a set of batches (images and labels, respectively). Each batch of images is a matrix of size 196×batch_size, and each batch of labels is a matrix of size 10×batch_size (one-hot encoding). Note that the number of images in the last batch may be smaller than batch_size.

Description: You may randomly permute the order of images when building the batches, and the whole set of mini_batch_* must span all training data.
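A minimal sketch under the spec above, assuming labels take values 0–9; the permutation and one-hot layout are the only design choices:

```matlab
% Split the training data into randomly permuted mini-batches with
% one-hot encoded labels.
function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)
    n = size(im_train, 2);
    perm = randperm(n);                         % random permutation of the images
    num_batches = ceil(n / batch_size);         % last batch may be smaller
    mini_batch_x = cell(1, num_batches);
    mini_batch_y = cell(1, num_batches);
    for k = 1:num_batches
        idx = perm((k-1)*batch_size + 1 : min(k*batch_size, n));
        mini_batch_x{k} = im_train(:, idx);     % 196 x (batch size)
        y = zeros(10, numel(idx));              % 10 x (batch size), one-hot
        for j = 1:numel(idx)
            y(label_train(idx(j)) + 1, j) = 1;  % assumes labels are 0-9
        end
        mini_batch_y{k} = y;
    end
end
```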

function y = FC(x, w, b)

Input: x ∈ R^m is the input to the fully connected layer, and w ∈ R^{n×m} and b ∈ R^n are the weights and bias.

Output: y ∈ R^n is the output of the linear transform (fully connected layer).

Description: FC is a linear transform of x, i.e., y = wx + b.
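This one is a one-liner; a sketch assuming column vectors:

```matlab
% Fully connected layer: a linear transform of the input column vector.
function y = FC(x, w, b)
    y = w * x + b;   % (n x m)(m x 1) + (n x 1) = n x 1
end
```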

function [dLdx, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y)

Input: dLdy ∈ R^{1×n} is the loss derivative with respect to the output y.

Output: dLdx ∈ R^{1×m} is the loss derivative with respect to the input x, dLdw ∈ R^{1×(n×m)} is the loss derivative with respect to the weights, and dLdb ∈ R^{1×n} is the loss derivative with respect to the bias.

Description: The partial derivatives w.r.t. input, weights, and bias will be computed. dLdx will be back-propagated, and dLdw and dLdb will be used to update the weights and bias.
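A minimal sketch: since y = wx + b, we have ∂L/∂x = (∂L/∂y)·w and ∂L/∂w_ij = (∂L/∂y)_i · x_j. The column-major flattening of dLdw is a convention choice that just has to match the weight update:

```matlab
% Back-propagation through the fully connected layer y = w*x + b.
function [dLdx, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y)
    dLdx = dLdy * w;                    % 1 x m, passed to the previous layer
    dLdw = reshape(dLdy' * x', 1, []);  % n x m outer product, flattened to 1 x (n*m)
    dLdb = dLdy;                        % 1 x n, since dy/db is the identity
end
```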

function [L, dLdy] = Loss_euclidean(y_tilde, y)

Input: y_tilde ∈ R^m is the prediction, and y ∈ {0, 1}^m is the ground truth label.

Output: L ∈ R is the loss, and dLdy is the loss derivative with respect to the prediction.

Description: Loss_euclidean measures the Euclidean distance L = ‖y − ỹ‖².
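A minimal sketch; returning dLdy as a 1 × m row vector is an assumption made to match the FC_backward convention above:

```matlab
% Squared Euclidean distance between the prediction and the one-hot label.
function [L, dLdy] = Loss_euclidean(y_tilde, y)
    diff = y_tilde - y;
    L = sum(diff .^ 2);   % L = ||y_tilde - y||^2
    dLdy = 2 * diff';     % 1 x m derivative w.r.t. the prediction
end
```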

function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)

Input: mini_batch_x and mini_batch_y are cells where each cell is a batch of images and labels.

Output: w ∈ R^{10×196} and b ∈ R^{10×1} are the trained weights and bias of a single-layer perceptron.

Description: You will use FC, FC_backward, and Loss_euclidean to train a single-layer perceptron using a stochastic gradient descent method; a pseudo-code can be found below. Through training, you are expected to see a reduction of the loss as shown in Figure 2(b). As a result of training, the network should produce more than 25% accuracy on the testing data (Figure 2(c)).

Algorithm 1 Stochastic Gradient Descent based Training

1: Set the learning rate γ
2: Set the decay rate λ ∈ (0, 1]
3: Initialize the weights with Gaussian noise, w ∼ N(0, 1)
4: k = 1
5: for iIter = 1 : nIters do
6:     At every 1000th iteration, γ ← λγ
7:     ∂L/∂w ← 0 and ∂L/∂b ← 0
8:     for each image x_i in the k-th mini-batch do
9:         Label prediction of x_i
10:        Loss computation l
11:        Gradients ∂l/∂w, ∂l/∂b of x_i using back-propagation
12:        ∂L/∂w ← ∂L/∂w + ∂l/∂w and ∂L/∂b ← ∂L/∂b + ∂l/∂b
13:    end for
14:    k++ (Set k = 1 if k is greater than the number of mini-batches.)
15:    Update the weights, w ← w − (γ/R) ∂L/∂w, and bias, b ← b − (γ/R) ∂L/∂b
16: end for
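A minimal sketch of Algorithm 1 in MATLAB; the hyper-parameter values and the reading of R as the mini-batch size are assumptions, and the gradient shapes follow the FC_backward sketch above:

```matlab
% Train the single-layer linear perceptron with mini-batch SGD (Algorithm 1).
function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)
    gamma = 0.01;  lambda = 0.9;  nIters = 10000;  % assumed hyper-parameters
    w = randn(10, 196);                            % Gaussian initialization
    b = zeros(10, 1);
    k = 1;  num_batches = numel(mini_batch_x);
    for iIter = 1:nIters
        if mod(iIter, 1000) == 0
            gamma = lambda * gamma;                % decay the learning rate
        end
        dLdw_acc = zeros(1, 10 * 196);  dLdb_acc = zeros(1, 10);
        R = size(mini_batch_x{k}, 2);              % images in this mini-batch
        for i = 1:R
            x = mini_batch_x{k}(:, i);
            y_tilde = FC(x, w, b);                 % label prediction
            [~, dLdy] = Loss_euclidean(y_tilde, mini_batch_y{k}(:, i));
            [~, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y_tilde);
            dLdw_acc = dLdw_acc + dLdw;            % accumulate over the batch
            dLdb_acc = dLdb_acc + dLdb;
        end
        k = k + 1;
        if k > num_batches, k = 1; end
        w = w - (gamma / R) * reshape(dLdw_acc, 10, 196);
        b = b - (gamma / R) * dLdb_acc';
    end
end
```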


## 4 Single-layer Perceptron

Figure 3: (a) Single-layer perceptron with a soft-max layer. (b) Training and testing loss over iterations. (c) Confusion matrix (testing accuracy: 0.898720). You will implement a single perceptron that produces accuracy near 90% on testing data.

You will implement a single-layer perceptron with soft-max cross-entropy using a stochastic gradient descent method. We provide main_slp where you will implement TrainSLP. Unlike the single-layer linear perceptron, it has a soft-max layer that approximates a max function by clamping the output to the [0, 1] range as shown in Figure 3(a).

function [L, dLdy] = Loss_cross_entropy_softmax(x, y)

Input: x ∈ R^m is the input to the soft-max, and y ∈ {0, 1}^m is the ground truth label.

Output: L ∈ R is the loss, and dLdy is the loss derivative with respect to x.

Description: Loss_cross_entropy_softmax measures the cross-entropy between the two distributions, L = −Σ_i y_i log ỹ_i, where ỹ_i is the soft-max output that approximates the max operation by clamping x to the [0, 1] range:

ỹ_i = e^{x_i} / Σ_j e^{x_j},

where x_i is the i-th element of x.
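A minimal sketch; the max-subtraction is a standard numerical-stability trick, and the combined derivative ỹ − y is the well-known gradient of cross-entropy through soft-max (returned as a row vector to match the earlier sketches):

```matlab
% Soft-max followed by cross-entropy against a one-hot label.
function [L, dLdy] = Loss_cross_entropy_softmax(x, y)
    ex = exp(x - max(x));          % shift by max(x) for numerical stability
    y_tilde = ex / sum(ex);        % soft-max output, clamped to [0, 1]
    L = -sum(y .* log(y_tilde));   % cross-entropy loss
    dLdy = (y_tilde - y)';         % 1 x m derivative w.r.t. the input x
end
```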

function [w, b] = TrainSLP(mini_batch_x, mini_batch_y)

Output: w ∈ R^{10×196} and b ∈ R^{10×1} are the trained weights and bias of a single-layer perceptron.

Description: You will use the following functions to train a single-layer perceptron using a stochastic gradient descent method: FC, FC_backward, Loss_cross_entropy_softmax. Through training, you are expected to see a reduction of the loss as shown in Figure 3(b). As a result of training, the network should produce more than 85% accuracy on the testing data (Figure 3(c)).
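TrainSLP follows Algorithm 1 just like TrainSLP_linear; only the inner loss computation changes. A sketch of a single SGD step (SLPStep is a hypothetical helper name, not part of the assignment):

```matlab
% One SGD step for the soft-max single-layer perceptron; the outer loop over
% iterations and mini-batches is identical to TrainSLP_linear.
function [w, b] = SLPStep(w, b, batch_x, batch_y, gamma)
    R = size(batch_x, 2);
    dLdw_acc = zeros(1, numel(w));  dLdb_acc = zeros(1, numel(b));
    for i = 1:R
        x = batch_x(:, i);
        a = FC(x, w, b);                                          % pre-soft-max output
        [~, dLdy] = Loss_cross_entropy_softmax(a, batch_y(:, i));
        [~, dLdw, dLdb] = FC_backward(dLdy, x, w, b, a);
        dLdw_acc = dLdw_acc + dLdw;  dLdb_acc = dLdb_acc + dLdb;
    end
    w = w - (gamma / R) * reshape(dLdw_acc, size(w));
    b = b - (gamma / R) * dLdb_acc';
end
```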


## 5 Multi-layer Perceptron

Figure 4: (a) Multi-layer perceptron with one hidden layer followed by a soft-max layer. (b) Confusion matrix (testing accuracy: 0.914553). You will implement a multi-layer perceptron that produces accuracy of more than 90% on testing data.

You will implement a multi-layer perceptron with a single hidden layer using a stochastic gradient descent method. We provide main_mlp. The hidden layer is composed of 30 units as shown in Figure 4(a).

function [y] = ReLu(x)

Input: x is a general tensor, matrix, or vector.

Output: y is the output of the Rectified Linear Unit (ReLu) with the same size as the input.

Description: ReLu is an activation unit (y_i = max(0, x_i)). In some cases, it is possible to use a Leaky ReLu (y_i = max(εx_i, x_i) where ε = 0.01).
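A minimal sketch; the operation is element-wise, so it works for any input shape:

```matlab
% Rectified Linear Unit applied element-wise.
function y = ReLu(x)
    y = max(0, x);   % y_i = max(0, x_i), same size as x
end
```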

function [dLdx] = ReLu_backward(dLdy, x, y)

Input: dLdy ∈ R^{1×z} is the loss derivative with respect to the output y ∈ R^z, where z is the size of the input (which can be a tensor, matrix, or vector).

Output: dLdx ∈ R^{1×z} is the loss derivative with respect to the input x.
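A minimal sketch: the ReLu gradient passes through only where the input was positive. Flattening and reshaping is an assumption made so the same code handles vector and tensor inputs:

```matlab
% Back-propagation through ReLu: zero the gradient where x <= 0.
function dLdx = ReLu_backward(dLdy, x, y)
    dLdx = reshape(dLdy(:) .* (x(:) > 0), size(dLdy));
end
```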

function [w1, b1, w2, b2] = TrainMLP(mini_batch_x, mini_batch_y)

Output: w1 ∈ R^{30×196}, b1 ∈ R^{30×1}, w2 ∈ R^{10×30}, b2 ∈ R^{10×1} are the trained weights and biases of a multi-layer perceptron.

Description: You will use the following functions to train a multi-layer perceptron using a stochastic gradient descent method: FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce more than 90% accuracy on the testing data (Figure 4(b)).
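A minimal sketch of the per-image forward and backward pass; the surrounding SGD loop is Algorithm 1 as before, and MLPGradients is a hypothetical helper name:

```matlab
% Forward/backward pass of the two-layer perceptron for a single image.
function [dLdw1, dLdb1, dLdw2, dLdb2] = MLPGradients(x, y_gt, w1, b1, w2, b2)
    % Forward: FC -> ReLu -> FC -> soft-max cross-entropy.
    a1 = FC(x, w1, b1);                  % 30 x 1 hidden pre-activation
    f1 = ReLu(a1);                       % 30 x 1 hidden activation
    a2 = FC(f1, w2, b2);                 % 10 x 1 output pre-activation
    [~, dLda2] = Loss_cross_entropy_softmax(a2, y_gt);
    % Backward: propagate the loss gradient layer by layer.
    [dLdf1, dLdw2, dLdb2] = FC_backward(dLda2, f1, w2, b2, a2);
    dLda1 = ReLu_backward(dLdf1, a1, f1);
    [~, dLdw1, dLdb1] = FC_backward(dLda1, x, w1, b1, a1);
end
```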


## 6 Convolutional Neural Network

Figure 5: (a) CNN architecture: Input → Conv (3) → ReLu → Pool (2×2) → Flatten → FC → Soft-max. (b) Confusion matrix (testing accuracy: 0.947251). You will implement a convolutional neural network that produces accuracy of more than 92% on testing data.

You will implement a convolutional neural network (CNN) using a stochastic gradient descent method. We provide main_cnn. As shown in Figure 5(a), the network is composed of: a single-channel input (14×14×1) → Conv layer (3×3 convolution with 3-channel output and stride 1) → ReLu layer → Max-pooling layer (2 × 2 with stride 2) → Flattening layer (147 units) → FC layer (10 units) → Soft-max.

function [y] = Conv(x, w_conv, b_conv)

Input: x ∈ R^{H×W×C1} is the input to the convolutional operation, and w_conv ∈ R^{3×3×C1×C2} and b_conv ∈ R^{C2} are the weights and bias of the convolutional operation.

Output: y ∈ R^{H×W×C2} is the output of the convolutional operation. Note that to keep the same spatial size as the input, you may zero-pad the boundary of the input image.

Description: This convolutional operation can be simplified using the MATLAB built-in function im2col.
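A minimal sketch using im2col (Image Processing Toolbox, as is padarray). Like most CNN layers, it computes cross-correlation rather than a flipped-kernel convolution; that choice is an assumption:

```matlab
% 3x3 convolution with stride 1 and zero padding, built on im2col.
function y = Conv(x, w_conv, b_conv)
    [H, W, C1] = size(x);
    C2 = size(w_conv, 4);
    y = zeros(H, W, C2);
    for c2 = 1:C2
        acc = zeros(H * W, 1);
        for c1 = 1:C1
            xp = padarray(x(:, :, c1), [1, 1], 0);     % zero-pad by one pixel
            cols = im2col(xp, [3, 3], 'sliding');      % 9 x (H*W) patch matrix
            k = reshape(w_conv(:, :, c1, c2), [], 1);  % 9 x 1 kernel
            acc = acc + cols' * k;                     % dot product per patch
        end
        y(:, :, c2) = reshape(acc, H, W) + b_conv(c2);
    end
end
```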

function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)

Input: dLdy is the loss derivative with respect to y.

Output: dLdw and dLdb are the loss derivatives with respect to the convolutional weights and bias w and b, respectively.

Description: This convolutional operation can be simplified using the MATLAB built-in function im2col. Note that for the single convolutional layer, ∂L/∂x is not needed.
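A minimal sketch matching the im2col layout of the Conv sketch; it assumes dLdy arrives as an H×W×C2 tensor. The kernel gradient is the correlation of the input patches with the output gradient:

```matlab
% Gradients of the conv layer weights and bias; dL/dx is omitted because the
% conv layer is the first layer of the network.
function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)
    C1 = size(w_conv, 3);
    C2 = size(w_conv, 4);
    dLdw = zeros(size(w_conv));
    dLdb = zeros(C2, 1);
    for c2 = 1:C2
        g = reshape(dLdy(:, :, c2), [], 1);      % (H*W) x 1 output gradient
        dLdb(c2) = sum(g);                       % bias sees every output pixel
        for c1 = 1:C1
            xp = padarray(x(:, :, c1), [1, 1], 0);
            cols = im2col(xp, [3, 3], 'sliding');            % 9 x (H*W)
            dLdw(:, :, c1, c2) = reshape(cols * g, 3, 3);    % 3x3 kernel gradient
        end
    end
end
```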

function [y] = Pool2x2(x)

Input: x ∈ R^{H×W×C} is a general tensor.

Output: y ∈ R^{(H/2)×(W/2)×C} is the output of the 2 × 2 max-pooling operation with stride 2.
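A minimal sketch assuming H and W are even (they are here: 14 × 14 → 7 × 7):

```matlab
% Non-overlapping 2x2 max-pooling with stride 2.
function y = Pool2x2(x)
    [H, W, C] = size(x);
    y = zeros(H/2, W/2, C);
    for c = 1:C
        for i = 1:H/2
            for j = 1:W/2
                patch = x(2*i-1:2*i, 2*j-1:2*j, c);
                y(i, j, c) = max(patch(:));   % max over the 2x2 window
            end
        end
    end
end
```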


function [dLdx] = Pool2x2_backward(dLdy, x, y)

Input: dLdy is the loss derivative with respect to the output y.

Output: dLdx is the loss derivative with respect to the input x.
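A minimal sketch, assuming dLdy has the pooled size (H/2)×(W/2)×C: the gradient is routed entirely to the position that won each 2×2 max:

```matlab
% Back-propagation through 2x2 max-pooling: route each gradient to the argmax.
function dLdx = Pool2x2_backward(dLdy, x, y)
    [H, W, C] = size(x);
    dLdx = zeros(H, W, C);
    for c = 1:C
        for i = 1:H/2
            for j = 1:W/2
                patch = x(2*i-1:2*i, 2*j-1:2*j, c);
                [~, idx] = max(patch(:));          % winner within the window
                mask = zeros(2, 2);
                mask(idx) = 1;
                dLdx(2*i-1:2*i, 2*j-1:2*j, c) = mask * dLdy(i, j, c);
            end
        end
    end
end
```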

function [y] = Flattening(x)

Input: x ∈ R^{H×W×C} is a tensor.

Output: y ∈ R^{HWC} is the vectorized tensor (column major).
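A minimal sketch; MATLAB's (:) operator already vectorizes in column-major order:

```matlab
% Flatten a tensor into a column vector (column-major order).
function y = Flattening(x)
    y = x(:);   % (H*W*C) x 1
end
```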

function [dLdx] = Flattening_backward(dLdy, x, y)

Input: dLdy is the loss derivative with respect to the output y.

Output: dLdx is the loss derivative with respect to the input x.
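A minimal sketch: the backward pass just undoes the vectorization:

```matlab
% Reshape the flat gradient back to the input tensor's shape.
function dLdx = Flattening_backward(dLdy, x, y)
    dLdx = reshape(dLdy, size(x));
end
```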

function [w_conv, b_conv, w_fc, b_fc] = TrainCNN(mini_batch_x, mini_batch_y)

Output: w_conv ∈ R^{3×3×1×3}, b_conv ∈ R^3, w_fc ∈ R^{10×147}, b_fc ∈ R^{10} are the trained weights and biases of the CNN.

Description: You will use the following functions to train a convolutional neural network using a stochastic gradient descent method: Conv, Conv_backward, Pool2x2, Pool2x2_backward, Flattening, Flattening_backward, FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce more than 92% accuracy on the testing data (Figure 5(b)).
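A minimal sketch of the per-image forward and backward pass inside TrainCNN; the outer SGD loop is Algorithm 1 again, CNNGradients is a hypothetical helper name, and the images must be reshaped from 196 × 1 vectors to 14 × 14 × 1 before entering the conv layer:

```matlab
% Forward/backward pass of the CNN for one 14x14x1 image and one-hot label.
function [dLdw_conv, dLdb_conv, dLdw_fc, dLdb_fc] = ...
        CNNGradients(x, y_gt, w_conv, b_conv, w_fc, b_fc)
    % Forward: Conv -> ReLu -> Pool -> Flatten -> FC -> soft-max cross-entropy.
    c = Conv(x, w_conv, b_conv);     % 14 x 14 x 3
    r = ReLu(c);                     % 14 x 14 x 3
    p = Pool2x2(r);                  % 7 x 7 x 3
    f = Flattening(p);               % 147 x 1
    a = FC(f, w_fc, b_fc);           % 10 x 1
    [~, dLda] = Loss_cross_entropy_softmax(a, y_gt);
    % Backward: propagate the gradient through each layer in reverse.
    [dLdf, dLdw_fc, dLdb_fc] = FC_backward(dLda, f, w_fc, b_fc, a);
    dLdp = Flattening_backward(dLdf, p, f);      % 7 x 7 x 3
    dLdr = Pool2x2_backward(dLdp, r, p);         % 14 x 14 x 3
    dLdc = ReLu_backward(dLdr, c, r);            % 14 x 14 x 3
    [dLdw_conv, dLdb_conv] = Conv_backward(dLdc, x, w_conv, b_conv, c);
end
```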
