## Description

Question #1

Given a data set that represents a function that takes in 14 binary digits and outputs one of four numbers (0, 1, 2, 3), construct a neural network to train for this function.

[Figure: neural network → one-hot vector → compare with labels]

Question #2

The function to learn is $f: \{-1, 1\}^{14} \to \{0, 1, 2, 3\}$, that is, $f(x_1, \dots, x_{14}) \in \{0, 1, 2, 3\}$.

The label vectors are one-hot vectors such that the $k$-th entry is 1 if $f(x_1, \dots, x_{14}) = k$ and 0 otherwise.

For this assignment, construct the three networks described below. Implement the back propagation algorithm and use gradient descent to train the networks. Use the cross-entropy cost function on a softmax output for training.
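The cost described above can be sketched in NumPy as follows. This is a minimal illustration, not the reference solution: the function names, the batch-mean convention, and the small constant inside the logarithm are assumptions of this sketch.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def one_hot(labels, num_classes=4):
    """Turn integer labels in {0, ..., num_classes - 1} into one-hot rows."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and one-hot targets."""
    probs = softmax(logits)
    # The 1e-12 guard against log(0) is an assumption of this sketch.
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

def cross_entropy_grad(logits, targets):
    """Gradient of the mean cost w.r.t. the logits: (softmax - target) / batch size."""
    return (softmax(logits) - targets) / logits.shape[0]

logits = np.array([[2.0, 1.0, 0.1, -1.0]])
targets = one_hot(np.array([1]))
print(cross_entropy(logits, targets))
print(cross_entropy_grad(logits, targets))
```

Because the softmax probabilities and the one-hot target both sum to 1, the gradient with respect to the logits sums to zero for every data point, which is a cheap sanity check on a back propagation implementation.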

The three networks are:

• 14-100-40-4 net: 14 nodes input, fully connected to a 100 nodes hidden layer, fully connected to a 40 nodes hidden layer, fully connected to a 4 nodes output.

• 14-28×6-4 net: 14 nodes input, followed by 6 identical fully connected hidden layers with 28 nodes each, fully connected to a 4 nodes output.

• 14-14×28-4 net: 14 nodes input, followed by 28 identical fully connected hidden layers with 14 nodes each, fully connected to a 4 nodes output.
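One way to describe the three networks in code is as lists of layer widths. The sketch below builds such lists and initializes one weight matrix and one bias vector per fully connected layer. The dictionary name, the `init_params` helper, and the He-style initialization scale are assumptions of this sketch; the assignment does not prescribe an initialization.

```python
import numpy as np

# Layer widths, input first and output last, for the three networks.
ARCHITECTURES = {
    "14-100-40-4": [14, 100, 40, 4],
    "14-28x6-4": [14] + [28] * 6 + [4],
    "14-14x28-4": [14] + [14] * 28 + [4],
}

def init_params(sizes, seed=0):
    """Create (W, b) per layer; W has shape (n_in, n_out), b has shape (n_out,)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # He-style scaling is an assumption, chosen because hidden layers use ReLU.
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

for name, sizes in ARCHITECTURES.items():
    params = init_params(sizes)
    n_weights = sum(W.size for W, _ in params)
    n_biases = sum(b.size for _, b in params)
    print(name, len(params), "weight matrices,", n_weights, "weights,", n_biases, "biases")
```

The three networks have a similar number of weights but very different depths, which is useful context for the discussion question.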

All but the last fully connected layer have ReLU as the activation function.

(1) Plot the training cost w.r.t. iterations.

(2) Plot the testing cost w.r.t. iterations.

(3) Plot the train & test accuracy scores w.r.t. iterations.

(4) Check your back propagation intermediate results against known answers (see pages 8 to 13 of the slides for details):

a. You will be given one special data point d.

b. You will be given one weight & bias set W0 together with the correct gradients computed using data point d and W0.

c. You will be given another weight & bias set W1 with no corresponding gradients given.

d. Use (a) and (b) to compute gradients and compare them to the given correct gradients.

e. Use (a) and (c) to compute gradients and submit your gradient values for grading.

Your submissions will be automatically graded using a script. Be sure to format your output according to the instructions below. Incorrect format will be graded as incorrect answers.
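Before comparing against the provided gradients, it can help to check back propagation against finite differences on a small network. The sketch below does this for a single data point. The function names, the tiny 14-5-4 network, and the random weights are all assumptions made for illustration; they are not the W0 or W1 sets from the assignment.

```python
import numpy as np

def forward(params, x):
    """Return the logits and the activations of every layer (input included)."""
    activations = [x]
    for i, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        is_last = i == len(params) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))  # ReLU except last layer
    return activations[-1], activations

def loss(params, x, y):
    """Cross-entropy of softmax(logits) against a one-hot label y."""
    logits, _ = forward(params, x)
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -np.sum(y * log_probs)

def backward(params, x, y):
    """Back propagation for one data point; returns [(dW, db), ...], one pair per layer."""
    logits, activations = forward(params, x)
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    delta = probs - y                          # d loss / d logits
    grads = []
    for i in reversed(range(len(params))):
        W, _ = params[i]
        a_prev = activations[i]                # input to layer i
        grads.append((np.outer(a_prev, delta), delta.copy()))
        if i > 0:
            # Propagate through W, then through the ReLU that produced a_prev.
            delta = (W @ delta) * (a_prev > 0)
    return grads[::-1]

# Hypothetical small network and data point, used only for the check.
rng = np.random.default_rng(0)
sizes = [14, 5, 4]
params = [(rng.normal(0, 0.5, (m, n)), rng.normal(0, 0.1, n))
          for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.choice([-1.0, 1.0], size=14)
y = np.eye(4)[2]

grads = backward(params, x, y)

# Central finite difference on one weight of the first layer.
eps = 1e-6
W0 = params[0][0]
W0[3, 1] += eps
plus = loss(params, x, y)
W0[3, 1] -= 2 * eps
minus = loss(params, x, y)
W0[3, 1] += eps
numeric = (plus - minus) / (2 * eps)
print(grads[0][0][3, 1], numeric)
```

The two printed numbers should agree closely. If they do not, the usual suspects are the ReLU mask and the orientation of the weight matrix in the backward step.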

Note: in the gradient file formats, "output" means the loss function.

Give a half-page discussion on why the three networks (14-100-40-4 net, 14-28×6-4 net, 14-14×28-4 net) perform differently. Which one performs better, and why?

Given files

For Question#2(1)(2)(3)

• The data given to you is under the path Question2_123.

• It contains 4 csv files (i.e. x_train, x_test, y_train, y_test). In the x_* files, each line is a data point of 14 dimensions; in the y_* files, each line is the label of the corresponding line in x_*.

For Question#2(4)

• The data given to you is under the path Question2_4.

• You will be given weights and gradients for verification, which are under the ‘b’ folder, whereas the weights you will work on are in the ‘c’ folder. To ease your understanding, the weights csv has one heading column introducing the weights/biases, while there is NO such column for the gradients. You should NOT include the headings in your submission either.

• The given data point x and label y are in the ‘a.txt’ file. The same data point and label are used for both verification and test.

• Your submission should be in the same format as the given ‘true-d*’ files.

• If possible, use np.float32 to control the granularity of your gradients; otherwise, round your results to at least 16 digits, or e-16.
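Loading the x_* and y_* files could look like the sketch below. It assumes the files are comma-delimited and that a y_* line holds either a single integer label or an already one-hot row; neither detail is stated above, so check the actual files first. The `load_split` helper is hypothetical, and the demo uses in-memory text rather than the real files.

```python
import io
import numpy as np

def load_split(x_source, y_source, num_classes=4):
    """Load inputs and labels; x_source / y_source are paths or open text files."""
    x = np.loadtxt(x_source, delimiter=",", ndmin=2)
    y = np.loadtxt(y_source, delimiter=",", ndmin=2)
    if y.shape[1] == 1:
        # Assumed case: one integer label per line, converted to one-hot rows.
        labels = y[:, 0].astype(int)
        y = np.eye(num_classes)[labels]
    return x, y

# In-memory stand-ins for x_train / y_train, two data points of 14 dimensions.
x_text = "\n".join(",".join(["1", "-1"] * 7) for _ in range(2))
y_text = "2\n0"
x, y = load_split(io.StringIO(x_text), io.StringIO(y_text))
print(x.shape, y.shape)
```

With the real data, the two arguments would be the paths of the matching x_* and y_* files under Question2_123.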

Submission

• Your submission is a single .zip file. Other compression formats are NOT acceptable.

• The name of the .zip file is your ID as shown on the IVLE/class. We will use it for grading. In most cases, it is your NUS NETID, which will be e******.zip (‘e’ in lowercase).

• Inside the .zip file, there are 8 files: 1 pdf, 6 csv, and 1 folder containing your code.

• For the essay parts of question 1 and question 2, submit a single pdf file. Please section your document properly.

• For question 2(4), the output files are csv files with a comma (i.e. ‘,’) as the delimiter. Spaces or other delimiters are NOT acceptable.

• The csv names are as follows: dw-100-40-4.csv, db-100-40-4.csv, dw-28-6-4.csv, db-28-6-4.csv, dw-14-28-4.csv, db-14-28-4.csv, which correspond to the gradients of weights and biases for the three network configurations: 14-100-40-4, 14-28×6-4, 14-14×28-4. Other names are NOT acceptable. (The naming has changed to be compatible with win & *nix.)

• Since you are not expected to work with platforms in this task, please upload your code in addition to the 6 csv files. Pack your code (code only) in a folder and compress it along with the other files.

More Details for Question#2(4)

[Figure: a fully connected network with numbered nodes: input layer0 (nodes 0-1), layer1 (nodes 0-2), layer2 (nodes 0-4), output layer3 (nodes 0-3).]

• Suppose layer(t) has $n_t$ nodes, so the weights from layer(t) to layer(t+1) form an $n_t$ by $n_{t+1}$ matrix, where the (i, j)-th entry of this matrix represents the weight connecting the i-th node of layer(t) to the j-th node of layer(t+1).

• The bias is then simply a length-$n_t$ vector for layer(t). Note that the input layer 0 has no corresponding bias.

• Softmax is not considered to be a layer in this context, so the output layer outputs logits.
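The shape convention above can be illustrated with a few lines of NumPy. The layer widths 2-3-5-4 and the random values are only illustrative assumptions; the point is that a row index of a weight matrix refers to a node of layer(t) and a column index to a node of layer(t+1).

```python
import numpy as np

sizes = [2, 3, 5, 4]   # layer0 (input), layer1, layer2, layer3 (output); illustrative widths
rng = np.random.default_rng(0)

# One (n_t, n_{t+1}) matrix per pair of consecutive layers.
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
# One bias vector per non-input layer, as long as the layer it belongs to.
biases = [rng.normal(size=n) for n in sizes[1:]]

a = np.array([1.0, -1.0])             # activations of layer0
z = a @ weights[0] + biases[0]        # pre-activations of layer1, shape (3,)

# Same computation spelled out: W[i, j] links node i of layer(t) to node j of layer(t+1).
z_loop = np.array([
    sum(a[i] * weights[0][i, j] for i in range(sizes[0])) + biases[0][j]
    for j in range(sizes[1])
])
print(np.allclose(z, z_loop))
```

With this orientation, a layer is computed as a row vector times a matrix, so the gradient of a weight matrix has the same shape as the matrix itself.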

The given weights and corresponding gradients output file is of the following format:

• In total $\sum_t n_t$ rows, where the sum runs over every layer(t) that has outgoing weights (all layers except the output layer).

• The first $n_0$ rows are the matrix from input layer0 to hidden layer1, the following $n_1$ rows are the matrix from layer1 to layer2, and so on. There is NO blank line between matrix(t) and matrix(t+1).

• In each matrix, the (i, j) item is the derivative d(output)/d w(i,j) of the corresponding weight.

The given biases and corresponding gradients output file is similar:

• In total $\sum_t 1$ rows, that is, one row per layer that has a bias (all layers except the input layer).

• The first row is the bias vector for layer1, and so has $n_1$ items. The second row is the bias vector for layer2 with $n_2$ items, and so on.

• In each row, the i-th item is d(output)/d b(i) of the corresponding node.
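A minimal sketch of writing gradients in the stacked layout described above, using Python's `csv` module. The helper names are hypothetical, the gradients are dummy values for a 2-3-4 network, and the exact number formatting is an assumption: the given ‘true-d*’ files remain the authority on format.

```python
import csv
import io
import numpy as np

def write_weight_grads(dest, weight_grads):
    """Stack every dW matrix row by row, with no blank line between matrices."""
    writer = csv.writer(dest, delimiter=",", lineterminator="\n")
    for dW in weight_grads:
        for row in np.asarray(dW, dtype=np.float32):
            writer.writerow(row.tolist())

def write_bias_grads(dest, bias_grads):
    """Write one row per layer; rows may have different lengths."""
    writer = csv.writer(dest, delimiter=",", lineterminator="\n")
    for db in bias_grads:
        writer.writerow(np.asarray(db, dtype=np.float32).tolist())

# Dummy gradients for a 2-3-4 network, only to show the layout.
dws = [np.full((2, 3), 0.5), np.full((3, 4), -0.25)]
dbs = [np.zeros(3), np.ones(4)]

dw_file, db_file = io.StringIO(), io.StringIO()
write_weight_grads(dw_file, dws)
write_bias_grads(db_file, dbs)
print(dw_file.getvalue())
print(db_file.getvalue())
```

For a real submission, `dest` would be a file opened for writing with `newline=""`, which is what the `csv` module documentation recommends.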

Example

Suppose we have a 2-3-4 fully connected network with a data point X = (x1, x2) as input and an output logits vector of 4 dimensions.

[Figure: the 2-3-4 network, with nodes 0-1 in input layer0, nodes 0-2 in layer1, and nodes 0-3 in output layer2.]

Then, the weight matrix looks like the following:

Weights between input layer 0 and hidden layer 1

    1.0, 1.61, 0.74
    1.0, 1.61, 1.74

Weights between layer 1 and output layer 2

    0.01,-1.87,1.71,0.16
    0.01,-1.23,1.12,0.1
    0.0,-0.34,0.31,0.03

And the biases:

Bias for hidden layer 1

    1.0,1.61,0.74

Bias for output layer 2

    0.01,-0.99,0.9,0.08
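Reading a stacked file back into per-layer matrices is the reverse operation. The sketch below splits rows by the layer widths, using the 2-3-4 example values. The `split_stacked` helper and its `has_heading_column` flag are hypothetical; the flag simply drops the first field of each row, which is one reading of the "heading column" mentioned for the weights csv.

```python
import numpy as np

def split_stacked(lines, sizes, has_heading_column=False):
    """Split stacked CSV rows into one (n_t, n_{t+1}) matrix per pair of layers."""
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if has_heading_column:
            fields = fields[1:]               # drop the descriptive first column
        rows.append([float(f) for f in fields])
    matrices, start = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        block = rows[start:start + n_in]
        if len(block) != n_in or any(len(r) != n_out for r in block):
            raise ValueError("row or column count does not match the layer sizes")
        matrices.append(np.array(block))
        start += n_in
    return matrices

# Rows of the 2-3-4 example: a 2x3 matrix followed directly by a 3x4 matrix.
example = [
    "1.0, 1.61, 0.74",
    "1.0, 1.61, 1.74",
    "0.01,-1.87,1.71,0.16",
    "0.01,-1.23,1.12,0.1",
    "0.0,-0.34,0.31,0.03",
]
W1, W2 = split_stacked(example, sizes=[2, 3, 4])
print(W1.shape, W2.shape)
```

Validating the row and column counts while reading catches layout mistakes early, before they show up as a failed comparison.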

Script

• Your result will be graded using a script similar to the one shown in the slides. If you run it for verification for ‘b’, you would get an output 781 .

• This script will be uploaded along with the other files.

• Do try running your code with this script to adjust your formatting. Otherwise you are likely to receive no points!
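For a quick local check before running the provided script, a comparison along the following lines may be useful. This is not the course's grading script: the function names, the tolerance, and the rule that a parse failure counts as a mismatch are all assumptions of this sketch.

```python
import io
import numpy as np

def read_rows(source):
    """Read a comma-delimited text file into float rows (rows may differ in length)."""
    return [[float(v) for v in line.split(",")] for line in source if line.strip()]

def files_match(submitted, reference, atol=1e-6):
    """Return True when both files have the same layout and nearly equal values."""
    try:
        a, b = read_rows(submitted), read_rows(reference)
    except ValueError:
        # A field that is not a number, e.g. from a wrong delimiter, counts as a mismatch.
        return False
    if len(a) != len(b):
        return False
    return all(len(r) == len(s) and np.allclose(r, s, atol=atol) for r, s in zip(a, b))

reference = "0.01,-0.99,0.9,0.08\n1.0,1.61,0.74\n"
print(files_match(io.StringIO(reference), io.StringIO(reference)))
print(files_match(io.StringIO(reference.replace(",", " ")), io.StringIO(reference)))
```

The second call shows why the delimiter matters: a space-delimited file fails to parse as comma-separated numbers and is rejected outright.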