# CS5242 Assignment 01 solved


## Description

Question #1
Given a data set that represents a function that takes in
14 binary digits and outputs one of four numbers (0, 1, 2, 3),
construct a neural network and train it to learn this function.
[Figure: the neural network outputs a one-hot vector, which is compared with the labels.]
Question #2
The data set represents a function $f: \{-1, 1\}^{14} \to \{0, 1, 2, 3\}$, so $f(x_1, \dots, x_{14}) \in \{0, 1, 2, 3\}$.
The label vectors are one-hot vectors.
For this assignment, construct the three networks given
in the next slide. Implement the back propagation
algorithm and use gradient descent to train the
networks. Use the cross entropy cost function on a
softmax function for training.
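The cost described above can be sketched as follows. This is a minimal NumPy sketch, not the reference solution: the function names `softmax` and `cross_entropy`, the batch-mean convention, and the small constant added inside the logarithm are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, onehot):
    # Mean cross entropy over the batch, plus its gradient w.r.t. the logits.
    # Averaging over the batch is an assumption; a sum is equally common.
    probs = softmax(logits)
    n = logits.shape[0]
    cost = -np.sum(onehot * np.log(probs + 1e-12)) / n
    grad_logits = (probs - onehot) / n
    return cost, grad_logits

# One data point with all-zero logits and true class 1.
logits = np.zeros((1, 4))
onehot = np.array([[0.0, 1.0, 0.0, 0.0]])
cost, grad = cross_entropy(logits, onehot)
```

With uniform logits the cost equals log(4), and the gradient with respect to the logits is the softmax output minus the one-hot label.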
The three networks are:
• 14-100-40-4 net: 14-node input, fully connected to a 100-node hidden layer, fully connected to a 40-node hidden layer, fully connected to a 4-node output.
• 14-28×6-4 net: 14-node input, fully connected to 6 identical fully connected hidden layers with 28 nodes each, fully connected to a 4-node output.
• 14-14×28-4 net: 14-node input, fully connected to 28 identical fully connected hidden layers with 14 nodes each, fully connected to a 4-node output.
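The three configurations can be written down as lists of layer sizes. The sketch below is illustrative only: the dictionary `NETWORKS`, the helper `init_params`, and the He-style initialisation are assumptions, since the handout does not prescribe an initialisation scheme.

```python
import numpy as np

# Layer sizes, input first and output last, for the three networks.
NETWORKS = {
    "14-100-40-4": [14, 100, 40, 4],
    "14-28x6-4": [14] + [28] * 6 + [4],
    "14-14x28-4": [14] + [14] * 28 + [4],
}

def init_params(sizes, seed=0):
    # He-style scaling is an assumption, not something the assignment specifies.
    rng = np.random.default_rng(seed)
    weights = [
        rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]
    # One bias vector per layer after the input layer.
    biases = [np.zeros(n_out) for n_out in sizes[1:]]
    return weights, biases

weights, biases = init_params(NETWORKS["14-100-40-4"])
```

Each weight matrix has one row per node of the earlier layer and one column per node of the later layer.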
All but the last fully connected layer use ReLU as the activation function.
(1) Plot the training cost w.r.t. iterations
(2) Plot the testing cost w.r.t. iterations
(3) Plot the train & test accuracy scores w.r.t. iterations
(4) Check your back propagation intermediate results against known
answers (see page 8 to 13 for details):
a. You will be given one special data point d.
b. You will be given one weight & bias set W0 together with correct
gradients computed using data point d and W0.
c. You will be given another weight & bias set W1 with no
d. Use (a) and (b) to compute gradients and compare to the given
a script. Be sure to format your output according to
instructions in the next slide. Incorrect format will be
Note: "output" here means the loss function.
Give a half-page discussion on why the three
networks (14-100-40-4 net, 14-28×6-4 net, 14-14×28-4
net) perform differently.
Which one performs better, and why?
For Question#2(1)(2)(3)
• The data given to you is under path Question2_123.
• It contains 4 csv files (x_train, x_test, y_train, y_test). In the x_* files, each line is
a data point of 14 dimensions; in the y_* files, each line is the label of the
corresponding line in x_*.
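Loading one split might look like the sketch below. It assumes that each line of a y_* file holds a single integer class label and that the files are comma-delimited; neither detail is confirmed by the handout. The helper name `load_split` is hypothetical, and the in-memory demo uses 3 columns instead of 14 only to stay short.

```python
import io
import numpy as np

def load_split(x_source, y_source, num_classes=4):
    # x_source / y_source may be file paths or open file-like objects.
    x = np.loadtxt(x_source, delimiter=",", ndmin=2)
    # Assumption: one integer class label per line in the y_* file.
    labels = np.loadtxt(y_source, delimiter=",", ndmin=1).astype(int)
    onehot = np.eye(num_classes)[labels]
    return x, onehot

# Tiny in-memory stand-ins for an x_* and a y_* file.
x_demo = io.StringIO("1,0,1\n0,0,1\n")
y_demo = io.StringIO("2\n0\n")
x, y = load_split(x_demo, y_demo)
```

For the real data the same function would be called with the paths of the csv files under Question2_123.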
For Question#2(4)
• The data given to you is under path Question2_4
• You will be given weights and gradients for verification, which are under the ‘b’ folder;
the weights you will work on are in the ‘c’ folder. To ease your understanding, the
weights csv has one heading column introducing the weights/biases, while
there is NO such column for the gradients. And you should NOT include the
• The given data point x and label y is in the ‘a.txt’ file. For both verification and test,
we use the same data point and label.
• Your submission should be the same format as the given ‘true-d*’ files.
• If possible, use np.float32 to control the granularity of your gradients, otherwise,
round your results to at least 16 digits, or e-16.
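A comparison against the given gradients could be sketched as below. The helper name `gradients_match` and the tolerance values are assumptions chosen for illustration; the handout does not state the tolerance used for grading.

```python
import numpy as np

def gradients_match(computed, reference, rtol=1e-5, atol=1e-7):
    # Cast both sides to float32 so they are compared at the same precision.
    computed = np.asarray(computed, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)
    if computed.shape != reference.shape:
        return False
    return bool(np.allclose(computed, reference, rtol=rtol, atol=atol))

same = gradients_match([0.01, -0.99], [0.01, -0.99])
different = gradients_match([0.01, -0.99], [0.02, -0.99])
```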
Given files
• Your submission is a single .zip file. Other compressions are NOT acceptable.
• The naming of the .zip file is your ID as shown on the IVLE/class. We will use it for
grading. For most cases, it’s your NUS NETID, which will be e******.zip (‘e’ in
lowercase).
• Inside the .zip file, there are 8 files. They are 1 pdf, 6 csv and 1 folder containing
• For the essay part of question 1 and question 2, you will submit a single pdf file.
• For question2(4), the output files are csv files, with comma (i.e. ‘,’) as the delimiter.
Using space or other delimiters is NOT acceptable.
• Csv namings are as follows: dw-100-40-4.csv, db-100-40-4.csv, dw-28-6-4.csv,
db-28-6-4.csv, dw-14-28-4.csv, db-14-28-4.csv, which
correspond to the gradients of weights and biases for the three network
configurations: 14-100-40-4, 14-28*6-4, 14-14*28-4. Other namings are NOT
acceptable. (The naming has changed to be compatible with win & *nix)
• Since this task you are not expected to work with platforms, so aside from the 6
folder and compress along with the other files
Submission
[Figure: a fully connected network with the nodes of each layer indexed from 0: input layer0 (2 nodes), layer1 (3 nodes), layer2 (5 nodes), output layer3 (4 nodes).]
• Suppose layer(t) has $n_t$ nodes, so the weights
from layer(t) to layer(t+1) form an $n_t$ by $n_{t+1}$ matrix, where the
(i, j)-th entry of this matrix represents the weight connecting
the i-th node of layer(t) to the j-th node of layer(t+1).
• The bias is then simply a length-$n_t$ vector for layer(t).
Note that the input layer 0 has no corresponding bias.
• Softmax is not considered to be a layer in this context, so the
output layer outputs logits.
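Under this convention, a forward and backward pass can be sketched as follows. This is an illustrative NumPy sketch rather than the reference solution: the function names `forward` and `backward` are hypothetical, ReLU is applied to every layer except the last, and the loss is assumed to be the mean softmax cross entropy over the batch.

```python
import numpy as np

def forward(x, weights, biases):
    # Returns the activations of every layer; the last entry holds the logits.
    activations = [x]
    for t, (w, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ w + b  # w has shape (n_t, n_{t+1})
        is_last = t == len(weights) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))
    return activations

def backward(activations, weights, onehot):
    # Gradients of the mean softmax cross entropy w.r.t. all weights and biases.
    logits = activations[-1]
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    delta = (probs - onehot) / logits.shape[0]
    grads_w, grads_b = [], []
    for t in reversed(range(len(weights))):
        grads_w.insert(0, activations[t].T @ delta)  # same shape as weights[t]
        grads_b.insert(0, delta.sum(axis=0))
        if t > 0:
            # Propagate through the weights, then through the ReLU of layer(t).
            delta = (delta @ weights[t].T) * (activations[t] > 0)
    return grads_w, grads_b
```

Each weight gradient has the same shape as its weight matrix, so entry (i, j) is the derivative with respect to the weight connecting node i of layer(t) to node j of layer(t+1).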
More Details for Question#2(4)
The given weights file and the corresponding gradients output file have the
following format:
• In total there are $\sum_t n_t$ rows, where $n_t$ is the number of nodes in layer(t) and the sum runs over every layer that has outgoing weights (all layers except the output layer).
• The first $n_0$ rows are the matrix from input layer0 to hidden
layer1, the following $n_1$ rows are the matrix from layer1 to layer2,
and so on. There is NO blank line between matrix(t) and
matrix(t+1).
• In each matrix, the (i, j) item is the derivative
d(output)/d w(i, j) of the corresponding weight.
The given biases file and the corresponding gradients output file are similar:
• In total there are $\sum_t 1$ rows, one for every layer that has a bias (all layers except input layer0).
• The first row is the bias vector for layer1, and so has $n_1$ items (the number of nodes in layer1).
The second row is the bias vector for layer2 with $n_2$ items, and
so on.
• In each row, the i-th item is d(output)/d b(i) of the corresponding
node.
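Writing gradients in this row layout might look like the sketch below. The function names `write_weight_grads` and `write_bias_grads` are hypothetical, and casting to `np.float32` before formatting is one possible reading of the precision requirement, not a confirmed rule.

```python
import csv
import io

import numpy as np

def write_weight_grads(handle, grads_w):
    # One csv row per matrix row; matrices are stacked with no blank line between them.
    writer = csv.writer(handle, lineterminator="\n")
    for grad in grads_w:
        for row in np.asarray(grad, dtype=np.float32):
            writer.writerow([str(value) for value in row])

def write_bias_grads(handle, grads_b):
    # One csv row per bias vector, starting with layer1.
    writer = csv.writer(handle, lineterminator="\n")
    for grad in grads_b:
        writer.writerow([str(value) for value in np.asarray(grad, dtype=np.float32)])

# Demo for a 2-3-4 network, written to in-memory buffers instead of files.
dw_buffer, db_buffer = io.StringIO(), io.StringIO()
write_weight_grads(dw_buffer, [np.ones((2, 3)), np.zeros((3, 4))])
write_bias_grads(db_buffer, [np.full(3, 0.5), np.zeros(4)])
```

For a 2-3-4 network this yields 2 + 3 = 5 rows of weight gradients and 2 rows of bias gradients, with no heading column.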
Suppose we have a 2-3-4 fully connected network with a data
point X = (x1, x2) as input and a 4-dimensional output logits
vector.
Then the weight matrices and the biases look like the following:
[Figure: the 2-3-4 network with the nodes of each layer indexed from 0: input layer0 (2 nodes), layer1 (3 nodes), output layer2 (4 nodes).]
Example
Weights between input layer 0 and hidden layer 1:
1.0, 1.61, 0.74
1.0, 1.61, 1.74
Bias for hidden layer 1:
1.0, 1.61, 0.74
Weights between layer 1 and output layer 2:
0.01, -1.87, 1.71, 0.16
0.01, -1.23, 1.12, 0.1
0.0, -0.34, 0.31, 0.03
Bias for output layer 2:
0.01, -0.99, 0.9, 0.08
script similar to
the one on the
right. If you run it
for verification for
‘b’, you would get
an output 781 .
• This script will be