Description
Question 1. [25 points]
A single neuron receives input from $m$ input neurons with weights $w_i$, where $i \in [1, m]$.
The neuron is expected to predict the probability that the output $t$ belongs to Class A
($t = 1$) versus Class B ($t = -1$). A dataset of training samples is available with inputs
$x^n$ and outputs $y^n$ ($n \in [1, N]$). You are told that the maximum a posteriori estimate for
the network weights is obtained by solving the following optimization problem:
$$\arg\min_{W} \; \sum_{n} \left( y^n - h(x^n, W) \right)^2 + \beta \sum_{i} w_i^2 \tag{1}$$
where $W$ is the vector of weights $w_i$, $\beta$ is a scalar constant, and $h(\cdot)$ is the output of the
neuron. According to this estimate, derive the prior probability distribution of the network
weights analytically.
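As a hedged starting point, the objective in (1) can be read as a negative log-posterior. The sketch below assumes independent Gaussian observation noise with variance $\sigma^2$ on the outputs; $\sigma^2$ is an assumed symbol that does not appear in the problem statement.

$$
\begin{aligned}
W_{\mathrm{MAP}} &= \arg\max_{W} \; p\left(W \mid \{x^n, y^n\}\right)
= \arg\min_{W} \Big[ -\sum_{n} \log p\left(y^n \mid x^n, W\right) - \log p(W) \Big] \\
-\log p\left(y^n \mid x^n, W\right) &= \frac{1}{2\sigma^2} \left( y^n - h(x^n, W) \right)^2 + \text{const} \\
-\log p(W) &= \frac{\beta}{2\sigma^2} \sum_{i} w_i^2 + \text{const}
\quad \Longrightarrow \quad
p(W) \propto \exp\Big( -\frac{\beta}{2\sigma^2} \sum_{i} w_i^2 \Big)
\end{aligned}
$$

Under this assumption, matching the penalty term in (1) to $-\log p(W)$ suggests a zero-mean Gaussian prior that is independent across weights, with variance $\sigma^2 / \beta$ per weight.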
Question 2. [30 points]
An engineer would like to design a neural network with a single hidden layer with four input
neurons (with binary inputs) and a single output neuron to implement:
(X1 OR NOT X2) XOR (NOT X3 OR NOT X4)
Assume a hidden layer with four hidden units, and a unipolar activation function (i.e., the
step function). Answer the questions below.
a) For each hidden unit, analytically derive the set of inequalities based on which a set of
weights and an activation threshold can be selected.
b) Choose a particular weight vector (including the bias term), and show that the designed
network achieves 100% performance in implementing the desired logic.
c) Now assume that the input data samples are subject to small random fluctuations due to
noise. Will the network you designed in part a function robustly under noisy conditions?
Find the set of weights and the activation threshold for the most robust decision boundary.
d) Generate 100 input samples by first concatenating 25 samples from each input vector.
Generate a random noise vector of length 2 for each training sample, assuming a zero-mean Gaussian distribution with a standard deviation of 0.2. Form validation samples for testing the
NNs by linearly superposing the input samples and the random noise samples. Evaluate
the classification performance (i.e., percentage correct) of the networks designed in parts a
and c on the validation samples. Interpret your results.
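The checks in parts b and d can be sketched in a few lines of Python. This is a minimal sketch, not a solution: it only tabulates the target logic, builds noisy validation samples, and scores an arbitrary predictor. The names `target`, `noisy_samples`, and `percent_correct` are hypothetical helpers. Two assumptions are made where the statement is ambiguous: noise is added to all four input components (the text says length 2), and 25 copies of each of the 16 input patterns are used, which gives 400 rows rather than 100.

```python
import itertools
import random


def target(x1, x2, x3, x4):
    """Desired output of (X1 OR NOT X2) XOR (NOT X3 OR NOT X4)."""
    a = bool(x1) or not bool(x2)
    b = (not bool(x3)) or (not bool(x4))
    return int(a != b)


# All 16 binary input patterns and their desired outputs.
patterns = list(itertools.product([0, 1], repeat=4))
labels = [target(*p) for p in patterns]


def noisy_samples(per_pattern=25, std=0.2, seed=0):
    """Repeat each pattern and add zero-mean Gaussian noise to every input.

    Assumption: one noise value per input component (four per sample).
    """
    rng = random.Random(seed)
    xs, ts = [], []
    for p, t in zip(patterns, labels):
        for _ in range(per_pattern):
            xs.append([v + rng.gauss(0.0, std) for v in p])
            ts.append(t)
    return xs, ts


def percent_correct(predict, xs, ts):
    """Percentage of samples for which predict(x) equals the desired output."""
    hits = sum(int(predict(x) == t) for x, t in zip(xs, ts))
    return 100.0 * hits / len(ts)


# Sanity check on the noise-free truth table with the target itself as predictor.
clean_score = percent_correct(lambda x: target(*x), patterns, labels)
val_x, val_t = noisy_samples()
```

A designed network can be plugged in by passing its forward function as `predict`, for example one that applies step units to the four noisy inputs and returns 0 or 1.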
Question 3. [45 points]
A researcher would like to process images of alphabet letters with a perceptron. A collection
of images was compiled for training and testing the perceptron. The file assign1_data1.mat
contains variables trainims (training images) and testims (testing images) along with the
ground truth labels in trainlbls and testlbls. Answer the questions below.
a) Visualize a sample image for each class. Find correlation coefficients between pairs of
sample images that you have selected. Display the correlations in matrix format. Discuss
the degree of within-class versus across-class variability.
b) Design a single-layer perceptron with an output neuron for each letter, using the training data. Set the initial network weights w and bias term b to random numbers drawn
from a Gaussian distribution N(0, 0.01), and assume a sigmoid activation function. Your implementation should not train each output neuron separately; instead, a compound matrix W
and a compound vector b should be defined and used to update all connections simultaneously. The online training algorithm should perform 10000 iterations. At each iteration, a
sample image should be randomly selected from the training data, the network should be
updated according to the gradient-descent learning rule, and W, b, and the mean-squared
error (MSE) should be recorded. Tune the learning rate $\eta^*$ in order to minimize the final
value of the MSE. Display the final network weights for each letter as a separate image, and
describe their visual characteristics.
c) Now separately repeat the training process using a substantially higher and a substantially
lower learning rate than $\eta^*$. On a single figure, plot the MSE curves (across all 10000 iterations)
for $\eta_{\text{high}}$, $\eta_{\text{low}}$, and $\eta^*$. Discuss your results.
d) Validate the performance of the trained networks using all samples in the test data.
Report the performance values for the three networks with $\eta_{\text{high}}$, $\eta_{\text{low}}$, and $\eta^*$.
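The online training loop in part b, together with the correlation matrix of part a and the evaluation of part d, might be sketched as follows with NumPy. This is a sketch under stated assumptions, not a reference implementation:

- It runs on small synthetic stand-in data rather than assign1_data1.mat, whose array layout is not given here.
- N(0, 0.01) is read as a variance, so the standard deviation is 0.1.
- Targets are one-hot coded.
- The recorded MSE is the squared error of the sampled image averaged over the output neurons.
- The function names and the three learning rates are illustrative.
- Only the MSE is stored per iteration, and the final W and b are returned.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def train_online(X, labels, n_classes, eta, n_iter=10000, seed=0):
    """Online gradient descent for a single-layer sigmoid network.

    X: (n_samples, n_pixels) float array; labels: integer class indices.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_pixels = X.shape
    # Assumption: N(0, 0.01) specifies the variance, so std = 0.1.
    W = rng.normal(0.0, 0.1, size=(n_classes, n_pixels))
    b = rng.normal(0.0, 0.1, size=n_classes)
    T = np.eye(n_classes)[labels]        # one-hot targets (assumption)
    mse = np.empty(n_iter)
    for k in range(n_iter):
        i = rng.integers(n_samples)      # pick one random training sample
        y = sigmoid(W @ X[i] + b)        # all output neurons at once
        err = T[i] - y
        mse[k] = np.mean(err ** 2)       # squared error of this sample
        delta = err * y * (1.0 - y)      # error times sigmoid derivative
        W += eta * np.outer(delta, X[i])
        b += eta * delta
    return W, b, mse


def percent_correct(W, b, X, labels):
    """Percentage of samples whose most active output matches the label."""
    pred = np.argmax(sigmoid(X @ W.T + b), axis=1)
    return 100.0 * np.mean(pred == labels)


# Synthetic stand-in data (hypothetical): 3 classes of 20-pixel "images".
rng = np.random.default_rng(1)
prototypes = rng.random((3, 20))
demo_labels = rng.integers(3, size=300)
demo_X = prototypes[demo_labels] + 0.05 * rng.normal(size=(300, 20))

# Part a analogue: pairwise correlation coefficients between sample images.
corr = np.corrcoef(prototypes)

# Parts b-d analogue: train with illustrative low / mid / high learning rates.
results = {eta: train_online(demo_X, demo_labels, 3, eta, n_iter=2000)
           for eta in (0.001, 0.1, 2.0)}
scores = {eta: percent_correct(W, b, demo_X, demo_labels)
          for eta, (W, b, _) in results.items()}
```

For the actual assignment, the images would first be flattened so that each row of `X` is one image. `scipy.io.loadmat` can read many .mat files, but the shapes and orientation of trainims and testims should be checked before reshaping.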