## Description

Question 1. [25 points]

A single neuron receives input from m input neurons with weights w_i, where i ∈ [1, m]. The neuron is expected to predict the probability that the output t belongs to Class A (t = 1) versus Class B (t = −1). A dataset of N training samples is available with inputs x^n and outputs y^n (n ∈ [1, N]). You are told that the maximum a posteriori estimate for the network weights is obtained by solving the following optimization problem:

$$\arg\min_{W} \; \sum_{n} \left( y^n - h(x^n, W) \right)^2 + \beta \sum_{i} w_i^2 \tag{1}$$

where W is the vector of weights w_i, β is a scalar constant, and h(·) is the output of the neuron. According to this estimate, analytically derive the prior probability distribution of the network weights.
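For orientation (not a substitute for the requested derivation), the standard MAP correspondence can be sketched as follows, assuming the targets are modeled as the neuron output plus zero-mean Gaussian noise of variance σ²:

```latex
\begin{aligned}
W_{\mathrm{MAP}} &= \arg\max_{W}\; \log p(\{y^n\} \mid \{x^n\}, W) + \log p(W) \\
\log p(\{y^n\} \mid \{x^n\}, W) &= -\frac{1}{2\sigma^2} \sum_{n} \left( y^n - h(x^n, W) \right)^2 + \mathrm{const}
\end{aligned}
```

Matching the remaining term β Σ_i w_i² in (1) against −log p(W) identifies the family of the prior; working out its exact form and variance in terms of β and σ² is the point of the question.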

Question 2. [30 points]

An engineer would like to design a neural network with a single hidden layer with four input

neurons (with binary inputs) and a single output neuron to implement:

(X1 OR NOT X2) XOR (NOT X3 OR NOT X4)

Assume a hidden layer with four hidden units, and a unipolar activation function (i.e., the

step function). Answer the questions below.

a) For each hidden unit, analytically derive the set of inequalities based on which a set of weights and an activation threshold can be selected.

b) Choose a particular weight vector (including the bias term), and show that the designed

network achieves 100% performance in implementing the desired logic.
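When checking a hand-designed network, it helps to have the target truth table available. A minimal sketch (Python is used purely for checking; the function name `target` is my own):

```python
from itertools import product

def target(x1, x2, x3, x4):
    # (X1 OR NOT X2) XOR (NOT X3 OR NOT X4); XOR as inequality of booleans
    return (x1 or not x2) != (not x3 or not x4)

# Enumerate all 16 binary input combinations once.
truth_table = {bits: target(*bits) for bits in product([0, 1], repeat=4)}
```

Any choice of hidden-unit weights and thresholds in part b can then be verified against all 16 entries of `truth_table`.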

c) Now assume that the input data samples are subject to small random fluctuations due to

noise. Will the network you designed in part a function robustly under noisy conditions?

Find the set of weights and the activation threshold for the most robust decision boundary.

d) Generate 100 input samples by first concatenating 25 samples from each input vector. Generate a random noise vector of length 2 for each training sample, assuming a zero-mean Gaussian distribution with a standard deviation of 0.2. Form validation samples for testing the networks by linearly superposing the input samples and the random noise samples. Evaluate the classification performance (i.e., percentage correct) of the networks designed in parts a and c on the validation samples. Interpret your results.
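One way this evaluation can be wired up is sketched below, assuming NumPy; `make_validation_set` and `percent_correct` are hypothetical helper names, and the `predict` argument stands in for whichever network is being tested:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_validation_set(inputs, labels, n_copies=25, std=0.2):
    """Tile each distinct input n_copies times, then add zero-mean Gaussian noise."""
    X = np.repeat(inputs, n_copies, axis=0).astype(float)
    y = np.repeat(labels, n_copies)
    return X + rng.normal(0.0, std, size=X.shape), y

def percent_correct(predict, X, y):
    """predict maps an input batch to 0/1 outputs; returns accuracy in percent."""
    return 100.0 * np.mean(predict(X) == y)
```

Comparing `percent_correct` for the two designs should make the robustness difference between parts a and c concrete.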

Question 3. [45 points]

A researcher would like to process images of alphabet letters with a perceptron. A collection of images was compiled for training and testing the perceptron. The file assign1_data1.mat

contains variables trainims (training images) and testims (testing images) along with the

ground truth labels in trainlbls and testlbls. Answer the questions below.

a) Visualize a sample image for each class. Find correlation coefficients between pairs of

sample images that you have selected. Display the correlations in matrix format. Discuss

the degree of within-class versus across-class variability.
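For the correlation matrix in part a, pairwise Pearson coefficients can be collected with `np.corrcoef` on flattened images; a sketch (the helper name is my own):

```python
import numpy as np

def correlation_matrix(samples):
    """Pairwise Pearson correlations between flattened sample images.

    samples: (k, H, W) array holding one selected image per class.
    Returns a (k, k) symmetric matrix with ones on the diagonal.
    """
    flat = samples.reshape(len(samples), -1)
    return np.corrcoef(flat)
```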

b) Design a single-layer perceptron with an output neuron for each letter, using the training data. Set the initial network weights W and bias terms b as random numbers drawn from a Gaussian distribution N(0, 0.01), and assume a sigmoid activation function. Your implementation should not train each output neuron separately; instead, a compound matrix W and a compound vector b should be defined and used to simultaneously update all connections. The online training algorithm should perform 10000 iterations. At each iteration, a sample image should be randomly selected from the training data, the network should be updated according to the gradient-descent learning rule, and W, b, and the mean-squared error (MSE) should be recorded. Tune the learning rate η* in order to minimize the final value of the MSE. Display the final network weights for each letter as a separate image, and describe the visual characteristics.
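The compound-update scheme described above can be sketched as follows, assuming NumPy, row-vector samples, and one-hot targets (the shapes and the function name are my assumptions, not part of the assignment):

```python
import numpy as np

def train_online(X, Y, eta, n_iter=10000, seed=0):
    """Online gradient descent on squared error for a single-layer sigmoid network.

    X: (N, d) inputs; Y: (N, k) one-hot targets. All k output neurons are
    updated together through the compound matrix W (k, d) and vector b (k,).
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    k = Y.shape[1]
    W = rng.normal(0.0, 0.01, size=(k, d))
    b = rng.normal(0.0, 0.01, size=k)
    mse = np.empty(n_iter)
    for it in range(n_iter):
        n = rng.integers(N)                        # random training sample
        o = 1.0 / (1.0 + np.exp(-(W @ X[n] + b)))  # sigmoid outputs, all neurons at once
        err = Y[n] - o
        delta = err * o * (1.0 - o)                # error times sigmoid derivative
        W += eta * np.outer(delta, X[n])           # simultaneous update of all connections
        b += eta * delta
        mse[it] = np.mean(err ** 2)
    return W, b, mse
```

Recording `mse` per iteration also provides the curves needed for part c.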

c) Now separately repeat the training process using a substantially higher and a substantially lower value than η*. On a single figure, plot the MSE curves (across all 10000 iterations) for η_high, η_low, and η*. Discuss your results.

d) Validate the performance of the trained networks using all samples in the test data. Report the performance values for the three networks with η_high, η_low, and η*.