# EE 559 Homework 4 solved

\$35.00

## Description

5/5 - (1 vote)

1. Code up a 2-class perceptron classifier, for the case of 2 features (3 dimensions in augmented
space). As in the previous homework problems, you are expected to code this up yourself
rather than use a library or toolbox, so the same guidelines apply.
Hint: One way to randomly shuffle the data points, is to generate random permutations of
numbers from 1 to N, where N = number of training samples. The result can be used as indices
for the data points and their labels.
It is recommended (but not required) that you use augmented space, and reflect the training
data. Do not reflect the test data.
Because the data you will use might not be linearly separable, you will need two halting
conditions:
(1) The usual perceptron halting condition: if there have been no updates to in 1 epoch,
then halt;
(2) An ad hoc one in case condition (1) is not satisfied. A reasonable one is to set a
maximum number of iterations or epochs (1000 epochs should be high enough), and
during the last full epoch, at each iteration i, evaluate over all training data
points, and store each . Then, after the last iteration, choose the that had
the smallest criterion value .
For your initial weight vector, use (vector of 0.1 in each component).
For your learning rate parameter, use
Run your classifier on 3 datasets: Synthetic1 (get from HW1 folder), Synthetic2 (also from HW1
folder), and Synthetic3 (from the current HW folder).
(a) For each dataset, report on the final weight vector, the classification error rate on the training
set, and the classification error rate on the test set.
(b) Also for each dataset, give a 2D plot showing decision boundary and regions, as well as
training data points (similar to plots of HW1). You may find it convenient to use the
PlotDecBoundaries function from HW1 folder.
(c) For Synthetic1 and Synthetic2 datasets, compare your error rates of part (a) with the error rates
obtained in HW1(a); you can compare with the posted solutions if your HW1 answer wasn’t
correct. Explain any significant differences between perceptron result and nearest means
result.
Problem 2 on next page…
w
J (w(i))
J (w(i)) w(i)
J (w(i))
w(0) = 0.1(1)
η(i) = 1.
p. 2 of 2
2. In a gradient descent procedure, let the “weight update” be defined as
.
Consider a given weight vector which is the weight vector at (given) iteration i , so that
.
Let the criterion function take the form , in which is the criterion
function for one data point. In this problem, do not plug in for the perceptron criterion
function; please solve the problem for the general case of . Assume
.
(a) For stochastic gradient descent, variant 2, find , in which denotes
expected value, and for each probabilistic experiment, n is randomly chosen from
.
(b) Compare your result of (a) with for batch gradient descent. Describe in words
how the batch GD update compares with the expected value of the stochastic GD
variant 2 update.
Explain why the comparison between these two expressions makes sense, or does not
make sense.
Δw(i) = w(i +1) − w(i)
w0
w(i) = w0
J (w) = Jn (w) n=1
N
∑ Jn (w)
J and Jn
η(i) = η = constant
E{Δw(i)} E{ }
{1,2,!,N}
∇wJ w0 ( )
Δw(i)