1. Perceptron Algorithm. Suppose we have the following training examples in a 2-dimensional
input space, and no bias term. Following our discussion of the perceptron algorithm, we use
t = −1 rather than t = 0 to denote the negative class.
x1 x2 t
1 -2 1
0 -1 -1
Recall the perceptron algorithm from lecture. It repeatedly applies the following procedure,
stopping when the weights are strictly within the feasible region (i.e. not on the boundary):
For each training case (x
(i) ← wT x
(i) ≤ 0,
w ← w + t
Here we work through the algorithm by hand.
(a) [2pts] Sketch out the problem in weight space. In particular: draw and label the axes,
draw the half-space corresponding to each of the two training examples, and shade the
feasible region. (Remember that there is no bias term.)
(b) [2pts] Simulate the perceptron algorithm by hand, starting from the initial weights
w1 = 0, w2 = −2. On the plot from part (a), mark an X on every point in weight space
visited by the algorithm until it stops. You do not need to provide anything for this part
other than the X’s on the plot. Hint: the weights are updated 12 times. It will be tedious
if you compute all the updates using arithmetic; instead, plot the first few updates as you
go along, and you should start to notice a visual pattern.
CSC321 Homework 2
2. Feature Maps. Suppose we have the following 1-D dataset for binary classification:
(a) [1pt] Argue briefly (at most a few sentences) that this dataset is not linearly separable.
Your answer should mention convexity.
Now suppose we apply the feature map
Assume we have no bias term, so that the parameters are w1 and w2.
(b) [2pts] Write down the constraint on w1 and w2 corresponding to each training example, and then find a pair of values (w1, w2) that correctly classifies all the examples.
Remember that there is no bias term.
3. Loss Functions. [3pts] Suppose we have a prediction problem where the target t corresponds to an angle, measured in radians. A reasonable loss function we might use is
L(y, t) = 1 − cos(y − t).
Suppose we make predictions using a linear model,
y = w>x + b.
As usual, the cost is the average loss over the training set:
Derive a sequence of vectorized mathematical expressions for the gradients of the cost with
respect to w and b. As usual, the inputs are organized into a design matrix X with one row
per training example. The expressions should be something you can translate into a Python
program without requiring a for-loop. Your answer should look like:
y = · · ·
= · · ·
= · · ·
∂b = · · ·
You can use sin(A) to denote the sin function applied elementwise to A. Remember that
∂E/∂w denotes the gradient vector,