# CSC411 Homework 3 solved

\$35.00

## Description

5/5 - (1 vote)

1. [3pts] Robust Regression. One problem with linear regression using squared error loss
is that it can be sensitive to outliers. Another loss function we could use is the Huber loss,
parameterized by a hyperparameter δ:
Lδ(y, t) = Hδ(y − t)
Hδ(a) = (
1
2
a
2
if |a| ≤ δ
δ(|a| − 1
2
δ) if |a| > δ
(a) [1pt] Sketch the Huber loss Lδ(y, t) and squared error loss LSE(y, t) = 1
2
(y − t)
2
for
t = 0, either by hand or using a plotting library. Based on your sketch, why would you
expect the Huber loss to be more robust to outliers?
(b) [1pt] Just as with linear regression, assume a linear model:
y = w>x + b.
Give formulas for the partial derivatives ∂Lδ/∂w and ∂Lδ/∂b. (We recommend you find
a formula for the derivative H0
δ
δ
(y − t).)
1
https://markus.teach.cs.toronto.edu/csc411-2018-09
2
http://www.cs.toronto.edu/~rgrosse/courses/csc411_f18/syllabus.pdf
3
http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
4
1
CSC411 Fall 2018 Homework 3
(c) [1pt] Write Python code to perform (full batch mode) gradient descent on this model.
Assume the training dataset is given as a design matrix X and target vector y. Initialize
w and b to all zeros. Your code should be vectorized, i.e. you should not have a for
loop over training examples or input dimensions. You may find the function np.where
2. [6pts] Locally Weighted Regression.
(a) [2pts] Given {(x
(1), y(1)), ..,(x
(N)
, y(N)
)} and positive weights a
(1), …, a(N)
show that
the solution to the weighted least squares problem
w∗ = arg min
1
2
X
N
i=1
a
(i)
(y
(i) − wT x
(i)
)
2 +
λ
2
||w||2
(1)
is given by the formula
w∗ =

XT AX + λI
−1 XT Ay (2)
where X is the design matrix (defined in class) and A is a diagonal matrix where Aii =
a
(i)
It may help you to review Section 3.1 of the csc321 notes5
.
(b) [2pts] Locally reweighted least squares combines ideas from k-NN and linear regression.
For each new test example x we compute distance-based weights for each training example a
(i) =
exp(−||x−x
(i)
||2/2τ
2
P
)
j
exp(−||x−x(j)
||2/2τ
2)
, computes w∗ = arg min 1
2
PN
i=1 a
(i)
(y
(i) − wT x
(i)
)
2 +
λ
2
||w||2 and predicts ˆy = x
T w∗
. Complete the implementation of locally reweighted least
squares by providing the missing parts for q2.py.
Important things to notice while implementing: First, do not invert any matrix, use
a linear solver (numpy.linalg.solve is one example). Second, notice that P
exp(Ai)
j
exp(Aj ) =
P
exp(Ai−B)
j
exp(Aj−B)
but if we use B = maxj Aj it is much more numerically stable as P
exp(Ai)
j
exp(Aj )
overflows/underflows easily. This is handled automatically in the scipy package with the
scipy.misc.logsumexp function6
.
(c) [1pt] Randomly hold out 30% of the dataset as a validation set. Compute the average
loss for different values of τ in the range [10,1000] on both the training set and the
validation set. Plot the training and validation losses as a function of τ (using a log
scale for τ ).
(d) [1pt] How would you expect this algorithm to behave as τ → ∞? When τ → 0? Is this
what actually happened?
5