Description
Problem 1 Linear Regression (30 points)
You are asked to implement 4 python functions for linear regression. The input and output of the functions
are specified in linear regression.py. You should be able to run linear regression.py after you
finish the implementation. Note that we have already appended the column of 1’s to the feature matrix, so
that you do not need to modify the data yourself.
Problem 1.1 Linear regression (6 points)
Implement linear regression and return the model parameters. What to submit: fill in the function lin
ear regression noreg(X, y).
Problem 1.2 Regularized linear regression (10 points)
To prevent overfitting, we usually add regularization. For now, we will focus on L2 regularization. In
this case, the optimization problem is:
wλ = arg min
w
||Xw − y||2
2 + λ||w||2
2
(1)
where λ ≥ 0 is a hyper-parameter used to control the complexity of the resulting model. When λ = 0,
the model reduces to the usual (unregularized) linear regression. For λ > 0 the objective function balances
between two terms: (1) the data-dependent quadratic loss function ||Xw − y||2
2
, and (2) a function of the
model parameters ||w||2
2
.
Implement your regularized linear regression algorithm.
What to submit: fill in function regularized linear regression(X, y, λ).
Problem 1.3 Tuning the regularization hyper-parameter (9 points)
Use the validation set to tune the regularization parameter λ ∈ {0, 10−4
, 10−3
, 10−2
, 10−1
, 1, 10, 102}. We
select the best one that results in the lowest mean square error on the validation set.
What to submit: fill in the function tune lambda(Xtrain, ytrain, Xval, yval, lambds).
Problem 1.4 Mean square error (5 points)
Report the mean square error of the model on the give n test set.
What to submit: fill in the function test error(w, X, y).
Problem 2 k Nearest Neighbors
(20 points)
Review In the lecture, we define the classification rule of the k-nearest neighbors (kNN) algorithm for an
input example x as
vc = ∑
xi∈knn(x)
1(yi == c), ∀c ∈ [C] (2)
y = arg max
c∈[C]
vc (3)
where [C] is the set of classes.
3
A common distance measure between two samples xi and xj
is the Euclidean distance:
d(xi
, xj) = kxi − xjk2 =
r
∑
d
(xid − djd)
2
. (4)
You are asked to implement 4 python functions for kNN. The input and output are specified in knn.py.
You should be able to run knn.py after you finish the implementation.
Problem 2.1 Distance calculation (3 points)
Compute the distance between test data points in X and training data points in Xtrain based on Eqn. 4.
What to submit: fill in the function compute distances(Xtrain, X).
Problem 2.2 kNN classifier (5 points)
Implement kNN classifier based on Eqn. 3. Your algorithm should output the predictions for the test
set. Important: You do not need to worry about ties in distance when finding the k nearest neighbor set.
However, when there are ties in the majority label of the k nearest neighbor set, you should return the label
with the smallest index. For example, when k = 5, if the labels of the 5 nearest neighbors happen to be
1, 1, 2, 2, 7, your prediction should be the digit 1.
What to submit: fill in the function predict labels(k, ytrain, dists).
Problem 2.3 Report the accuracy (6 points)
The classification accuracy is defined as:
accuracy =
# of correctly classified test examples
# of test examples (5)
The accuracy value should be in the range of
0, 1
What to submit: fill in the code for function compute accuracy(y, ypred).
Problem 2.4 Tuning k (6 points)
Find k among
1, 3, 5, 7, 9
that gives the best classification accuracy on the validation set.
What to submit: fill in the code for function find best k(K, ytrain, dists, yval).