## Description

1. (10 points) Exercise 3.6 (page 92) in LFD.

2. (10 points) Exercise 3.7 (page 92) in LFD.

3. (20 points) Recall that the objective function for linear regression can be expressed as

$$
E(w) = \frac{1}{N}\|Xw - y\|^2,
$$

as in Equation (3.3) of LFD. Minimizing this function with respect to $w$ gives the optimal solution $w = (X^T X)^{-1} X^T y$. This solution holds only when $X^T X$ is nonsingular. To overcome this problem, the following objective function is commonly minimized instead:

$$
E_2(w) = \|Xw - y\|^2 + \lambda\|w\|^2,
$$

where λ > 0 is a user-specified parameter. Please do the following:

(a) (10 points) Derive the optimal $w$ that minimizes $E_2(w)$.

(b) (10 points) Explain how this new objective function can overcome the singularity problem of $X^T X$.
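To make the singularity problem concrete, here is a small NumPy sketch; the helper names and the toy matrix are illustrative assumptions, not part of the assignment. It evaluates $E(w)$ and $E_2(w)$ as defined above and builds an $X$ whose third column is the sum of the first two, so $X^T X$ is rank-deficient and $E(w)$ has more than one minimizer.

```python
import numpy as np

def linreg_error(X, y, w):
    """E(w) = (1/N) * ||Xw - y||^2, the linear-regression objective."""
    r = X @ w - y
    return float(r @ r) / X.shape[0]

def regularized_error(X, y, w, lam):
    """E2(w) = ||Xw - y||^2 + lam * ||w||^2 (no 1/N factor, matching E2 above)."""
    r = X @ w - y
    return float(r @ r) + lam * float(w @ w)

# Hypothetical toy design matrix: column 3 = column 1 + column 2, so the
# columns are linearly dependent and the 3 x 3 matrix X^T X has rank 2.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 3.0, 4.0]])
y = np.array([1.0, 0.0, 1.0, 2.0])

XtX = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(XtX))  # 2, not 3: X^T X is singular

# Two different weight vectors give identical predictions Xw, hence the same E(w).
w_a = np.array([0.0, 0.0, 1.0])
w_b = np.array([1.0, 1.0, 0.0])
print(np.allclose(X @ w_a, X @ w_b))  # True
print(linreg_error(X, y, w_a), linreg_error(X, y, w_b))  # equal values
```

Because any vector in the null space of $X$ can be added to $w$ without changing $Xw$, $E(w)$ alone cannot single out one of these solutions; this is the situation parts (a) and (b) ask you to analyze for $E_2(w)$.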

4. (35 points) In logistic regression, the objective function can be written as

$$
E(w) = \frac{1}{N}\sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^T x_n}\right).
$$
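When checking a derivation like the one in (a), it can help to compare the analytic gradient against a numerical estimate. The sketch below assumes NumPy and labels $y_n \in \{-1, +1\}$; it evaluates $E(w)$ exactly as written above and approximates the gradient of any scalar function by central differences. `logistic_error` and `numerical_gradient` are hypothetical helper names, not functions from the provided code.

```python
import numpy as np

def logistic_error(w, X, y):
    """E(w) = (1/N) * sum_n ln(1 + exp(-y_n w^T x_n)); rows of X are the x_n."""
    margins = y * (X @ w)
    # np.logaddexp(0, -m) computes ln(e^0 + e^{-m}) = ln(1 + e^{-m}) without overflow
    return float(np.mean(np.logaddexp(0.0, -margins)))

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at w."""
    w = np.asarray(w, dtype=float)
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        # (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps) approximates df/dw_i
        grad[i] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return grad
```

If `my_grad(w)` (a hypothetical name) is your analytic $\nabla E(w)$, then `my_grad(w)` and `numerical_gradient(lambda v: logistic_error(v, X, y), w)` should agree to several decimal places at randomly chosen `w`.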

Please do the following:

(a) (10 points) Compute the first-order derivative $\nabla E(w)$. Provide the intermediate steps of your derivation.

(b) (10 points) Once the optimal $w$ is obtained, it is used to make predictions as follows:

$$
\text{Predicted class of } x =
\begin{cases}
1 & \text{if } \theta(w^T x) \ge 0.5,\\
-1 & \text{if } \theta(w^T x) < 0.5,
\end{cases}
$$

where the function $\theta(z) = \dfrac{1}{1 + e^{-z}}$ looks like

[Figure: plot of the logistic function θ(z)]

Explain why the decision boundary of logistic regression is still linear, even though the linear signal $w^T x$ is passed through the nonlinear function $\theta$ to compute the prediction.

(c) (5 points) Is the decision boundary still linear if the prediction rule is changed to the following? Justify briefly.

$$
\text{Predicted class of } x =
\begin{cases}
1 & \text{if } \theta(w^T x) \ge 0.9,\\
-1 & \text{if } \theta(w^T x) < 0.9.
\end{cases}
$$

(d) (10 points) In light of your answers to the above two questions, what is the essential property of logistic regression that results in the linear decision boundary?
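The two prediction rules in (b) and (c) differ only in the threshold applied to $\theta(w^T x)$. A minimal NumPy sketch of both, with the threshold as a parameter (the function names are illustrative assumptions):

```python
import numpy as np

def theta(z):
    """Logistic function theta(z) = 1 / (1 + e^{-z}).

    For very negative z, np.exp(-z) may overflow to inf (with a RuntimeWarning),
    which still yields the correct limit 0.0.
    """
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, threshold=0.5):
    """Return +1 where theta(w^T x) >= threshold and -1 otherwise (rows of X are examples)."""
    return np.where(theta(X @ w) >= threshold, 1, -1)
```

Evaluating `predict` with `threshold=0.5` and `threshold=0.9` on a grid of points and plotting where the label changes is one way to inspect the two decision boundaries empirically.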

5. (35 points) Logistic Regression for Handwritten Digit Recognition: Implement logistic regression for classification, using gradient descent to find the best separator. The handwritten digit files, train.txt and test.txt, are in the “data” folder. The starting code is in the “code” folder. In the data files, each row is one data example: the first entry is the digit label (“1” or “5”), and the next 256 entries are grayscale values between -1 and 1. The 256 pixels correspond to a 16 × 16 image. You are expected to implement your solution based on the given code. The only file you need to modify is “solution.py”. You can test your solution by running “main.py”. Note that code is provided to compute a two-dimensional feature (symmetry and average intensity) from each digit image; that is, each digit image is represented by a two-dimensional vector before being augmented with a “1” to form a three-dimensional vector, as discussed in class. These features, along with the corresponding labels, should serve as the inputs to your logistic regression algorithm.
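The row layout described above can be illustrated with a short sketch. The provided helper code already handles loading and feature extraction, so this only shows the format. It assumes NumPy, whitespace-delimited files readable by `np.loadtxt`, and row-major pixel order; `split_examples` and `augment` are hypothetical names.

```python
import numpy as np

def split_examples(data):
    """Split raw rows into labels and 16 x 16 images.

    Each row: a digit label followed by 256 grayscale values in [-1, 1].
    Assumes the 256 pixels are stored row by row (row-major order).
    """
    data = np.asarray(data, dtype=float)
    labels = data[:, 0].astype(int)
    images = data[:, 1:257].reshape(-1, 16, 16)
    return labels, images

def augment(features):
    """Prepend a constant 1 to each feature vector: shape (N, d) -> (N, d + 1)."""
    features = np.asarray(features, dtype=float)
    return np.hstack([np.ones((features.shape[0], 1)), features])

# Assumed path and delimiter; the provided helper code may load the files differently.
# labels, images = split_examples(np.loadtxt("data/train.txt"))
```

With the 2-D (symmetry, intensity) features, `augment` produces the three-dimensional vectors described above.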

(a) (15 points) Complete the logistic_regression() function for classifying the digits “1” and “5”.

(b) (5 points) Complete the accuracy() function for measuring the classification accuracy on your training and test data.

(c) (5 points) Complete the thirdorder() function to transform the features into the third-order polynomial Z-space.

(d) (10 points) Run “main.py” to see the classification results. As your final deliverable to a customer, would you use the linear model with or without the third-order polynomial transform? Briefly explain your reasoning.
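Parts (a) to (c) rest on three generic pieces: a gradient descent loop, an accuracy measure, and a third-order polynomial feature map. The sketch below shows one possible version of each under stated assumptions: NumPy, a fixed step size `eta`, and 2-D input features without the leading 1. The names are hypothetical and do not match the signatures expected in “solution.py”; the term ordering in `third_order` is also an assumption, so follow whatever ordering the provided code expects.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, max_iter=1000, tol=1e-8):
    """Fixed-step gradient descent: w <- w - eta * grad(w).

    Stops after max_iter steps or once the gradient norm falls below tol.
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        w = w - eta * g
    return w

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def third_order(features):
    """Map 2-D features (x1, x2) to all monomials of degree <= 3 (10 terms, bias first)."""
    f = np.asarray(features, dtype=float)
    x1, x2 = f[:, 0], f[:, 1]
    cols = [np.ones_like(x1), x1, x2,
            x1**2, x1 * x2, x2**2,
            x1**3, x1**2 * x2, x1 * x2**2, x2**3]
    return np.column_stack(cols)

# Usage on a stand-in objective ||w - c||^2 (not the logistic loss): converges to c.
w_hat = gradient_descent(lambda w: 2.0 * (w - np.array([1.0, -2.0])), np.zeros(2))
```

For logistic regression, `grad` would be the gradient derived in Problem 4(a); the quadratic above only demonstrates that the loop converges.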

Deliverable: Submit (1) a hard-copy report that summarizes your results (along with your write-up for the other questions) before the lecture, and (2) the “solution.py” file to BeachBoard.

Note: Please read the “Readme.txt” file carefully before you start this assignment. Do NOT change anything in the “main.py” and “helper.py” files.
