EL9123 Homework 4: Logistic Regression

1. Suggest possible response variables and predictors for the following classification problems.
For each problem, indicate how many classes there are. There is no single correct answer.
(a) Given an audio sample, detect the gender of the speaker's voice.
(b) An electronic writing pad records the motion of a stylus, and we wish to determine which
letter or number was written. Assume a segmentation algorithm has already been run that
reliably indicates the beginning and end times of the writing of each character.
2. Suppose that a logistic regression model for a binary class label y ∈ {0, 1} is given by

   P(y = 1|x) = 1/(1 + e^(−z)),   z = β0 + β1 x1 + β2 x2,

where β = [1, 2, 3]^T. Describe the following sets:
(a) The set of x such that P(y = 1|x) > P(y = 0|x).
(b) The set of x such that P(y = 1|x) > 0.8.
(c) The set of x1 such that P(y = 1|x) > 0.8 and x2 = 0.5.
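A quick numeric sanity check of these sets (a sketch, using the stated β = [1, 2, 3]): since the sigmoid is increasing, P(y = 1|x) > c exactly when z > ln(c/(1 − c)), which gives the boundary z = 0 in part (a) and z = ln 4 in part (b).

```python
import math

# Model from problem 2: z = b0 + b1*x1 + b2*x2 with beta = [1, 2, 3].
b0, b1, b2 = 1.0, 2.0, 3.0

def p_y1(x1, x2):
    """P(y = 1 | x) under the logistic model."""
    z = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-z))

# (a) P(y=1|x) > P(y=0|x) exactly when z > 0, i.e. 1 + 2*x1 + 3*x2 > 0.
print(p_y1(0.0, 0.0) > 0.5)    # True: z = 1 > 0

# (b) P(y=1|x) > 0.8 exactly when z > ln(0.8/0.2) = ln 4.
print(p_y1(1.0, 0.0) > 0.8)    # True: z = 3 > ln 4 ≈ 1.386

# (c) With x2 = 0.5 fixed: 1 + 2*x1 + 1.5 > ln 4, i.e. x1 > (ln 4 - 2.5)/2.
x1_min = (math.log(4) - 2.5) / 2.0
print(round(x1_min, 3))        # ≈ -0.557
```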
3. A data scientist is hired by a political candidate to predict who will donate money. The data
scientist decides to use two predictors for each possible donor:
• x1 = the income of the person (in thousands of dollars), and
• x2 = the number of websites with political views similar to the candidate's that the
person follows on Facebook.
To train the model, the scientist tries to solicit donations from a randomly selected subset of
people and records who donates or not. She obtains the following data:
Income (thousands $), xi1    30   50   70   80   100
Num websites, xi2             0    1    1    2     1
Donate (1=yes, 0=no), yi      0    1    0    1     1
(a) Draw a scatter plot of the data labeling the two classes with different markers.
(b) Find a linear classifier that makes at most one error on the training data. The classifier
should be of the form

    ŷi = 1 if zi > 0, 0 if zi < 0,   zi = w^T xi + b.

What is the weight vector w and bias b in your classifier?
(c) Now consider a logistic model of the form

    P(yi = 1|xi) = 1/(1 + e^(−zi)),   zi = w^T xi + b.

Using w and b from the previous part, which sample i is the least likely (i.e., for which i
is P(yi|xi) the smallest)? If you do the calculations correctly, you should not need a calculator.
(d) Now consider a new set of parameters

    w′ = αw,   b′ = αb,

where α > 0 is a positive scalar. Would using the new parameters change the values ŷi in
part (b)? Would they change the likelihoods P(yi|xi) in part (c)? If they do not change,
state why. If they do change, qualitatively describe the change as a function of α.
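The parts above can be sanity-checked in a few lines. The particular (w, b) below is just one illustrative candidate (the problem says there is no single correct answer): it ignores income and thresholds on the website count.

```python
import math

# Training data from problem 3: (income in $k, num websites) and donate label.
X = [(30, 0), (50, 1), (70, 1), (80, 2), (100, 1)]
y = [0, 1, 0, 1, 1]

# One candidate classifier (many others are valid): w = [0, 1], b = -0.5,
# i.e. predict "donates" whenever the person follows at least one website.
w, b = (0.0, 1.0), -0.5

def z(xi):
    return w[0] * xi[0] + w[1] * xi[1] + b

preds = [1 if z(xi) > 0 else 0 for xi in X]
errors = sum(p != t for p, t in zip(preds, y))
print(preds, "errors:", errors)      # exactly one error (sample i = 3)

# Part (c): P(y_i | x_i) = sigmoid(z_i) if y_i = 1, else 1 - sigmoid(z_i).
def likelihood(xi, yi, alpha=1.0):
    p1 = 1.0 / (1.0 + math.exp(-alpha * z(xi)))
    return p1 if yi == 1 else 1.0 - p1

liks = [likelihood(xi, yi) for xi, yi in zip(X, y)]
print([round(l, 4) for l in liks])   # the misclassified sample is least likely

# Part (d): scaling (w, b) by alpha > 0 preserves the sign of every z_i, so
# the hard predictions are unchanged; the likelihoods, however, move toward 1
# for correctly classified samples and toward 0 for the misclassified one.
liks_scaled = [likelihood(xi, yi, alpha=10.0) for xi, yi in zip(X, y)]
print([round(l, 4) for l in liks_scaled])
```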
4. Suppose we collect data for a group of students in a machine learning class with variables
X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression
and produce estimated coefficients β0 = −6, β1 = 0.05, β2 = 1.
(a) Estimate the probability that a student who studies for 40 hours and has an undergrad
GPA of 3.5 gets an A in the class.
(b) How many hours would the student in part (a) need to study to have a 50% chance of
getting an A in the class?
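The arithmetic for both parts can be checked in a few lines (a sketch using the given coefficients; note that a 50% probability corresponds exactly to z = 0):

```python
import math

# Fitted coefficients from problem 4.
b0, b1, b2 = -6.0, 0.05, 1.0

def p_A(hours, gpa):
    """Estimated P(receive an A | hours studied, GPA)."""
    z = b0 + b1 * hours + b2 * gpa
    return 1.0 / (1.0 + math.exp(-z))

# (a) 40 hours of study, GPA 3.5: z = -6 + 0.05*40 + 3.5 = -0.5.
print(round(p_A(40, 3.5), 4))   # ≈ 0.3775

# (b) A 50% chance means z = 0, so solve 0.05*hours - 6 + 3.5 = 0.
hours_needed = (0 - b0 - b2 * 3.5) / b1
print(hours_needed)             # 50.0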
5. The loss function for logistic regression for binary classification is the binary cross entropy
defined as
J(β) = X
N
i=1
ln(1 + e
zi
) − yizi
where zi = β0 + β1x1i + β2x2i for two features x1,i and x2,i.
(a) What are the partial derivatives of zi with respect to β0, β1, and β2.
(b) Compute the partial derivatives of J(β) with respect to β0, β1, and β2. You should use
the chain rule of differentiation.
(c) Can you find the close form expressions for the optimal parameters βˆ
0, βˆ
1, and βˆ
2 by
putting the derivatives of J(β) to 0? What methods can be used to optimize the loss
function J(β)?
2