Description
Problem 1 [25%]
Suppose that I collected data for a group of machine learning students from last year. For each student, I have a
feature X1 = hours studied for the class every week, X2 = overall GPA, and Y = whether the student receives
an A. We fit a logistic regression model and produce estimated coefficients $\hat\beta_0 = -6$, $\hat\beta_1 = -0.1$, $\hat\beta_2 = 1.0$.
1. Estimate the probability of getting an A for a student who studies 40 hours per week and has a GPA
of 2.0.
2. By how much would the student in part 1 need to improve their GPA or adjust time studied to have a
90% chance of getting an A in the class? Is that likely?
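As a starting point for both parts, the logistic model form can be sketched in R. This is a minimal sketch, assuming the standard logistic link $p = 1 / (1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2)})$; the helper names `predict_prob_a` and `target_log_odds` are hypothetical and not part of the assignment.

```r
# Hypothetical helper: probability of an A under a logistic model.
# beta = c(intercept, coefficient for hours, coefficient for GPA),
# defaulting to the estimates stated in the problem.
predict_prob_a <- function(hours, gpa, beta = c(-6, -0.1, 1.0)) {
  eta <- beta[1] + beta[2] * hours + beta[3] * gpa  # linear predictor (log-odds)
  plogis(eta)                                       # inverse logit: 1 / (1 + exp(-eta))
}

# Hypothetical helper: log-odds required to reach a target probability p,
# i.e. log(p / (1 - p)).
target_log_odds <- function(p) qlogis(p)
```

Comparing the student's current log-odds with `target_log_odds(0.9)` shows how far the linear predictor has to move, which can then be translated into a change in GPA or hours studied.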
Problem 2 [25%]
Consider a classification problem with two classes T (true) and F (false). Then, suppose that you have the
following four prediction models:
• T: The classifier predicts T for each instance (always)
• F: The classifier predicts F for each instance (always)
• C: The classifier predicts the correct label always (100% accuracy)
• W: The classifier predicts the wrong label always (0% accuracy)
You also have a test set with 60% of instances labeled T and 40% of instances labeled F. Now, compute the following
statistics for each of the four classifiers:
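| Statistic           | Cls. T | Cls. F | Cls. C | Cls. W |
|---------------------|--------|--------|--------|--------|
| recall              |        |        |        |        |
| true positive rate  |        |        |        |        |
| false positive rate |        |        |        |        |
| true negative rate  |        |        |        |        |
| specificity         |        |        |        |        |
| precision           |        |        |        |        |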
Some of the rows above may be the same.
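The statistics in the table can all be written in terms of confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name `classification_stats` and the example counts are made up for illustration and do not correspond to any of the four classifiers.

```r
# Hypothetical helper: standard statistics from confusion-matrix counts.
# tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.
classification_stats <- function(tp, fp, tn, fn) {
  c(
    recall              = tp / (tp + fn),
    true_positive_rate  = tp / (tp + fn),
    false_positive_rate = fp / (fp + tn),
    true_negative_rate  = tn / (tn + fp),
    specificity         = tn / (tn + fp),
    precision           = tp / (tp + fp)
  )
}

# Made-up counts, for illustration only
stats <- classification_stats(tp = 30, fp = 10, tn = 50, fn = 10)
print(stats)
```

In R, a ratio with a zero denominator and zero numerator evaluates to `NaN`, which is a useful reminder that the corresponding statistic is undefined for that classifier.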
Problem O2 [30%]
This problem can be substituted for Problem 2 above, for up to 5 points extra credit. The better score from
problems 2 and O2 will be considered.
Solve Exercise 3.4 in [Bishop, C. M. (2006). Pattern Recognition and Machine Learning].
Problem 3 [25%]
In this problem, you will derive the bias-variance decomposition of MSE as described in Eq. (2.7) in ISL. Let
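$f$ be the true model and $\hat{f}$ be the estimated model. Consider a fixed instance $x_0$ with the label $y_0 = f(x_0)$. For
simplicity, assume that $\mathrm{Var}[\epsilon] = 0$, in which case the decomposition becomes:
$$
\underbrace{\mathbb{E}\left[ (y_0 - \hat{f}(x_0))^2 \right]}_{\text{test MSE}}
= \underbrace{\mathrm{Var}[\hat{f}(x_0)]}_{\text{Variance}}
+ \Big( \underbrace{\mathbb{E}[f(x_0) - \hat{f}(x_0)]}_{\text{Bias}} \Big)^2 .
$$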
Prove that this equality holds.
Hints:
1. You may find the following decomposition of variance helpful:
$$
\mathrm{Var}[W] = \mathbb{E}\left[ (W - \mathbb{E}[W])^2 \right] = \mathbb{E}[W^2] - \mathbb{E}[W]^2
$$
2. This link could be useful: https://en.wikipedia.org/wiki/Variance#Basic_properties
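To build intuition for the variance identity in the first hint, it can be checked numerically on a small discrete distribution. This is a minimal sketch, assuming W is the outcome of a fair six-sided die (an example chosen here for illustration, not taken from the assignment); expectations are computed exactly as probability-weighted sums.

```r
# Assumed example: W is the outcome of a fair six-sided die
w <- 1:6
p <- rep(1 / 6, 6)

e_w  <- sum(p * w)      # E[W]
e_w2 <- sum(p * w^2)    # E[W^2]

var_def   <- sum(p * (w - e_w)^2)  # definition: E[(W - E[W])^2]
var_short <- e_w2 - e_w^2          # shortcut:   E[W^2] - E[W]^2

# Both expressions should agree up to floating-point error
print(all.equal(var_def, var_short))
```

The same identity holds for any random variable with finite second moment, which is what makes it applicable to the prediction error in this problem.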
Problem 4 [25%]
Please help me. I wrote the following code that computes the MSE, bias, and variance for a test point.
set.seed(1984)
population <- data.frame(year=seq(1790,1970,10),pop=c(uspop))
population.train <- population[1:nrow(population) - 1,]
population.test <- population[nrow(population),]
E <- c() # prediction errors of the different models
for(i in 1:10){
  pop.lm <- lm(pop ~ year, data = dplyr::sample_n(population.train, 8))
  e <- predict(pop.lm, population.test) - population.test$pop
  E <- c(E,e)
}
cat(glue::glue("MSE: {mean(E^2)}\n",
               "Bias^2: {mean(E)^2}\n",
               "Var: {var(E)}\n",
               "Bias^2+Var: {mean(E)^2 + var(E)}"))
## MSE: 2869.61343086216
## Bias^2: 2681.61281912074
## Var: 208.889568601581
## Bias^2+Var: 2890.50238772232
I expected the MSE to be equal to Bias^2 + Variance, but that does not seem to be the case: in the output above, the
MSE is 2869.613 and Bias^2 + Variance is 2890.502. Was my assumption wrong, or is there a bug in my
code? Is it a problem that I am computing the expectation over only 10 trials?
Hint: If you are using Python and need help with this problem, please come to see me (Marek).