STAT 435 Homework #4

1. Consider the validation set approach, with a 50/50 split into training and
validation sets:
(a) Suppose you perform the validation set approach twice, each time with a
different random seed. What’s the probability that an observation, chosen
at random, is in both of those training sets?
(b) If you perform the validation set approach repeatedly, will you get the
same result each time? Explain your answer.
2. Consider K-fold cross-validation:
(a) Consider the observations in the 1st fold’s training set, and the observations in the 2nd fold’s training set. What’s the probability that an
observation, chosen at random, is in both of those training sets?
(b) If you perform K-fold CV repeatedly, will you get the same result each
time? Explain your answer.
3. Now consider leave-one-out cross-validation:
(a) Consider the observations in the 1st fold’s training set, and the observations in the 2nd fold’s training set. What’s the probability that an
observation, chosen at random, is in both of those training sets?
(b) If you perform leave-one-out cross-validation repeatedly, will you get the
same result each time? Explain your answer.
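
The overlap questions in problems 1 to 3 can be checked empirically. The following is a minimal sketch, not part of the assignment: the helper names (`validation_train_set`, `kfold_train_sets`, `overlap_fraction`), the sample size, and the seeds are all illustrative assumptions. It builds the training index sets for each resampling scheme and reports the fraction of observations that fall in both of two training sets.

```python
import random

def validation_train_set(n, seed):
    """Indices of a random 50/50 training set (hypothetical helper)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), n // 2))

def kfold_train_sets(n, k, seed=0):
    """Training index sets for each of the k folds after one random shuffle."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k (nearly) equal held-out folds
    everything = set(idx)
    return [everything - set(fold) for fold in folds]

def overlap_fraction(train_a, train_b, n):
    """Fraction of all n observations that appear in both training sets."""
    return len(train_a & train_b) / n

n = 1000  # assumed sample size, chosen only for illustration

# Validation set approach, run twice with different seeds.
val_overlap = overlap_fraction(
    validation_train_set(n, seed=1), validation_train_set(n, seed=2), n
)

# K-fold CV with K = 5: compare the training sets of folds 1 and 2.
k5 = kfold_train_sets(n, 5)
kfold_overlap = overlap_fraction(k5[0], k5[1], n)

# LOOCV is K-fold CV with K = n.
loo = kfold_train_sets(n, n)
loo_overlap = overlap_fraction(loo[0], loo[1], n)

print(val_overlap, kfold_overlap, loo_overlap)
```

Changing the seed changes the validation-set overlap from run to run, whereas the K-fold and leave-one-out overlaps are fixed by K and n alone. This is relevant to the (b) parts, though the simulation is no substitute for the written explanation.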
4. Consider a very simple model,
$Y = \beta + \epsilon$,
where $Y$ is a scalar response variable, $\beta \in \mathbb{R}$ is an unknown parameter, and $\epsilon$
is a noise term with $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = \sigma^2$. Our goal is to estimate $\beta$. Assume
that we have $n$ observations with uncorrelated errors.
(a) Suppose that we perform least squares regression using all n observations.
Prove that the least squares estimator, $\hat{\beta}$, equals $\frac{1}{n} \sum_{i=1}^{n} Y_i$.
(b) Suppose that we perform least squares using all n observations. Prove
that the least squares estimator, $\hat{\beta}$, has variance $\sigma^2/n$.
(c) Consider the least squares estimator of β fit using just n/2 observations.
What is the variance of this estimator?
(d) Consider the least squares estimator of β fit using n(K − 1)/K observations, for some K > 2. What is the variance of this estimator?
(e) Consider the least squares estimator of β fit using n − 1 observations.
What is the variance of this estimator?
(f) Derive an expression for $E(\hat{\beta})$, where $\hat{\beta}$ is the least squares estimator fit
using all n observations.
(g) Using your results from the earlier sections of this question, argue that the
validation set approach tends to over-estimate the expected test error.
(h) Using your results from the earlier sections of this question, argue that
leave-one-out cross-validation does not substantially over-estimate the expected test error, provided that n is large.
(i) Using your results from the earlier sections of this question, argue that
K-fold CV provides an over-estimate of the expected test error that is
somewhere between the big over-estimate resulting from the validation
set approach and the very mild over-estimate resulting from leave-one-out
CV.
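
The variance comparisons in problem 4 can be sanity-checked by simulation. This is a rough sketch under stated assumptions, not a proof: the noise is taken to be Gaussian, and the values of β, σ, n and K, along with the helper name `simulate_estimator_variance`, are arbitrary illustrative choices. It repeatedly draws m observations from the model, computes the sample mean as the estimator, and compares the empirical variance of that estimator with σ²/m for each training-set size.

```python
import random
import statistics

def simulate_estimator_variance(m, sigma=1.0, beta=2.0, reps=4000, seed=0):
    """Empirical variance of the sample mean of m draws of Y = beta + noise.

    Gaussian noise is an assumption made for this sketch; the problem only
    requires mean-zero, uncorrelated errors with variance sigma^2.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys = [beta + rng.gauss(0.0, sigma) for _ in range(m)]
        estimates.append(sum(ys) / m)
    return statistics.pvariance(estimates)

n, K = 40, 5  # assumed values, for illustration only
sizes = {
    "all n": n,
    "n/2": n // 2,
    "n(K-1)/K": n * (K - 1) // K,
    "n-1": n - 1,
}
results = {name: simulate_estimator_variance(m) for name, m in sizes.items()}

for name, m in sizes.items():
    print(f"{name:>9}: m={m:3d}  simulated={results[name]:.4f}  sigma^2/m={1.0 / m:.4f}")
```

The fewer observations the estimator is fit on, the larger its simulated variance. That ordering is what parts (g) to (i) ask you to connect to over-estimation of the expected test error.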
5. As in the previous problem, assume
$Y = \beta + \epsilon$,
where $Y$ is a scalar response variable, $\beta \in \mathbb{R}$ is an unknown parameter, and $\epsilon$
is a noise term with $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = \sigma^2$. Our goal is to estimate $\beta$. Assume
that we have $n$ observations with uncorrelated errors.
(a) Suppose that we perform K-fold cross-validation. What is the correlation
between $\hat{\beta}_1$, the least squares estimator of $\beta$ that we obtain from the 1st
fold, and $\hat{\beta}_2$, the least squares estimator of $\beta$ that we obtain from the 2nd
fold?
(b) Suppose that we perform the validation set approach twice, each time
using a different random seed. Assume further that exactly 0.25n observations overlap between the two training sets. What is the correlation
between $\hat{\beta}_1$, the least squares estimator of $\beta$ that we obtain the first time
that we perform the validation set approach, and $\hat{\beta}_2$, the least squares estimator of $\beta$ that we obtain the second time that we perform the validation
set approach?
(c) Now suppose that we perform leave-one-out cross-validation. What is the
correlation between $\hat{\beta}_1$, the least squares estimator of $\beta$ that we obtain
from the 1st fold, and $\hat{\beta}_2$, the least squares estimator of $\beta$ that we obtain
from the 2nd fold?
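
The correlations asked for in problem 5 can be estimated by simulation and then compared with a hand derivation. The sketch below is only illustrative and rests on assumptions not in the problem: Gaussian noise, n = 40, K = 5, β = 0, and the helper names `pearson` and `simulate_correlation`. Each replication draws one data set, fits the sample mean on two overlapping training sets, and the correlation between the two estimators is computed across replications.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_correlation(train_a, train_b, n, reps=3000, sigma=1.0, beta=0.0, seed=0):
    """Correlation between the sample means fit on two index sets.

    Gaussian noise is an assumption of this sketch, not of the problem.
    """
    rng = random.Random(seed)
    a_hats, b_hats = [], []
    for _ in range(reps):
        y = [beta + rng.gauss(0.0, sigma) for _ in range(n)]
        a_hats.append(sum(y[i] for i in train_a) / len(train_a))
        b_hats.append(sum(y[i] for i in train_b) / len(train_b))
    return pearson(a_hats, b_hats)

n, K = 40, 5  # assumed values, for illustration only
fold = n // K
everything = set(range(n))

# (a) K-fold CV: training sets that leave out fold 1 and fold 2 respectively.
kfold_corr = simulate_correlation(
    everything - set(range(fold)), everything - set(range(fold, 2 * fold)), n
)

# (b) Two validation-set training sets of size n/2 sharing exactly 0.25n observations.
val_corr = simulate_correlation(range(0, n // 2), range(n // 4, n // 4 + n // 2), n)

# (c) LOOCV: training sets that leave out observation 1 and observation 2.
loo_corr = simulate_correlation(everything - {0}, everything - {1}, n)

print(kfold_corr, val_corr, loo_corr)
```

The simulated values are noisy estimates and will not match the exact answers to many decimal places. They are useful mainly for confirming the ordering of the three correlations.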
Remark 1: Problem 5 indicates that the $\hat{\beta}$'s that you estimate using LOOCV
are highly correlated with each other.
Remark 2: You might remember from an earlier stats class that if $X_1, \ldots, X_n$
are uncorrelated with variance $\sigma^2$ and mean $\mu$, then the variance of $\frac{1}{n} \sum_{i=1}^{n} X_i$
equals $\sigma^2/n$. But if $\mathrm{Cor}(X_i, X_k) = \sigma^2$, then the variance of $\frac{1}{n} \sum_{i=1}^{n} X_i$ is quite
a bit higher.
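
The claim in Remark 2 can be made concrete with a short derivation. As a sketch, write $\rho$ for a common pairwise correlation between distinct $X_i$ and $X_k$; the symbol $\rho$ is an assumption introduced here for illustration and does not appear in the original remark. Then $\mathrm{Cov}(X_i, X_k) = \rho\sigma^2$ for $i \neq k$, and:

```latex
\begin{align*}
\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  &= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{k=1}^{n}\mathrm{Cov}(X_i, X_k) \\
  &= \frac{1}{n^2}\left[\, n\sigma^2 + n(n-1)\,\rho\,\sigma^2 \right] \\
  &= \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\,\sigma^2 .
\end{align*}
```

Setting $\rho = 0$ recovers the uncorrelated result $\sigma^2/n$. For $\rho > 0$ the second term does not shrink as $n$ grows, which is why averaging positively correlated quantities reduces variance much less than averaging uncorrelated ones.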
Remark 3: Together, problems 4 and 5 might give you some intuition for the
following: LOOCV results in an approximately unbiased estimator of expected
test error (if n is large), but this estimator has high variance. In contrast, K-fold CV results in an estimator of expected test error that has higher bias, but
lower variance.