STAT 435 Homework #7

1. For this problem, you will analyze a data set of your choice, not taken from the
ISLR package. Choose a data set that has $n \gg p$, since you will apply methods
from Chapter 7 to this data. You will also need to have $p > 1$. Throughout this
problem, make sure to label your axes appropriately, and to include legends
when needed.
(a) Describe the data in words. Where did you get it from, and what is
the data about? You will perform supervised learning on this data, so
you must identify a response, $Y$, and features, $X_1, \ldots, X_p$. What are the
values of $n$ and $p$? Describe the response and the features (e.g., what are
they measuring; are they quantitative or qualitative?).
(b) Fit a generalized additive model, $Y = f_1(X_1) + \ldots + f_p(X_p) + \epsilon$. Use
cross-validation to choose the level of complexity. For $j = 1, \ldots, p$, make
a scatterplot of $X_j$ against $Y$, and plot $\hat{f}_j(X_j)$. Comment on your results
and on the choices you made in fitting this model.
(c) Now fit a linear model, $Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \epsilon$. For $j = 1, \ldots, p$,
display the linear fit ($X_j \hat{\beta}_j$) on top of a scatterplot of $X_j$ against $Y$.
(d) Estimate the test error of the generalized additive model and the test error
of the linear model. Comment on your results. Which approach gives a
better fit to the data?
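One possible way to organize parts (b) through (d) is sketched below in R. It is only an illustration under stated assumptions: the data frame `dat`, its columns `x1`, `x2`, `y`, and the helper `cv_mse` are hypothetical stand-ins (the assignment asks you to bring your own data), and the additive model is approximated by fitting one natural cubic spline per feature inside `lm()`, which is just one of several ways to fit a generalized additive model.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

set.seed(1)
# Hypothetical stand-in data (n = 300, p = 2); swap in your own data set.
n   <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + (dat$x2 - 0.5)^2 + rnorm(n, sd = 0.3)

# Assign each observation to one of k folds, once, so that every model
# is scored on the same splits.
k     <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# k-fold cross-validated mean squared error for an lm() formula.
# Assumes the response column is named y.
cv_mse <- function(formula) {
  sq_err <- numeric(nrow(dat))
  for (i in seq_len(k)) {
    held_out <- folds == i
    fit  <- lm(formula, data = dat[!held_out, ])
    pred <- predict(fit, newdata = dat[held_out, ])
    sq_err[held_out] <- (dat$y[held_out] - pred)^2
  }
  mean(sq_err)
}

# Additive model: one natural spline per feature, same df for each.
df_grid <- 2:8
cv_gam <- sapply(df_grid, function(d) {
  cv_mse(as.formula(sprintf("y ~ ns(x1, df = %d) + ns(x2, df = %d)", d, d)))
})
best_df <- df_grid[which.min(cv_gam)]

# Linear model scored on the same folds.
cv_lin <- cv_mse(y ~ x1 + x2)

print(data.frame(df = df_grid, cv_mse = cv_gam))
cat("chosen df:", best_df, " GAM CV MSE:", min(cv_gam),
    " linear CV MSE:", cv_lin, "\n")
```

Using a single degrees-of-freedom value for every feature keeps the search one-dimensional; a per-feature search is possible but grows quickly with $p$. Because the smallest cross-validation error over the grid was itself chosen by cross-validation, it tends to be slightly optimistic as a test-error estimate, so a separate held-out set may be preferable for part (d). For the plots, `predict(fit, type = "terms")` on a model refit to the full data returns the centered per-feature contributions that can be overlaid on each scatterplot.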
2. In this problem, we’ll play around with regression splines.
(a) Generate data as follows:
set.seed(7)
x <- 1:1000
y <- sin((1:1000)/100)*4+rnorm(100)
Consider the model $Y = f(X) + \epsilon$. What is the form of $f(X)$ for this
simulation setting? What is the value of $\mathrm{Var}(\epsilon)$? What is the value of
$E(Y - f(X))^2$?
(b) Fit regression splines for various numbers of knots to this simulated data,
in order to get spline fits ranging from very wiggly to very smooth. Make a
plot of your results, showing the raw data, the true function $f(X)$, and the
spline fits. Be sure to include a legend containing relevant information, and
to label the axes appropriately.
(c) Based on visual inspection, how many knots seem to give the “best” fit?
Explain your answer.
(d) Now perform cross-validation in order to select the optimal number of
knots. What is the “best” number of knots? Make a plot displaying the raw
data, the true function $f(X)$, and the spline fit $\hat{f}(X)$ that uses the number
of knots selected by cross-validation. Be sure to include a legend and to label
the axes appropriately. Comment on your results.
(e) Provide an estimate of the test error, $E(Y - \hat{f}(X))^2$, associated with the
spline $\hat{f}(\cdot)$ from (d). How does this relate to your answer in (a)?
(f) Now fit a linear model of the form $Y = \beta_0 + \beta_1 X + \epsilon$ to the data instead.
Plot the raw data and the fitted model and the true function $f(\cdot)$. Provide an
estimate of the test error associated with the fitted model. Comment on your
results.
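The cross-validation asked for in (d) through (f) could be set up along the following lines. This is a minimal sketch, not a reference solution: the names `dat`, `folds`, `cv_mse`, and `knot_grid`, the choice of 10 folds, the cubic B-spline basis from `bs()`, and the grid of 1 to 20 interior knots are all assumptions made for illustration. The data-generating lines are copied as given; note that R silently recycles the 100 noise draws across the 1000 observations.

```r
library(splines)  # ships with R; provides bs() for B-spline bases

set.seed(7)
x <- 1:1000
y <- sin((1:1000) / 100) * 4 + rnorm(100)  # as given in the assignment
dat <- data.frame(x = x, y = y)

# Assign each observation to one of k folds, once, so that every model
# is scored on the same splits.
k     <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# k-fold cross-validated mean squared error for an lm() formula.
cv_mse <- function(formula) {
  sq_err <- numeric(nrow(dat))
  for (i in seq_len(k)) {
    held_out <- folds == i
    fit  <- lm(formula, data = dat[!held_out, ])
    pred <- predict(fit, newdata = dat[held_out, ])
    sq_err[held_out] <- (dat$y[held_out] - pred)^2
  }
  mean(sq_err)
}

# A cubic B-spline with nk interior knots has df = nk + 3 (no intercept).
# Fixing the boundary knots at the range of x keeps held-out points
# inside the basis on every fold.
knot_grid <- 1:20
cv_err <- sapply(knot_grid, function(nk) {
  cv_mse(as.formula(sprintf(
    "y ~ bs(x, df = %d, Boundary.knots = c(1, 1000))", nk + 3L)))
})
best_knots <- knot_grid[which.min(cv_err)]

# Linear model scored on the same folds, for part (f).
cv_lin <- cv_mse(y ~ x)

cat("knots chosen by CV:", best_knots, "\n")
cat("spline CV MSE:", min(cv_err), " linear CV MSE:", cv_lin, "\n")
```

With `df` specified, `bs()` places the interior knots at quantiles of the training `x`, so knot locations shift slightly from fold to fold. To draw the plots, the selected model can be refit on all of `dat` and its fitted values added with `lines()` on top of `plot(x, y)`, together with the true curve and a `legend()`.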