Description
1. Let $\{W(t), 0 \le t \le 1\}$ be a Brownian motion (Wiener process). The Brownian bridge
$\{B(t), 0 \le t \le 1\}$ is the Brownian motion conditioned on $W(1) = 0$ and can be represented
as $B(t) = W(t) - tW(1)$. Derive the Karhunen–Loève representation of the
Brownian bridge $B(t)$.
2. Simulate a sample of 50 realizations of (a) the Brownian motion and (b) the Brownian
bridge. Each curve should have 1000 support points. Show the trajectories on two
separate plots and include your R code.
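The representation B(t) = W(t) − tW(1) from problem 1 can be discretized directly. The following is a minimal sketch for a single path, not a full solution: it assumes an equally spaced grid of 1000 points on [0, 1] that includes both endpoints, and the names `n_points`, `t_grid`, `W`, and `B` are illustrative choices rather than anything prescribed by the assignment.

```r
# Sketch: one Brownian motion path on [0, 1] and the bridge built from it.
set.seed(1)                          # arbitrary seed, for reproducibility only
n_points <- 1000                     # assumed grid size, endpoints included
t_grid   <- seq(0, 1, length.out = n_points)
dt       <- 1 / (n_points - 1)       # spacing between neighbouring grid points

# Independent N(0, dt) increments; their cumulative sums give W on the grid,
# starting from W(0) = 0.
W <- c(0, cumsum(rnorm(n_points - 1, mean = 0, sd = sqrt(dt))))

# Bridge from the representation B(t) = W(t) - t * W(1).
B <- W - t_grid * W[n_points]

plot(t_grid, W, type = "l", xlab = "t", ylab = "W(t)")
plot(t_grid, B, type = "l", xlab = "t", ylab = "B(t)")
```

By construction the bridge path starts and ends at zero, which is a quick sanity check before simulating many paths.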
3. Let $X_1, \dots, X_n$ be a sample of i.i.d. real-valued random variables sharing a distribution
with an unknown density $f$ supported on a compact interval $[a, b]$. The kernel density
estimate (KDE) of $f(x_0)$ at $x_0 \in [a, b]$ is
$$\hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^{n} K_h(X_i - x_0),$$
where $K_h(\cdot) = h^{-1} K(\cdot/h)$, $K(\cdot)$ is a kernel function, and $h > 0$ is the bandwidth. Write
down an intuitive argument for why the KDE “works”. [Hint: consider a uniform kernel $K$.]
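To make the KDE formula concrete, here is a minimal sketch that evaluates it at a single point. It assumes the uniform kernel takes the form K(u) = 1/2 for |u| ≤ 1 and 0 otherwise; the sample `X`, the bandwidth `h = 0.1`, and the helper names `K_unif` and `kde_at` are hypothetical and chosen only for illustration.

```r
# Uniform (boxcar) kernel: K(u) = 1/2 for |u| <= 1, 0 otherwise (assumed form).
K_unif <- function(u) 0.5 * (abs(u) <= 1)

# KDE at a single point x0: (1/n) * sum_i K_h(X_i - x0), with K_h(u) = K(u / h) / h.
kde_at <- function(x0, X, h, K = K_unif) {
  mean(K((X - x0) / h) / h)
}

set.seed(1)
X  <- runif(500)    # hypothetical sample; the true density is 1 on [0, 1]
h  <- 0.1           # illustrative bandwidth
x0 <- 0.5

est <- kde_at(x0, X, h)

# Same quantity written as a local count: the number of observations within h
# of x0, divided by n times the window width 2h.
local_count <- sum(abs((X - x0) / h) <= 1) / (2 * h * length(X))

c(kde = est, local_count = local_count)
```

With this kernel the two numbers coincide, so the estimate can be read as a local relative frequency divided by the width of the window.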
4. Analyze the Lake Acidity data in the R package gss. The data were extracted from the
Eastern Lake Survey of 1984 conducted by the United States Environmental Protection
Agency and concern 112 lakes in the Blue Ridge. To access the data, type the
following commands in R:
library(gss)
data(LakeAcidity)
For more information, see the help page for this data set (?LakeAcidity).
5. Perform a nonparametric regression on the calcium concentration (Y) against surface pH
level (X).
(a) Show a KDE and a dot plot of the pH levels.
(b) Compare the results of a local polynomial estimator, a smoothing spline, a regression
spline, and a penalized spline. Manually vary the tuning parameters, including the
bandwidth, the number of knots, and the penalty λ on the second derivative of the
regression curve. For each smoother, identify a parameter setting that (i) oversmooths
(the estimate is too smooth), (ii) undersmooths (the estimate is too rough),
and (iii) smooths appropriately. Show the graphs and your code.
(c) Write a brief summary of your data analysis.
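As an illustration of what varying a tuning parameter in part (b) can look like, the sketch below fits a smoothing spline with two different penalties using `smooth.spline` from base R. It covers only one of the four smoothers and is not a recommended analysis. It assumes the data frame has columns named `ph` and `cal`; the two `lambda` values are arbitrary illustrative choices, and the simulated fallback data are purely hypothetical, used only when the gss package is not installed.

```r
# Sketch: vary the penalty lambda of a smoothing spline (one of the four smoothers).
if (requireNamespace("gss", quietly = TRUE)) {
  data(LakeAcidity, package = "gss")
  x <- LakeAcidity$ph      # assumed column name for surface pH
  y <- LakeAcidity$cal     # assumed column name for calcium concentration
} else {
  # Hypothetical stand-in data, used only when gss is not installed.
  set.seed(1)
  x <- runif(112, 6, 8)
  y <- sin(3 * x) + rnorm(112, sd = 0.3)
}

# Illustrative penalties: a small lambda gives a rough fit, a large one a smooth fit.
lambdas <- c(rough = 1e-6, smooth = 1)
fits <- lapply(lambdas, function(lam) smooth.spline(x, y, lambda = lam))

# Overlay both fits on the scatter plot.
x_new <- seq(min(x), max(x), length.out = 200)
plot(x, y, xlab = "surface pH", ylab = "calcium concentration")
lines(predict(fits$rough, x_new), lty = 2)
lines(predict(fits$smooth, x_new), lty = 1)

# Effective degrees of freedom of each fit; larger means a rougher estimate.
sapply(fits, function(fit) fit$df)
```

Comparing the effective degrees of freedom alongside the plots gives a numerical handle on how rough or smooth each setting is.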