## Description

1. Let $\{W(t),\ 0 \le t \le 1\}$ be a Brownian motion (Wiener process). The Brownian bridge $\{B(t),\ 0 \le t \le 1\}$ is the Brownian motion conditioned on $W(1) = 0$ and can be represented as $B(t) = W(t) - tW(1)$. Derive the Karhunen–Loève representation of the Brownian bridge $B(t)$.
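As a reminder of the setup (a sketch of the standard definitions, not the requested derivation), and assuming $W$ is a standard Brownian motion so that $\operatorname{Cov}(W(s), W(t)) = \min(s, t)$, the covariance function of the bridge follows directly from the representation $B(t) = W(t) - tW(1)$:

$$
C(s, t) = \operatorname{Cov}\bigl(B(s), B(t)\bigr) = \min(s, t) - st, \qquad 0 \le s, t \le 1 .
$$

The Karhunen–Loève representation of a mean-zero process with continuous covariance $C$ then has the generic form below, where the symbols $\lambda_k$, $\varphi_k$, and $\xi_k$ are notation introduced here for illustration:

$$
B(t) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, \xi_k\, \varphi_k(t),
\qquad
\int_0^1 C(s, t)\, \varphi_k(s)\, ds = \lambda_k\, \varphi_k(t),
$$

with $\xi_k$ uncorrelated, mean-zero, unit-variance random variables and $\{\varphi_k\}$ orthonormal in $L^2[0, 1]$. The derivation amounts to solving this eigenproblem for the specific $C$ above.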

2. Simulate a sample of 50 realizations of (a) the Brownian motion and (b) the Brownian bridge. Each curve should have 1000 support points. Show the trajectories on two separate plots and include your R code.
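One possible way to set up the simulation is sketched below. It assumes a standard Brownian motion on an equally spaced grid, built from cumulative sums of independent Gaussian increments, and obtains the bridge through $B(t) = W(t) - tW(1)$. The variable names (`n_curves`, `n_points`, `t_grid`) and the seed are illustrative choices, not part of the assignment.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only

n_curves <- 50                    # number of realizations
n_points <- 1000                  # support points per curve
t_grid   <- seq(0, 1, length.out = n_points)
dt       <- 1 / (n_points - 1)    # grid spacing

# Independent N(0, dt) increments, one column per realization
incr <- matrix(rnorm((n_points - 1) * n_curves, sd = sqrt(dt)),
               nrow = n_points - 1, ncol = n_curves)

# Brownian motion: W(0) = 0, then cumulative sums of the increments
W <- rbind(0, apply(incr, 2, cumsum))

# Brownian bridge: subtract t * W(1) from each path
B <- W - outer(t_grid, W[n_points, ])

# Two separate plots, one line per realization
matplot(t_grid, W, type = "l", lty = 1,
        xlab = "t", ylab = "W(t)", main = "Brownian motion")
matplot(t_grid, B, type = "l", lty = 1,
        xlab = "t", ylab = "B(t)", main = "Brownian bridge")
```

Both `W` and `B` are `n_points` by `n_curves` matrices, so each column is one trajectory and every bridge path returns to zero at $t = 1$.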

3. Let $X_1, \ldots, X_n$ be a sample of i.i.d. real-valued random variables sharing a distribution with an unknown density $f$ supported on a compact interval $[a, b]$. The kernel density estimate (KDE) of $f(x_0)$ at $x_0 \in [a, b]$ is

   $$\hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^{n} K_h(X_i - x_0),$$

   where $K_h(\cdot) = h^{-1} K(\cdot/h)$, $K(\cdot)$ is a kernel function, and $h > 0$ is the bandwidth. Write down an intuitive argument for why the KDE "works". [Hint: consider a uniform kernel $K$.]
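To make the hint concrete, here is a minimal sketch of the estimator with a uniform kernel, assuming $K(u) = \tfrac{1}{2}$ for $|u| \le 1$ and $0$ otherwise. Under that assumption the KDE reduces to the fraction of observations within distance $h$ of $x_0$, divided by the window length $2h$. The function name `kde_uniform` and the simulated sample are hypothetical and only illustrate the formula.

```r
# KDE at a single point x0 with the uniform kernel K(u) = 1/2 on [-1, 1]:
# (1/n) * sum_i (1/h) * K((X_i - x0)/h) = mean(|X_i - x0| <= h) / (2h)
kde_uniform <- function(x0, x, h) {
  mean(abs(x - x0) <= h) / (2 * h)
}

set.seed(1)                 # arbitrary seed, for reproducibility only
x <- runif(5000)            # illustrative sample; true density is 1 on [0, 1]

est <- kde_uniform(0.5, x, h = 0.1)
print(est)                  # should be close to the true value f(0.5) = 1
```

Shrinking `h` narrows the window (less bias, more variability), while enlarging it averages over a wider region.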

4. Analyze the Lake Acidity data in the `gss` package of R. The data were extracted from the Eastern Lake Survey of 1984 conducted by the United States Environmental Protection Agency, concerning 112 lakes in the Blue Ridge. To gain access to the data, type the following commands in R:

       library(gss)
       data(LakeAcidity)

   For more information, check the help document about this data set.

5. Perform a nonparametric regression of the calcium concentration ($Y$) on the surface pH level ($X$).

   (a) Show a KDE and a dot plot of the pH levels.

   (b) Compare the results of the local polynomial estimator, smoothing spline, regression spline, and penalized spline. Manually vary the tuning parameters, including the bandwidth, the number of knots, and the penalty $\lambda$ on the second derivative of the regression curve. For each smoother, identify a parameter setting that (i) oversmooths (the estimate is too smooth), (ii) undersmooths (the estimate is too rough), and (iii) smooths appropriately. Show the graphs and your code.

   (c) Write a brief summary of your data analysis.
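A possible starting point for parts (a) and (b) is sketched below, using only base R plotting and `stats::smooth.spline`. It assumes the data frame has columns named `ph` and `cal`, which should be checked against the help page. If `gss` is not installed, it falls back to simulated stand-in data, which is hypothetical and not the lake data. The three `spar` values are arbitrary illustrations of rough, intermediate, and very smooth fits, not recommended settings; the other smoothers can be explored by varying their tuning parameters in the same way.

```r
# Use the real data when gss is available; otherwise simulate stand-in
# data so the sketch still runs (the stand-in is NOT the lake data).
if (requireNamespace("gss", quietly = TRUE)) {
  data(LakeAcidity, package = "gss")
  x <- LakeAcidity$ph    # assumed column name: surface pH
  y <- LakeAcidity$cal   # assumed column name: calcium concentration
} else {
  set.seed(1)
  x <- sort(runif(112, 6, 8.5))
  y <- exp(x - 6) + rnorm(112, sd = 0.5)
}

# (a) KDE and dot plot of the pH levels
plot(density(x), main = "KDE of surface pH")
stripchart(x, method = "jitter", pch = 16, xlab = "pH",
           main = "Dot plot of surface pH")

# (b) Smoothing spline at three smoothing levels (larger spar = smoother)
spar_values <- c(rough = 0.1, mid = 0.8, smooth = 1.5)
fits <- lapply(spar_values, function(s) smooth.spline(x, y, spar = s))

grid <- seq(min(x), max(x), length.out = 200)
plot(x, y, pch = 16, col = "grey50", xlab = "pH", ylab = "Calcium")
for (i in seq_along(fits)) {
  lines(predict(fits[[i]], grid), col = i + 1, lwd = 2)
}
legend("topleft", legend = names(fits), col = seq_along(fits) + 1, lwd = 2)

# Effective degrees of freedom fall as the fit gets smoother
print(sapply(fits, function(f) f$df))
```

Comparing the effective degrees of freedom alongside the plots gives a numeric handle on how rough or smooth each setting is.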
