## Description

1. Chapter 12, problem 17

2. Chapter 12, problem 20

3. Chapter 20, problem 11

4. Chapter 20, problem 15

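5. Consider a random vector $(X, Y, Z)$, where $Y$ and $Z$ are marginally Bernoulli random variables such that

$$
P(Y = 1 \mid X) = \frac{e^{\alpha + X^{T}\beta}}{1 + e^{\alpha + X^{T}\beta}},
\qquad
P(Z = 1 \mid X, Y) = \pi(X)\,\varphi(Y).
$$

Show that

$$
P(Y = 1 \mid X, Z = 1) = \frac{e^{\alpha^{*} + X^{T}\beta}}{1 + e^{\alpha^{*} + X^{T}\beta}},
$$

and give the expression for $\alpha^{*}$.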
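The claim in Problem 5 can be sanity-checked numerically before proving it. The sketch below is only an illustration under stated assumptions: $X$ is taken to be scalar, and the values of `alpha`, `beta`, the function `pi`, and the table `phi` are hypothetical choices, not part of the problem. It computes $P(Y = 1 \mid X, Z = 1)$ by Bayes' rule and checks that its logit minus $X\beta$ does not depend on $X$, which is what a pure intercept shift would imply.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = -0.5, 1.3
phi = {0: 0.2, 1: 0.8}  # hypothetical values of phi(0) and phi(1)

def pi(x):
    """Hypothetical pi(X); chosen so that pi(x) * phi(y) stays inside (0, 1)."""
    return 0.3 + 0.5 * expit(x)

def p_y1_given_x_z1(x):
    """P(Y = 1 | X = x, Z = 1) computed by Bayes' rule."""
    p = expit(alpha + beta * x)          # P(Y = 1 | X = x)
    num = p * pi(x) * phi[1]             # P(Y = 1, Z = 1 | X = x)
    den = num + (1.0 - p) * pi(x) * phi[0]
    return num / den

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
# If the conditional model is logistic with the same slope, this is constant in x.
shifts = [logit(p_y1_given_x_z1(x)) - beta * x for x in xs]
print(shifts)
```

A constant list supports the stated form for these particular choices; it does not replace the derivation or identify $\alpha^{*}$ in general.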

6. Consider a properly normalized density function $f(x)$ on the real line and its natural exponential family,

$$
f(x \mid \theta) = f(x)\, e^{\theta x - \phi(\theta)}.
$$

(a) What conditions do you need for $f(x \mid \theta)$ to be a well-defined density function?

(b) What is the choice of $\phi(\theta)$ such that $f(x \mid \theta)$ is a properly normalized density function, that is,

$$
\int f(x \mid \theta)\,dx = 1?
$$

(c) Show that the choice of $\phi$ in (b) satisfies

$$
\phi'(\theta) = \int x\, f(x \mid \theta)\,dx,
\qquad
\phi''(\theta) = \int \bigl(x - \phi'(\theta)\bigr)^{2} f(x \mid \theta)\,dx,
$$

where $\phi'$ and $\phi''$ are the first and second derivatives of $\phi$. If $\phi'(\theta)$ is a strictly increasing function of $\theta$, then the mean determines $\theta$, and hence the variance of a natural exponential family is determined by its mean.
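The identities in (c) can be checked numerically on one concrete family. The sketch below assumes, purely for illustration, that the base density $f$ is the standard normal, for which $f(x)\,e^{\theta x - \theta^{2}/2}$ is the $N(\theta, 1)$ density, so $\phi(\theta) = \theta^{2}/2$, $\phi'(\theta) = \theta$, and $\phi''(\theta) = 1$. The helper `integrate` and the value of `theta` are hypothetical choices.

```python
import math

def f(x):
    """Base density: standard normal (an illustrative assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def phi(theta):
    """Normalizer for the standard normal base: phi(theta) = theta^2 / 2."""
    return 0.5 * theta * theta

def f_theta(x, theta):
    """Member of the natural exponential family generated by f."""
    return f(x) * math.exp(theta * x - phi(theta))

def integrate(g, lo=-12.0, hi=12.0, n=24000):
    """Composite trapezoid rule on [lo, hi]; the tails beyond are negligible here."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h

theta = 0.7  # hypothetical parameter value
mass = integrate(lambda x: f_theta(x, theta))                    # should be close to 1
mean = integrate(lambda x: x * f_theta(x, theta))                # compare with phi'(theta) = theta
var = integrate(lambda x: (x - mean) ** 2 * f_theta(x, theta))   # compare with phi''(theta) = 1
print(mass, mean, var)
```

This confirms the identities for a single family and a single $\theta$ only; the general statement still requires differentiating under the integral sign.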

(d) Consider the following dataset

| Y | X  |
|---|----|
| 0 | -2 |
| 0 | -1 |
| 1 | 1  |
| 1 | 2  |

and a generalized linear model

$$
\operatorname{logit}\{P(Y = 1 \mid X)\} = \beta_1 X.
$$


Report the “maximum likelihood estimate” of $\beta_1$ computed with `glm(Y ~ X - 1, binomial)` (or any other software you use). Does the estimate make sense to you? Can we obtain a reasonable estimate of $P(Y = 1 \mid X = 0.5)$ from the data? Why? Plot the log-likelihood as a function of $\beta_1$. Does the plot explain this phenomenon? This is one of the few situations in which the MLE does not exist; it can happen especially when the sample size is small. Provide a necessary and sufficient condition for the MLE of logistic regression to exist.

This problem provides a warning for your future work. When the model becomes more complicated, some parameters may not be estimable or identifiable, for different reasons. The data may not contain enough information to identify the best model, which is our current situation. Non-identifiability can also be caused by redundant parameters, that is, $f(x \mid \theta_1) = f(x \mid \theta_2)$ for some $\theta_1 \neq \theta_2$. Pay particular attention to the first situation: the estimates reported by software may not be meaningful and may yield misleading results.
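The log-likelihood in (d) can be tabulated directly from the four data points. The following is a minimal sketch in plain Python rather than R's `glm`; the helper names `log1pexp` and `loglik` and the grid of $\beta_1$ values are hypothetical choices. It evaluates the Bernoulli log-likelihood of the no-intercept logistic model on a grid, and the resulting pairs can be passed to any plotting tool.

```python
import math

# Dataset from part (d).
Y = [0, 0, 1, 1]
X = [-2.0, -1.0, 1.0, 2.0]

def log1pexp(t):
    """Numerically stable log(1 + e^t)."""
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))

def loglik(beta1):
    """Bernoulli log-likelihood for logit P(Y = 1 | X) = beta1 * X (no intercept)."""
    return sum(y * beta1 * x - log1pexp(beta1 * x) for y, x in zip(Y, X))

# Hypothetical grid: beta1 from -2 to 10 in steps of 0.5.
grid = [b / 2.0 for b in range(-4, 21)]
values = [loglik(b) for b in grid]

for b, v in zip(grid, values):
    print(f"beta1 = {b:5.1f}   loglik = {v:.6f}")
```

Inspecting how `values` behaves as the grid is extended to the right is one way to approach the question of whether a finite maximizer exists.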
