EE 5111: Estimation Mini Project 4

The aim of this exercise is to study the importance of conjugate priors in Bayesian estimation.

Consider the estimation of the covariance of a bivariate Gaussian distribution. We have access to $n$ observations $y_i \sim \mathcal{N}(0, \Sigma)$ for $i = 1, \dots, n$. Here $y_i$ is a $2 \times 1$ vector and $\Sigma$ is a $2 \times 2$ matrix. We denote by $\bar{y}$ the set of all observations $y_i$. Perform the following experiments using

$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

as the underlying covariance, for $n = 10, 100, 1000$.
1. Estimate the covariance using maximum likelihood. (A minimal sketch is given below.)
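For concreteness, here is a hedged sketch of the ML estimate. Since the data are zero-mean, the MLE is $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} y_i y_i^T$; the function and variable names are illustrative, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_true = np.array([[1.0, 0.0], [0.0, 2.0]])  # underlying covariance

def ml_covariance(Y):
    """ML estimate of Sigma for zero-mean Gaussian data Y of shape (n, 2)."""
    n = Y.shape[0]
    return Y.T @ Y / n  # (1/n) * sum_i y_i y_i^T

for n in (10, 100, 1000):
    Y = rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)
    print(n, ml_covariance(Y))
```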
2. Perform Bayesian estimation and provide a point estimate for the covariance. The conjugate prior is the inverse Wishart distribution, whose $d$-dimensional density is given by

$\mathrm{InvWishart}(\nu, \Delta): \quad p(\Sigma) \propto |\Sigma|^{-\frac{\nu+d+1}{2}} \exp\left(-\frac{1}{2} \mathrm{Tr}(\Delta \Sigma^{-1})\right)$ (1)
Consider the following hyperparameters for the prior: $\Delta_0 = \begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix}$ and $\nu_0 = 5$; here $d = 2$. (Refer to Section 3.6 in Gelman.)
The posterior belongs to the same family, with parameters

$\nu_n = \nu_0 + n$ (2)

$\Delta_n = \Delta_0 + \sum_{i=1}^{n} y_i y_i^T$ (3)
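A short sketch of the conjugate update and the resulting point estimate, using the standard inverse Wishart mean $\Delta_n / (\nu_n - d - 1)$; names are illustrative.

```python
import numpy as np

def conjugate_posterior_mean(Y, nu0=5.0, Delta0=np.diag([4.0, 5.0])):
    """Posterior mean of Sigma under the conjugate InvWishart(nu0, Delta0) prior."""
    n, d = Y.shape
    nu_n = nu0 + n                   # eq. (2)
    Delta_n = Delta0 + Y.T @ Y       # eq. (3): Delta0 + sum_i y_i y_i^T
    return Delta_n / (nu_n - d - 1)  # mean of InvWishart(nu_n, Delta_n)
```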
3. Use of non-informative priors:
As an alternative to the conjugate prior, use the non-informative Jeffreys prior given by

$p(\Sigma) \propto |\Sigma|^{-2}$, (4)

and the independence-Jeffreys prior given by

$p(\Sigma) \propto |\Sigma|^{-3/2}$. (5)

What are the differences in the inferences using the non-informative priors as compared to the conjugate prior? (A sketch of the resulting posterior means follows.)
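One way to compare: an improper prior $|\Sigma|^{-a}$ can be read as the inverse Wishart density (1) with $\Delta_0 = 0$ and $\nu_0 = 2a - d - 1$, so the conjugate update still applies and the posterior is $\mathrm{InvWishart}(\nu_0 + n, S)$ with $S = \sum_i y_i y_i^T$. A hedged sketch under that reading (names illustrative):

```python
import numpy as np

def noninformative_posterior_mean(Y, a):
    """Posterior mean of Sigma under the improper prior |Sigma|^(-a),
    read as InvWishart(nu0, Delta0=0) with nu0 = 2a - d - 1."""
    n, d = Y.shape
    S = Y.T @ Y                # sum_i y_i y_i^T
    nu0 = 2 * a - d - 1        # matches |Sigma|^{-(nu0 + d + 1)/2}
    nu_n = nu0 + n
    return S / (nu_n - d - 1)  # valid only when nu_n > d + 1

# a = 2 gives the Jeffreys prior (4); a = 1.5 the independence-Jeffreys prior (5).
```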
4. Monte Carlo Bayesian estimation:
This method is useful when the posterior is not available in closed form. Note that we require the mean of the posterior distribution

$p(\Sigma|\bar{y}) = \dfrac{p(\bar{y}|\Sigma)\, p(\Sigma)}{\int p(\bar{y}|\Sigma)\, p(\Sigma)\, d\Sigma}$ (6)

$E_{\Sigma|\bar{y}}[\Sigma|\bar{y}] = \dfrac{E_{\Sigma}[\Sigma\, p(\bar{y}|\Sigma)]}{E_{\Sigma}[p(\bar{y}|\Sigma)]}$ (7)

where $E_{\Sigma}$ denotes expectation with respect to the prior $p(\Sigma)$.
Note that the likelihood is

$p(\bar{y}|\Sigma) \propto \det(\Sigma)^{-n/2} \exp\left(-\frac{1}{2} \sum_{i=1}^{n} y_i^T \Sigma^{-1} y_i\right)$. (8)
Instead of using the closed-form expression for the posterior update, find the posterior mean by Monte Carlo integration using

$A = \dfrac{\frac{1}{m}\sum_{j=1}^{m} \Sigma_j \det(\Sigma_j)^{-n/2} \exp\left(-\frac{1}{2}\sum_{i=1}^{n} y_i^T \Sigma_j^{-1} y_i\right)}{\frac{1}{m}\sum_{j=1}^{m} \det(\Sigma_j)^{-n/2} \exp\left(-\frac{1}{2}\sum_{i=1}^{n} y_i^T \Sigma_j^{-1} y_i\right)}$ (9)
where each $\Sigma_j \sim p(\Sigma)$ (a sample drawn from the prior distribution). Report the values of $A$ for $n = 10, 100, 1000$ and for $m = 10^3, 10^4, 10^5$, with $p(\Sigma) = \mathrm{InvWishart}_{\nu_0}(\Delta_0)$ for the following parameters:

(a) $\nu_0 = 5$, $\Delta_0 = \begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix}$;

(b) $\nu_0 = 5$, $\Delta_0 = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$.
Which prior performs better? Why do you think this happens? Can you justify why modeling the prior is important? Note that you can now model your prior as any non-conjugate distribution as well. (A Monte Carlo sketch follows this item.)
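A minimal sketch of (9): sample $\Sigma_j$ from the prior with scipy and work with log-weights, since the likelihood (8) underflows for large $n$. Names are illustrative.

```python
import numpy as np
from scipy.stats import invwishart

def mc_posterior_mean(Y, nu0, Delta0, m, seed=0):
    """Estimate A in eq. (9): self-normalized Monte Carlo with the prior
    as the sampling distribution and the likelihood (8) as the weight."""
    n = Y.shape[0]
    S = Y.T @ Y  # sum_i y_i y_i^T
    sigmas = invwishart(df=nu0, scale=Delta0).rvs(size=m, random_state=seed)
    logw = np.empty(m)
    for j, Sig in enumerate(sigmas):
        _, logdet = np.linalg.slogdet(Sig)
        # log of det(Sig)^(-n/2) * exp(-0.5 * sum_i y_i^T Sig^{-1} y_i)
        logw[j] = -0.5 * n * logdet - 0.5 * np.trace(np.linalg.solve(Sig, S))
    w = np.exp(logw - logw.max())  # shift before exponentiating for stability
    w /= w.sum()
    return np.einsum('j,jkl->kl', w, sigmas)
```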
5. Hierarchical Bayes estimation and Gibbs sampling:
Consider the following formulation of the prior for the covariance:

$\Sigma \sim \mathrm{InvWishart}\left(\nu + d - 1,\; 2\nu\, \mathrm{Diag}\left(\frac{1}{a_1}, \frac{1}{a_2}\right)\right)$ (10)

$a_k \sim \mathrm{InvGamma}\left(\frac{1}{2},\; \frac{1}{A_k^2}\right)$ (11)
For Gibbs sampling, use the following equations to draw samples iteratively from one conditional distribution and use the drawn samples in the next:

$p(\Sigma|\bar{y}, a_k) \sim \mathrm{InvWishart}\left(\nu + d + n - 1,\; 2\nu \begin{pmatrix} 1/a_1 & 0 \\ 0 & 1/a_2 \end{pmatrix} + \sum_{i=1}^{n} y_i y_i^T\right)$ (12)

$p(a_k|\bar{y}, \Sigma) \sim \mathrm{InvGamma}\left(\frac{\nu + n}{2},\; \nu\,(\Sigma^{-1})_{kk} + \frac{1}{A_k^2}\right)$ (13)
Use $A_1 = 0.05$ and $A_2 = 0.05$. Report the covariance estimate after $10^3$ iterations of Gibbs sampling. (A sketch of the sampler is given below.)
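A hedged sketch of the Gibbs sweep (12)–(13) using scipy. The assignment does not fix $\nu$, so nu=2.0 below is an illustrative assumption, as are the function and variable names.

```python
import numpy as np
from scipy.stats import invwishart, invgamma

def gibbs_covariance(Y, A=(0.05, 0.05), nu=2.0, iters=1000, seed=0):
    """Gibbs sampler alternating eqs. (12) and (13); the point estimate
    is the average of the sampled covariance matrices."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    S = Y.T @ Y  # sum_i y_i y_i^T
    a = np.ones(d)
    draws = []
    for _ in range(iters):
        scale = 2 * nu * np.diag(1.0 / a) + S
        Sigma = invwishart.rvs(df=nu + d + n - 1, scale=scale, random_state=rng)
        Sigma_inv = np.linalg.inv(Sigma)
        for k in range(d):  # eq. (13) for each a_k
            a[k] = invgamma.rvs((nu + n) / 2,
                                scale=nu * Sigma_inv[k, k] + 1.0 / A[k] ** 2,
                                random_state=rng)
        draws.append(Sigma)
    return np.mean(draws, axis=0)
```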
6. Empirical Bayes:
For empirical Bayes, we consider an inverse Wishart prior

$p(\Sigma) \propto |\Sigma|^{-\frac{\nu+d+1}{2}} \exp\left(-\frac{1}{2} \mathrm{Tr}(\Delta \Sigma^{-1})\right)$. (14)
However, instead of placing a distribution over the parameters of the inverse Wishart prior, the marginal likelihood is computed as

$p(\bar{y}|\nu, \Delta) = \int p(\bar{y}|\Sigma)\, p(\Sigma|\nu, \Delta)\, d\Sigma$. (15)

The log of $p(\bar{y}|\nu, \Delta)$ is then maximized with respect to $\nu$ and $\Delta$ to obtain $\nu_{\mathrm{opt}}$ and $\Delta_{\mathrm{opt}}$:
$\Delta_{\mathrm{opt}} = \frac{\nu}{n} \sum_{i=1}^{n} y_i y_i^T$ (16)

$\nu_{\mathrm{opt}} = \arg\max_{\nu} \left[\nu \log\left(\frac{\nu+n}{\nu}\right) + n \log\left(\frac{\nu+n}{n}\right) + \log \frac{\Gamma_2(\nu/2)}{\Gamma_2((\nu+n)/2)}\right]$ (17)
where $\Gamma_d(a)$ is the multivariate gamma function,

$\Gamma_d(a) = \pi^{d(d-1)/4} \prod_{i=1}^{d} \Gamma\left(a - \frac{i-1}{2}\right)$. (18)
You may employ an iterative optimization algorithm to solve (17). The posterior is given by

$p(\Sigma|\bar{y}, \nu_{\mathrm{opt}}, \Delta_{\mathrm{opt}}) = \mathrm{InvWishart}\left(\nu_{\mathrm{opt}} + n,\; \Delta_{\mathrm{opt}} + \sum_{i=1}^{n} y_i y_i^T\right)$ (19)
Consider the following hyperparameters for the prior: $\Delta_0 = \begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix}$ and $\nu_0 = 5$. You can refer to https://emtiyaz.github.io/Writings/wishart.pdf for more details. Compare the estimate with those obtained using the conjugate prior, the non-informative priors, and the hierarchical Bayes method. (A sketch of the empirical Bayes procedure follows.)
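A minimal sketch of the empirical Bayes step: optimize the objective in (17) by a bounded 1-D search, then apply (16) and (19). scipy's multigammaln supplies $\log \Gamma_d$; the names and the search interval are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import multigammaln

def empirical_bayes_mean(Y):
    """Maximize the objective of eq. (17) over nu, set Delta via eq. (16),
    and return the mean of the posterior (19)."""
    n, d = Y.shape
    S = Y.T @ Y  # sum_i y_i y_i^T

    def objective(nu):  # the bracketed expression in eq. (17)
        return (nu * np.log((nu + n) / nu)
                + n * np.log((nu + n) / n)
                + multigammaln(nu / 2, d) - multigammaln((nu + n) / 2, d))

    # multigammaln(a, d) needs a > (d - 1)/2, hence the lower bound on nu.
    res = minimize_scalar(lambda nu: -objective(nu),
                          bounds=(d - 1 + 1e-3, 1e4), method='bounded')
    nu_opt = res.x
    Delta_opt = (nu_opt / n) * S                     # eq. (16)
    nu_post, Delta_post = nu_opt + n, Delta_opt + S  # eq. (19)
    return Delta_post / (nu_post - d - 1)            # inverse Wishart mean
```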
Questions:
• Which of the six methods listed above would you advocate for this problem and why?
• Here, we deal with a dimension of $d = 2$. For a problem of higher dimension, which method would you recommend? Justify.