BDA3 Chapter 2 Exercise 20

Here’s my solution to exercise 20, chapter 2, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

Suppose $y \mid \theta \sim \text{Exp}(\theta)$ with prior $\theta \sim \text{Gamma}(\alpha, \beta)$. If we observe that $y \ge 100$, then the posterior is

$$
p(\theta \mid y \ge 100) \propto \theta^{\alpha - 1} e^{-\beta\theta} \int_{y=100}^{\infty} \theta e^{-\theta y} \, dy
= \theta^{\alpha - 1} e^{-\beta\theta} \left[ -e^{-\theta y} \right]_{y=100}^{\infty}
= \theta^{\alpha - 1} e^{-\beta\theta} e^{-100\theta}
= \theta^{\alpha - 1} e^{-(\beta + 100)\theta},
$$

which is a $\text{Gamma}(\alpha, \beta + 100)$ distribution. The posterior mean is $\frac{\alpha}{\beta + 100}$ and the posterior variance is $\frac{\alpha}{(\beta + 100)^2}$.
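As a quick sanity check, here is a minimal simulation sketch (not part of the exercise; the values $\alpha = 2$ and $\beta = 50$ are arbitrary choices for illustration): draw $(\theta, y)$ from the joint distribution, keep only the draws with $y \ge 100$, and compare the retained $\theta$ values to the $\text{Gamma}(\alpha, \beta + 100)$ mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 50.0  # arbitrary illustrative hyperparameters

# Draw (theta, y) from the joint distribution; note numpy parameterises the
# gamma and exponential distributions by scale, i.e. the inverse rate.
theta = rng.gamma(shape=alpha, scale=1.0 / beta, size=2_000_000)
y = rng.exponential(scale=1.0 / theta)

# Keeping only the draws with y >= 100 gives approximate samples from p(theta | y >= 100).
kept = theta[y >= 100]

print(kept.mean(), alpha / (beta + 100))       # posterior mean ~ alpha / (beta + 100)
print(kept.var(), alpha / (beta + 100) ** 2)   # posterior variance ~ alpha / (beta + 100)^2
```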

If instead we had observed $y = 100$, the posterior would have been

$$
p(\theta \mid y = 100) \propto \theta^{\alpha - 1} e^{-\beta\theta} \cdot \theta e^{-100\theta}
= \theta^{\alpha} e^{-(\beta + 100)\theta},
$$

which is a $\text{Gamma}(\alpha + 1, \beta + 100)$ distribution. The posterior mean here is $\frac{\alpha + 1}{\beta + 100}$ and the posterior variance is $\frac{\alpha + 1}{(\beta + 100)^2}$.
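Again as a check (same illustrative $\alpha$ and $\beta$ as above), we can normalise the density $\theta^{\alpha} e^{-(\beta + 100)\theta}$ numerically and confirm that its mean and variance match the $\text{Gamma}(\alpha + 1, \beta + 100)$ formulas.

```python
import numpy as np
from scipy.integrate import quad

alpha, beta = 2.0, 50.0   # same illustrative hyperparameters as above
rate = beta + 100.0

# Unnormalised posterior density given the exact observation y = 100.
unnorm = lambda t: t ** alpha * np.exp(-rate * t)

Z = quad(unnorm, 0, np.inf)[0]                                  # normalising constant
mean = quad(lambda t: t * unnorm(t), 0, np.inf)[0] / Z
second = quad(lambda t: t ** 2 * unnorm(t), 0, np.inf)[0] / Z

print(mean, (alpha + 1) / rate)                     # matches (alpha + 1) / (beta + 100)
print(second - mean ** 2, (alpha + 1) / rate ** 2)  # matches (alpha + 1) / (beta + 100)^2
```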

Both of these estimates are greater than in the case of observing $y \ge 100$. This is surprising because knowing that $y = 100$ is more informative than just knowing $y \ge 100$. The reason there is actually less variance when we only know $y \ge 100$ is that we get to average over all possibilities of $y \ge 100$, and the case $y = 100$ has the greatest variance of all these possibilities.
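A tiny numerical illustration of this point (same illustrative $\alpha$ and $\beta$ as above): the exact-observation variance $\frac{\alpha + 1}{(\beta + \tilde{y})^2}$ decreases in $\tilde{y}$, so among all the values $\tilde{y} \ge 100$ being averaged over, $\tilde{y} = 100$ gives the largest variance.

```python
alpha, beta = 2.0, 50.0  # same illustrative hyperparameters as above

# var(theta | y = y_tilde) = (alpha + 1) / (beta + y_tilde)^2 is largest at y_tilde = 100,
# so averaging it over y_tilde >= 100 can only give something smaller.
for y_tilde in [100, 150, 300, 1000]:
    print(y_tilde, (alpha + 1) / (beta + y_tilde) ** 2)
```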

We can illustrate this idea more formally using identity (2.8). In the context of this exercise, the identity can be written

$$
\text{V}(\theta \mid y \ge 100) \ge \text{E}\left(\text{V}(\theta \mid y) \mid y \ge 100\right),
$$

where all the probabilities are now conditional on $y \ge 100$. The left-hand side is the posterior variance given $y \ge 100$, which we calculated above to be $\frac{\alpha}{(\beta + 100)^2}$. The quantity $\text{V}(\theta \mid y = 100) = \frac{\alpha + 1}{(\beta + 100)^2}$, as shown above, is greater than the LHS. However, this quantity is fundamentally different from what the right-hand side of the identity expresses. The RHS actually evaluates to

$$
\int_{100}^{\infty} \frac{\alpha + 1}{(\beta + \tilde{y})^2} \, p(\tilde{y} \mid y \ge 100) \, d\tilde{y}.
$$

That is, it averages over all possible realisations of the variance given $y \ge 100$. The inequality only guarantees that the posterior variance is smaller on average than the variance we started with (here, the variance given only $y \ge 100$), so we can't just plug in one value of interest.
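To make this concrete, here is a Monte Carlo sketch of the RHS (same illustrative $\alpha$ and $\beta$ as before): draw $\tilde{y}$ from its marginal distribution restricted to $\tilde{y} \ge 100$ and average $\text{V}(\theta \mid \tilde{y}) = \frac{\alpha + 1}{(\beta + \tilde{y})^2}$ over those draws. The result sits below the LHS $\frac{\alpha}{(\beta + 100)^2}$, which in turn sits below the plug-in value $\frac{\alpha + 1}{(\beta + 100)^2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 50.0  # same illustrative hyperparameters as above

# Draw y from its marginal (prior predictive) distribution and keep y >= 100,
# giving approximate samples y_tilde from p(y_tilde | y >= 100).
theta = rng.gamma(shape=alpha, scale=1.0 / beta, size=2_000_000)
y = rng.exponential(scale=1.0 / theta)
y_tilde = y[y >= 100]

rhs = np.mean((alpha + 1) / (beta + y_tilde) ** 2)   # E(V(theta | y) | y >= 100)
lhs = alpha / (beta + 100) ** 2                      # V(theta | y >= 100)
plug_in = (alpha + 1) / (beta + 100) ** 2            # V(theta | y = 100)

print(rhs, lhs, plug_in)                             # rhs <= lhs < plug_in
```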