BDA3 Chapter 2 Exercise 20
Here’s my solution to exercise 20, chapter 2, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
Suppose $y \mid \theta \sim \operatorname{Exp}(\theta)$ with prior $\theta \sim \operatorname{Gamma}(\alpha, \beta)$. If we observe that $y \ge 100$, then the posterior is
$$
p(\theta \mid y \ge 100)
\propto \theta^{\alpha-1} e^{-\beta\theta} \int_{100}^{\infty} \theta e^{-\theta y} \, dy
= \theta^{\alpha-1} e^{-\beta\theta} \left[ -e^{-\theta y} \right]_{100}^{\infty}
= \theta^{\alpha-1} e^{-\beta\theta} e^{-100\theta}
= \theta^{\alpha-1} e^{-(\beta+100)\theta},
$$
which is a $\operatorname{Gamma}(\alpha, \beta+100)$ distribution. The posterior mean is $\frac{\alpha}{\beta+100}$ and the posterior variance is $\frac{\alpha}{(\beta+100)^2}$.
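As a quick sanity check, here is a small Monte Carlo sketch: draw $\theta$ from the prior, draw $y \mid \theta$, keep only the draws with $y \ge 100$, and compare the retained sample's moments with the closed-form posterior moments. The hyperparameter values $\alpha = 2$, $\beta = 50$ are my own illustrative choice, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 50.0                        # illustrative hyperparameters (assumed)
n = 2_000_000

theta = rng.gamma(alpha, 1.0 / beta, size=n)   # prior draws; numpy parameterises by shape/scale
y = rng.exponential(1.0 / theta)               # y | theta ~ Exp(theta), i.e. rate theta
kept = theta[y >= 100]                         # condition on the event y >= 100

print(kept.mean(), alpha / (beta + 100))       # should match alpha/(beta+100)
print(kept.var(), alpha / (beta + 100) ** 2)   # should match alpha/(beta+100)^2
```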
If instead we had observed $y = 100$, the posterior would have been
$$
p(\theta \mid y = 100)
\propto \theta^{\alpha-1} e^{-\beta\theta} \cdot \theta e^{-100\theta}
= \theta^{\alpha} e^{-(\beta+100)\theta},
$$
which is a $\operatorname{Gamma}(\alpha+1, \beta+100)$ distribution. The posterior mean here is $\frac{\alpha+1}{\beta+100}$ and the posterior variance is $\frac{\alpha+1}{(\beta+100)^2}$.
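The two closed-form posteriors can be compared side by side, for example with scipy (again with the illustrative $\alpha = 2$, $\beta = 50$ rather than anything specified by the exercise):

```python
from scipy import stats

alpha, beta = 2.0, 50.0  # illustrative hyperparameters (assumed)

post_ineq = stats.gamma(a=alpha, scale=1 / (beta + 100))       # theta | y >= 100
post_exact = stats.gamma(a=alpha + 1, scale=1 / (beta + 100))  # theta | y = 100

print(post_ineq.mean(), post_ineq.var())    # alpha/(beta+100),     alpha/(beta+100)^2
print(post_exact.mean(), post_exact.var())  # (alpha+1)/(beta+100), (alpha+1)/(beta+100)^2
```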
Both the posterior mean and the posterior variance are greater than in the case of observing $y \ge 100$. This is surprising because knowing that $y = 100$ is more informative than just knowing $y \ge 100$. The reason the variance is actually smaller when we only know $y \ge 100$ is that we effectively average over all the possible values of $y$ above 100, and $y = 100$ is the value with the greatest conditional variance of all these possibilities.
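To see that last point concretely: the conditional variance $\operatorname{V}(\theta \mid y = \tilde{y}) = \frac{\alpha+1}{(\beta+\tilde{y})^2}$ is decreasing in $\tilde{y}$, so it is largest at the boundary $\tilde{y} = 100$. A tiny check with the same illustrative hyperparameters:

```python
import numpy as np

alpha, beta = 2.0, 50.0                            # illustrative hyperparameters (assumed)
y_tilde = np.array([100.0, 150.0, 300.0, 1000.0])

# V(theta | y = y~) = (alpha + 1) / (beta + y~)^2 shrinks as y~ grows,
# so among all y~ >= 100 the boundary case y~ = 100 has the largest variance.
print((alpha + 1) / (beta + y_tilde) ** 2)
```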
We can illustrate this idea more formally using identity (2.8). In the context of this exercise, the identity can be written
$$
\operatorname{V}(\theta \mid y \ge 100) \ge \operatorname{E}\big(\operatorname{V}(\theta \mid y) \mid y \ge 100\big),
$$
where all the probabilities are now conditional on $y \ge 100$; the inequality holds because the variance of the posterior means, $\operatorname{V}\big(\operatorname{E}(\theta \mid y) \mid y \ge 100\big)$, is non-negative. The left hand side is the posterior variance given $y \ge 100$, which we calculated above to be $\frac{\alpha}{(\beta+100)^2}$. Naïvely plugging $y = 100$ into the inner variance gives $\operatorname{V}(\theta \mid y = 100) = \frac{\alpha+1}{(\beta+100)^2}$, as shown above, which is greater than the LHS. However, this quantity is fundamentally different from what the right hand side of the identity expresses. The RHS actually evaluates to
$$
\int_{100}^{\infty} \frac{\alpha+1}{(\beta+\tilde{y})^2} \, p(\tilde{y} \mid y \ge 100) \, d\tilde{y}.
$$
That is, it averages the conditional variance over all possible realisations $\tilde{y} \ge 100$. The inequality only guarantees that the posterior variance is on average no larger than the prior variance (here, $\operatorname{V}(\theta \mid y \ge 100)$), so we can't just plug in one particular value of interest.
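As a final check, the RHS can be estimated by Monte Carlo: $y$ values retained from the earlier simulation are draws from $p(\tilde{y} \mid y \ge 100)$, so averaging $\frac{\alpha+1}{(\beta+\tilde{y})^2}$ over them approximates the integral. A minimal sketch, again assuming the illustrative $\alpha = 2$, $\beta = 50$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 50.0                       # illustrative hyperparameters (assumed)
n = 2_000_000

theta = rng.gamma(alpha, 1.0 / beta, size=n)  # prior draws
y = rng.exponential(1.0 / theta)              # y | theta ~ Exp(theta)
y_tail = y[y >= 100]                          # draws from p(y~ | y >= 100)

# RHS of the identity: average of V(theta | y = y~) = (alpha+1)/(beta+y~)^2 over the tail
rhs = np.mean((alpha + 1) / (beta + y_tail) ** 2)

lhs = alpha / (beta + 100) ** 2               # V(theta | y >= 100)
plug_in = (alpha + 1) / (beta + 100) ** 2     # V(theta | y = 100) -- not the RHS

print(rhs, lhs, plug_in)                      # expect rhs <= lhs < plug_in
```

For this model the integral can in fact be done in closed form (the marginal of $y$ under the prior is a Lomax distribution), giving $\frac{\alpha(\alpha+1)}{(\alpha+2)(\beta+100)^2}$, which is strictly less than the LHS $\frac{\alpha}{(\beta+100)^2}$, exactly as the identity requires.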