BDA3 Chapter 5 Exercise 8
Here’s my solution to exercise 8, chapter 5, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
Let $p_m(\theta)$, $m = 1, \dots, M$, be conjugate prior densities for the likelihood $y \mid \theta$, and let $\lambda_m \in [0, 1]$ be mixture weights such that $\sum_{m=1}^M \lambda_m = 1$. We show that $p(\theta) := \sum_{m=1}^M \lambda_m p_m(\theta)$ is also a conjugate prior. This follows from the calculation of the posterior:
$$
p(\theta \mid y)
\propto p(y \mid \theta) \, p(\theta)
= p(y \mid \theta) \sum_{m=1}^M \lambda_m p_m(\theta)
= \sum_{m=1}^M \lambda_m p(y \mid \theta) \, p_m(\theta)
= \sum_{m=1}^M \lambda_m p_m(y) \, p_m(\theta \mid y)
= \sum_{m=1}^M \lambda'_m p_m(\theta \mid y),
$$
where $\lambda'_m := \lambda_m p_m(y)$ and $p_m(y) = \int p(y \mid \theta) \, p_m(\theta) \, d\theta$ is the prior predictive density under component $m$; after normalising, the posterior mixture weights are $\lambda'_m / \sum_{k=1}^M \lambda'_k$. Since each term $p_m(\theta \mid y)$ has the same parametric form as $p_m(\theta)$, the posterior is a mixture of the same parametric form as the prior $\sum_{m=1}^M \lambda_m p_m(\theta)$.
To apply this to a concrete example, suppose $y \mid \theta \sim \mathrm{Normal}(\theta, 1)$, where $\theta$ is most likely near 1 with a standard deviation of 0.5, but has some probability of being near -1, again with a standard deviation of 0.5. Let’s calculate the posterior after 10 observations with mean -0.25.
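In symbols, with a prior weight of 0.8 on the positive component (as in the code below), the prior is the two-component mixture

$$
p(\theta) = 0.8 \, \mathrm{Normal}(\theta \mid 1, 0.5^2) + 0.2 \, \mathrm{Normal}(\theta \mid -1, 0.5^2).
$$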
library(tibble)

# Mixture prior: weight 0.8 on a component near 1, weight 0.2 on one near -1
mu_pos <- 1
sd_pos <- 0.5
lambda_pos <- 0.8
mu_neg <- -1
sd_neg <- 0.5
lambda_neg <- 1 - lambda_pos

prior <- tibble(
  theta = seq(-3, 3, 0.01),
  density_pos = dnorm(theta, mu_pos, sd_pos),
  density_neg = dnorm(theta, mu_neg, sd_neg),
  density = lambda_pos * density_pos + lambda_neg * density_neg
)
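As a quick sanity check (my addition, not part of the exercise), the prior density should integrate to roughly 1 over the grid, since the spacing is 0.01 and the tails beyond $\pm 3$ are negligible:

# Riemann-sum approximation of the prior's integral over [-3, 3]
sum(prior$density) * 0.01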
The posterior for each mixture component can be calculated using equation 2.12 (page 42). In particular,

$$
p_\pm(\theta \mid \bar{y}) = \mathrm{Normal}(\theta \mid \mu_{10}^\pm, \sigma_{10}^\pm),
$$

where
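the component posterior parameters follow the usual precision-weighted normal update, writing $\mu_\pm$ and $\sigma_\pm$ for the prior mean and standard deviation of each component:

$$
\frac{1}{(\sigma_{10}^\pm)^2} = \frac{1}{\sigma_\pm^2} + \frac{n}{\sigma^2},
\qquad
\mu_{10}^\pm = \frac{\mu_\pm / \sigma_\pm^2 + n \bar{y} / \sigma^2}{1 / \sigma_\pm^2 + n / \sigma^2}.
$$

In code: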
n <- 10
ybar <- -0.25
sigma <- 1  # known sampling standard deviation

# Component-wise posterior parameters (BDA3, equation 2.12)
sd_10_pos <- 1 / sqrt(1 / sd_pos^2 + n / sigma^2)
mu_10_pos <- (mu_pos / sd_pos^2 + n * ybar / sigma^2) / (1 / sd_10_pos^2)
sd_10_neg <- 1 / sqrt(1 / sd_neg^2 + n / sigma^2)
mu_10_neg <- (mu_neg / sd_neg^2 + n * ybar / sigma^2) / (1 / sd_10_neg^2)
This gives us positive and negative means of 0.107 and -0.464, respectively, and standard deviations both equal to 0.267.
In order to find the full posterior distribution, we also need the posterior mixture proportions.
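From the derivation above, $\lambda'_\pm \propto \lambda_\pm p_\pm(\bar{y})$, where $p_\pm(\bar{y})$ is the prior predictive density of the sample mean under each component. Since $\bar{y} \mid \theta \sim \mathrm{Normal}(\theta, \sigma^2 / n)$ and $\theta \sim \mathrm{Normal}(\mu_\pm, \sigma_\pm^2)$ under component $\pm$, the variances add:

$$
p_\pm(\bar{y}) = \mathrm{Normal}(\bar{y} \mid \mu_\pm, \, \sigma_\pm^2 + \sigma^2 / n).
$$

In code: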
# Prior predictive density of ybar under each component
py_pos <- dnorm(ybar, mu_pos, sqrt(sd_pos^2 + sigma^2 / n))
py_neg <- dnorm(ybar, mu_neg, sqrt(sd_neg^2 + sigma^2 / n))

# Unnormalised posterior weights, then normalise so they sum to one
lambda_prime_pos0 <- lambda_pos * py_pos
lambda_prime_neg0 <- lambda_neg * py_neg
normaliser <- lambda_prime_pos0 + lambda_prime_neg0
lambda_prime_pos <- lambda_prime_pos0 / normaliser
lambda_prime_neg <- lambda_prime_neg0 / normaliser
This gives positive and negative weights of 48.94% and 51.06%, respectively.
We’ll calculate the normalised posterior density on a grid over [-3, 3].
granularity <- 0.01
posterior <- tibble(
  theta = seq(-3, 3, granularity),
  density_pos = dnorm(theta, mu_10_pos, sd_10_pos),
  density_neg = dnorm(theta, mu_10_neg, sd_10_neg),
  density_unnormalised = lambda_prime_pos * density_pos + lambda_prime_neg * density_neg,
  # Renormalise to correct for truncating the grid to [-3, 3]
  density = density_unnormalised / sum(density_unnormalised * granularity)
)
This posterior looks like a ‘hunchback normal’ distribution: it is still bimodal, but the two modes are much less distinct than in the prior.
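To see this shape, here’s a minimal ggplot2 sketch (my addition; it assumes the ggplot2 package is available) overlaying the prior and posterior densities computed above:

library(ggplot2)

# Overlay the prior and posterior mixture densities on one plot
ggplot() +
  geom_line(data = prior, aes(theta, density, colour = "prior")) +
  geom_line(data = posterior, aes(theta, density, colour = "posterior")) +
  labs(x = expression(theta), y = "density", colour = NULL)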