BDA3 Chapter 3 Exercise 11
Here’s my solution to exercise 11, chapter 3, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
We will reanalyse the bioassay data from section 3.7 of the book, this time with more informative priors.
library(tidyverse)

df <- read_csv('data/chapter_03_exercise_11.csv')
| dose_log_g_ml | animals | deaths |
|---|---|---|
| -0.86 | 5 | 0 |
| -0.30 | 5 | 1 |
| -0.05 | 5 | 3 |
| 0.73 | 5 | 5 |
Here is the model specification.
\[ \begin{align} y_i \mid \theta_i &\sim \operatorname{Binomial}(n_i, \theta_i) \\ \operatorname{logit}(\theta_i) &= \alpha + \beta x_i \\ \alpha &\sim \operatorname{Normal}(0, 2^2) \\ \beta &\sim \operatorname{Normal}(10, 10^2) \end{align} \]

Here, \(x_i\) is the log dose and \(n_i = 5\) is the number of animals given dose \(x_i\).
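Before fitting, here's a quick prior predictive sketch (my addition, not part of the exercise) to see what death probabilities these priors imply at the lowest observed dose.

# Draw (alpha, beta) from the priors and compute the implied probability
# of death at the lowest dose, x = -0.86.
set.seed(2018)
alpha <- rnorm(10000, mean = 0, sd = 2)
beta <- rnorm(10000, mean = 10, sd = 10)
theta <- plogis(alpha + beta * -0.86) # inverse logit
quantile(theta, probs = c(0.025, 0.5, 0.975))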
Rather than using a grid approximation to the posterior, we’ll just fit the model with Stan via rstanarm, which is a lot simpler.
m <- rstanarm::stan_glm(
  cbind(deaths, animals - deaths) ~ 1 + dose_log_g_ml,
  family = binomial(link = 'logit'),
  data = df,
  prior_intercept = rstanarm::normal(0, 2),
  prior = rstanarm::normal(10, 10),
  warmup = 500,
  iter = 4000
)
summary(m)
Model Info:
function: stan_glm
family: binomial [logit]
formula: cbind(deaths, animals - deaths) ~ 1 + dose_log_g_ml
algorithm: sampling
priors: see help('prior_summary')
sample: 14000 (posterior sample size)
observations: 4
predictors: 2
Estimates:
mean sd 2.5% 25% 50% 75% 97.5%
(Intercept) 1.3 1.0 -0.5 0.6 1.2 1.9 3.4
dose_log_g_ml 11.0 5.0 3.5 7.3 10.3 13.9 22.5
mean_PPD 2.3 0.4 1.5 2.0 2.2 2.5 3.2
log-posterior -5.6 1.1 -8.4 -6.0 -5.2 -4.8 -4.5
Diagnostics:
mcse Rhat n_eff
(Intercept) 0.0 1.0 6103
dose_log_g_ml 0.1 1.0 4788
mean_PPD 0.0 1.0 9575
log-posterior 0.0 1.0 3826
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
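One caveat worth checking (my addition): some versions of rstanarm autoscale the priors by default, so it’s worth confirming the priors that were actually used.

prior_summary(m)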
The tidybayes package offers convenient functions for drawing from the posterior. We’ll also add in our LD50 estimate, \(\mathrm{LD50} = -\alpha / \beta\), the dose at which the probability of death is 50%.
draws <- m %>%
  tidybayes::spread_draws(`(Intercept)`, dose_log_g_ml) %>%
  rename(
    alpha = `(Intercept)`,
    beta = dose_log_g_ml
  ) %>%
  mutate(LD50 = -alpha / beta)
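For a quick numeric summary of the LD50 draws (a sketch; the exact values will vary from run to run), tidybayes::median_qi gives the posterior median and a 95% interval.

draws %>% tidybayes::median_qi(LD50)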
The estimates look much the same with these more informative priors as with the uninformative priors used in section 3.7. The posterior probability that \(\beta > 0\) is:
draws %>%
  mutate(positive = beta > 0) %>%
  summarise(mean(positive)) %>%
  pull() %>%
  scales::percent()
[1] "100%"
The posterior LD50 estimate (conditional on \(\beta > 0\), since the LD50 only makes sense when a higher dose increases the probability of death) is as follows:
draws %>%
  filter(beta > 0) %>%
  ggplot() +
  aes(LD50) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 0, linetype = 'dashed', colour = 'chocolate') +
  labs(
    x = 'LD50 (log g/ml)',
    y = 'Count',
    title = 'Histogram of posterior LD50 estimate',
    subtitle = 'Conditional on β > 0'
  )