BDA3 Chapter 5 Exercise 15
Here’s my solution to exercise 15, chapter 5, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
The data provided are in an awkward format. I’ve downloaded it with minor modifications to make it easier to parse.
df <- read_delim(
'data/chapter_05_exercise_15_table_5.4.txt',
delim=' ',
skip=4,
trim_ws=TRUE
) %>%
transmute(
study,
y = log(treated.deaths / (treated.total - treated.deaths)) - log(control.deaths / (control.total - control.deaths)),
sigma2 = (1 / treated.deaths) + (1 / (treated.total - treated.deaths)) + (1 / control.deaths) + (1 / (control.total - control.deaths)),
sigma = sqrt(sigma2),
n = treated.total + control.total
) %>%
select(-sigma2)
study | y | sigma | n |
---|---|---|---|
1 | 0.0281709 | 0.8503034 | 77 |
2 | -0.7410032 | 0.4831516 | 230 |
3 | -0.5406212 | 0.5645611 | 162 |
4 | -0.2461281 | 0.1381833 | 3053 |
5 | 0.0694534 | 0.2806564 | 720 |
6 | -0.5841569 | 0.6757127 | 111 |
7 | -0.5123855 | 0.1386878 | 1884 |
8 | -0.0786233 | 0.2039910 | 1103 |
9 | -0.4241734 | 0.2739730 | 560 |
10 | -0.3348234 | 0.1170683 | 3837 |
11 | -0.2133975 | 0.1948720 | 1456 |
12 | -0.0389084 | 0.2294606 | 529 |
13 | -0.5932537 | 0.4251674 | 584 |
14 | 0.2815459 | 0.2054455 | 1741 |
15 | -0.3213336 | 0.2977091 | 301 |
16 | -0.1353479 | 0.2609219 | 420 |
17 | 0.1406065 | 0.3641742 | 373 |
18 | 0.3220497 | 0.5526449 | 305 |
19 | 0.4443805 | 0.7166491 | 308 |
20 | -0.2175097 | 0.2598417 | 427 |
21 | -0.5910760 | 0.2572069 | 755 |
22 | -0.6080991 | 0.2723787 | 1354 |
We’ll use the model described in the book. Note that by not explicitly giving a prior for \(\mu\) or \(\tau\), stan gives them a uniform prior.
model <- rstan::stan_model('src/ex_05_15.stan')
S4 class stanmodel 'ex_05_15' coded as follows:
data {
int<lower = 1> n;
int<lower = 1> study[n];
vector[n] y;
vector<lower = 0>[n] sigma;
}
parameters {
real mu;
real<lower = 0> tau;
vector[n] theta;
}
model {
theta ~ normal(mu, tau);
y ~ normal(theta, sigma);
}
Let’s fit the model.
set.seed(57197)
fit <- model %>%
sampling(
data = tidybayes::compose_data(df),
warmup = 1000,
iter = 5000
)
We’ll draw the posterior population parameters separately from the study parameters purely for convenience.
pop_params <- fit %>%
tidybayes::spread_draws(mu, tau)
draws <- fit %>%
tidybayes::spread_draws(mu, tau, theta[study])
The population standard deviation \(\tau\) has most of its mass below 0.4.
As in figure 5.6, the effect estimates are almost identical for \(\tau \approx 0\) and spread out as \(\tau\) increases.
Let’s get the median effect estimates with 95% posterior intervals. The low sample size estimates (dark dots) remain close to the population mean (\(\mu\)), whereas the larger sample size estimates (light dots) can move closer to the unpooled estimates (dotted line).
cis <- draws %>%
median_qi() %>%
inner_join(df, by = 'study')
To estimate an effect for a new study, we draw \(\theta \sim \dnorm(\mu, \tau)\).
new_theta <- pop_params %>%
mutate(theta = rnorm(n(), mu, tau))
prob_new_theta_positive <- new_theta %>%
summarise(sum(theta > 0) / n()) %>%
pull()
It has a 7.66% probability of being positive.