BDA3 Chapter 5 Exercise 14
Here’s my solution to exercise 14, chapter 5, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
We’ll use the same dataset as before but only use the total traffic counts.
df <- read_csv('data/chapter_03_exercise_08.csv') %>%
  filter(type == 'residential' & bike_route) %>%
  transmute(
    i = 1:n(),
    total = bikes + other
  )
| i | total |
|---|-------|
| 1 | 74 |
| 2 | 99 |
| 3 | 58 |
| 4 | 70 |
| 5 | 122 |
| 6 | 77 |
| 7 | 104 |
| 8 | 129 |
| 9 | 308 |
| 10 | 119 |
The hyperprior from exercise 13 was \(p(\alpha, \beta) \propto (\alpha + \beta)^{-\frac{5}{2}}\), where \(\alpha\) and \(\beta\) were the parameters of a beta distribution. As a first attempt, we’ll try the same hyperprior here, with \(\alpha\) and \(\beta\) now the parameters of a gamma distribution. Since the hyperparameters have the same support \((0, \infty)\) in both cases, this at least makes some sense.
The joint posterior is
\[ \begin{align} p(\alpha, \beta, \theta \mid y) &\propto p(\alpha, \beta) \cdot \prod_{j = 1}^J p(\theta_j \mid \alpha, \beta) \cdot p(y_j \mid \theta_j) \\ &\propto (\alpha + \beta)^{-\frac{5}{2}} \prod_{j = 1}^J \frac{\beta^\alpha}{\Gamma(\alpha)} \theta_j^{\alpha - 1} e^{-\beta \theta_j} \theta_j^{y_j} e^{-\theta_j} \\ &= (\alpha + \beta)^{-\frac{5}{2}} \frac{\beta^{J\alpha}}{\Gamma(\alpha)^J} e^{-(\beta + 1) \sum_j \theta_j} \prod_{j = 1}^J \theta_j^{y_j + \alpha - 1} . \end{align} \]
Here the constant \(1 / y_j!\) factors from the Poisson densities have been dropped. I have no idea whether this posterior has a finite integral (i.e. whether it is proper), so we’ll just use it as is for the rest of the exercise.
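To make the algebra concrete, here’s the unnormalised log posterior written as a plain R function. This is only a sanity-check sketch of the expression above (the function and argument names are my own); it isn’t used in the fit below.
log_posterior <- function(alpha, beta, theta, y) {
  # hyperprior: (alpha + beta)^(-5/2) on the log scale
  -(5 / 2) * log(alpha + beta) +
    # gamma prior on each theta_j
    sum(dgamma(theta, shape = alpha, rate = beta, log = TRUE)) +
    # Poisson likelihood (the 1 / y! terms are constant in the parameters)
    sum(dpois(y, lambda = theta, log = TRUE))
}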
Here’s the model definition in Stan.
model <- rstan::stan_model('src/ex_05_14.stan')
S4 class stanmodel 'ex_05_14' coded as follows:
data {
  int<lower = 1> n;
  int<lower = 0> total[n];
}
parameters {
  real<lower = 0> alpha;
  real<lower = 0> beta;
  vector<lower = 0>[n] theta;
}
model {
  // hyperprior: p(alpha, beta) proportional to (alpha + beta)^(-5/2),
  // added directly to the log density since it's not a built-in distribution
  target += -(5. / 2.) * log(alpha + beta);
  // theta prior
  theta ~ gamma(alpha, beta);
  // likelihood
  total ~ poisson(theta);
}
The model can be fit using some tidybayes helpers.
fit <- model %>%
  rstan::sampling(
    data = tidybayes::compose_data(df),
    warmup = 1000,
    iter = 5000
  ) %>%
  tidybayes::recover_types(df)
Now extract the posterior draws in a tidy format.
draws <- fit %>%
  tidybayes::spread_draws(alpha, beta, theta[i])
The posterior joint distribution of \(\alpha, \beta\) looks fairly reasonable. It’s concentrated along the diagonal where \(\beta \approx \alpha / 100\), mainly around \(\alpha \approx 2.5\), \(\beta \approx 0.025\). That corresponds to a prior mean of \(\alpha / \beta \approx 100\) for the \(\theta_j\), which matches the scale of the observed totals.
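Here’s a rough sketch of how such a plot could be made from the draws with ggplot2 (assumed to be loaded along with the rest of the tidyverse; the styling choices are my own):
draws %>%
  filter(i == 1) %>%  # one row per posterior draw of (alpha, beta)
  ggplot(aes(alpha, beta)) +
  geom_point(alpha = 0.1) +
  labs(x = expression(alpha), y = expression(beta))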
There is little deviation between the observed totals and the estimated values of \(\theta_j\).
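One way to check this is to compare each street’s observed total with the posterior median of its \(\theta_j\); a minimal sketch (the theta_hat and residual columns are my own naming):
draws %>%
  group_by(i) %>%
  summarise(theta_hat = median(theta)) %>%
  left_join(df, by = 'i') %>%
  mutate(residual = total - theta_hat)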
To estimate total traffic for an unobserved street, we draw \(\alpha, \beta\) from the posterior, draw \(\theta \sim \operatorname{Gamma}(\alpha, \beta)\), then draw \(\tilde y \sim \operatorname{Poisson}(\theta)\). The quantiles of \(\tilde y\) are then our posterior predictive interval.
# For each posterior draw of (alpha, beta), simulate a new street's theta and
# total y, then summarise both with medians and 95% intervals.
cis <- draws %>%
  filter(i == 1) %>%  # keep one row per posterior draw of (alpha, beta)
  mutate(
    theta = rgamma(n(), alpha, beta),
    y = rpois(n(), theta)
  ) %>%
  tidybayes::median_qi() %>%
  select(matches('y|theta'))
The 95% posterior interval for \(\tilde\theta\) is (19, 295). The 95% posterior predictive interval for \(\tilde y\) is (19, 298), which includes 9 of the 10 observed values (all except the street with total 308).
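As a quick check of that coverage claim, we can count directly how many observed totals fall inside the predictive interval (this assumes the y.lower / y.upper column names that median_qi produced above):
sum(df$total >= cis$y.lower & df$total <= cis$y.upper)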