BDA3 Chapter 3 Exercise 8
Here’s my solution to exercise 8, chapter 3, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
You can download the full dataset shown in table 3.3. Let’s load it into a dataframe and select just the residential data, as suggested.
df0 <- read_csv('data/chapter_03_exercise_08.csv') %>%
mutate(
type = as_factor(
type,
levels = c('residential', 'fairly_busy', 'busy'),
ordered = TRUE
),
bikes = as.integer(bikes),
other = as.integer(other)
)
df <- df0 %>%
filter(type == 'residential') %>%
mutate(
total = bikes + other,
bike_fraction = bikes / total,
other_fraction = other / total
)
Here are the first few rows with each value of bike_route
.
type | bike_route | bikes | other | total | bike_fraction | other_fraction |
---|---|---|---|---|---|---|
residential | TRUE | 16 | 58 | 74 | 0.2162162 | 0.7837838 |
residential | TRUE | 9 | 90 | 99 | 0.0909091 | 0.9090909 |
residential | TRUE | 10 | 48 | 58 | 0.1724138 | 0.8275862 |
residential | FALSE | 12 | 113 | 125 | 0.0960000 | 0.9040000 |
residential | FALSE | 1 | 18 | 19 | 0.0526316 | 0.9473684 |
residential | FALSE | 2 | 14 | 16 | 0.1250000 | 0.8750000 |
We’ll use an uninformative gamma prior with a Poisson likelihood for the counts. The posterior can then be calculated as follows.
draws <- 10000
shape_prior <- 2
rate_prior <- 0
posterior <- function(data, draws = 10000) {
bikes <- data %>% pull(bikes)
other <- data %>% pull(other)
n <- data %>% pull(n)
tibble(draw = 1:draws) %>%
mutate(
theta_bike = rgamma(draws, bikes, n),
theta_other = rgamma(draws, other, n),
mu = rpois(draws, theta_bike),
p = theta_bike / (theta_bike + theta_other)
)
}
posterior_draws <- df %>%
group_by(bike_route) %>%
summarise(
bikes = sum(bikes) + shape_prior,
other = sum(other) + shape_prior,
n = n() + rate_prior
) %>%
nest(-bike_route) %>%
mutate(draws = map(data, posterior, draws)) %>%
unnest(draws)
Plotting posterior predictive draws of \(\theta_y\) and \(\theta_z\), we can see that there seems to be quite a difference.
posterior_draws %>%
ggplot() +
aes(mu, fill = bike_route) +
geom_bar(position = 'identity', alpha = 0.75) +
labs(
x = 'Bike count',
y = 'Count',
fill = 'Has bike route?',
title = 'Posterior expectation of bike count'
)
To quantify this difference, we’ll have to match up our posterior draws for \(\theta_y\) and \(\theta_z\).
difference <- posterior_draws %>%
select(draw, bike_route, mu) %>%
spread(bike_route, mu) %>%
mutate(difference = `TRUE` - `FALSE`)
The difference \(\mu_y - \mu_z\) has the following 95% credible interval:
difference %>%
pull(difference) %>%
quantile(probs = c(0.05, 0.5, 0.95))
5% 50% 95%
6 15 24