BDA3 Chapter 3 Exercise 8

Here’s my solution to exercise 8, chapter 3, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

You can download the full dataset shown in table 3.3. Let’s load it into a dataframe and select just the residential data, as suggested.

df0 <- read_csv('data/chapter_03_exercise_08.csv') %>% 
  mutate(
    type = as_factor(
      type, 
      levels = c('residential', 'fairly_busy', 'busy'), 
      ordered = TRUE
    ),
    bikes = as.integer(bikes),
    other = as.integer(other)
  )

df <- df0 %>% 
  filter(type == 'residential') %>% 
  mutate(
    total = bikes + other,
    bike_fraction = bikes / total,
    other_fraction = other / total
  )

Here are the first few rows with each value of bike_route.

type bike_route bikes other total bike_fraction other_fraction
residential TRUE 16 58 74 0.2162162 0.7837838
residential TRUE 9 90 99 0.0909091 0.9090909
residential TRUE 10 48 58 0.1724138 0.8275862
residential FALSE 12 113 125 0.0960000 0.9040000
residential FALSE 1 18 19 0.0526316 0.9473684
residential FALSE 2 14 16 0.1250000 0.8750000

We’ll use an uninformative gamma prior with a Poisson likelihood for the counts. The posterior can then be calculated as follows.

draws <- 10000

shape_prior <- 2
rate_prior <- 0

posterior <- function(data, draws = 10000) {
  
  bikes <- data %>% pull(bikes)
  other <- data %>% pull(other)
  n <- data %>% pull(n)
  
  tibble(draw = 1:draws) %>%
    mutate(
      theta_bike = rgamma(draws, bikes, n),
      theta_other = rgamma(draws, other, n),
      mu = rpois(draws, theta_bike),
      p = theta_bike / (theta_bike + theta_other)
    )
  
}

posterior_draws <- df %>% 
  group_by(bike_route) %>% 
  summarise(
    bikes = sum(bikes) + shape_prior,
    other = sum(other) + shape_prior,
    n = n() + rate_prior
  ) %>% 
  nest(-bike_route) %>% 
  mutate(draws = map(data, posterior, draws)) %>% 
  unnest(draws)

Plotting posterior predictive draws of θy and θz, we can see that there seems to be quite a difference.

posterior_draws %>% 
  ggplot() +
  aes(mu, fill = bike_route) +
  geom_bar(position = 'identity', alpha = 0.75) +
  labs(
    x = 'Bike count',
    y = 'Count',
    fill = 'Has bike route?',
    title = 'Posterior expectation of bike count'
  )

To quantify this difference, we’ll have to match up our posterior draws for θy and θz.

difference <- posterior_draws %>% 
  select(draw, bike_route, mu) %>% 
  spread(bike_route, mu) %>% 
  mutate(difference = `TRUE` - `FALSE`) 

The difference μyμz has the following 95% credible interval:

difference %>% 
  pull(difference) %>% 
  quantile(probs = c(0.05, 0.5, 0.95))
 5% 50% 95% 
  6  15  24