CIS Primer Question 3.2.1

Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).

Here are the parameters we’ll use. Note that they are taken from the Simpson’s revesal example of question 1.5.2.

r   <- 0.28   # fraction with syndrome

q0 <- 0.07  # P(X = 1 | Z = 0)
q1 <- 0.85  # P(X = 1 | Z = 1)
             
p00 <- 0.84 # P(Y = 1 | X = 0, Z = 0)
p10 <- 0.88 # P(Y = 1 | X = 1, Z = 0)
p01 <- 0.53 # P(Y = 1 | X = 0, Z = 1)
p11 <- 0.58 # P(Y = 1 | X = 1, Z = 1)

Part a

We can simulate the intervention by generating values for \(X\) independently of \(Z\).

N <- 10000  # number of individuals

set.seed(53201)

part_a <- tibble(z = rbinom(N, 1, r)) %>% 
  mutate(
    x = rbinom(n(), 1, 0.5), # no Z-dependence
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(n(), 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y) %>% 
  summarise(n = n()) %>% 
  group_by(x) %>% 
  mutate(p_y_given_do_x = n / sum(n))
x y n p_y_given_do_x
0 0 1238 0.2470565
0 1 3773 0.7529435
1 0 964 0.1932251
1 1 4025 0.8067749

Part b

To simulate observational data, we need to include the dependence of \(X\) on \(Z\).

N <- 100000  # number of individuals

set.seed(95400)

p_x_y_z <- tibble(
    id = 1:N,
    
    z = rbinom(N, 1, r),
    x = rbinom(N, 1, if_else(z == 0, q0, q1)),
    
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    
    y = rbinom(N, 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y, z) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p = n / sum(n))
x y z n p
0 0 0 10723 0.10723
0 0 1 2020 0.02020
0 1 0 56015 0.56015
0 1 1 2205 0.02205
1 0 0 624 0.00624
1 0 1 10067 0.10067
1 1 0 4470 0.04470
1 1 1 13876 0.13876

In order to apply the causal effect rule, we’ll need \(\mathbb P(x \mid z)\).

p_x_given_z <- p_x_y_z %>% 
  group_by(x, z) %>% 
  summarise(n = sum(n)) %>% 
  group_by(z) %>% 
  mutate(p = n / sum(n)) %>% 
  ungroup()
x z n p
0 0 66738 0.9290845
0 1 4225 0.1499929
1 0 5094 0.0709155
1 1 23943 0.8500071

We can then add the conditional probabilities to the joint distribution table, then sum overal all the \(Z\) variables.

p_y_given_do_x <- p_x_y_z %>% 
  inner_join(
    p_x_given_z, 
    by = c('x', 'z'), 
    suffix = c('_num', '_denom')
  ) %>% 
  mutate(p = p_num / p_denom) %>% 
  group_by(x, y) %>%
  summarise(p = sum(p)) 
x y p
0 0 0.2500877
0 1 0.7499123
1 0 0.2064264
1 1 0.7935736

The causal effect estimates are very close to the simulated intervention.

Part c

We can calculate ACE simply by taking the difference of the causal effect estimates.

ace <- p_y_given_do_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(ace = `1` - `0`) %>% 
  pull(ace)

ace
[1] 0.04366134

This is different from the overall probability differences.

p_y_given_x <- p_x_y_z %>% 
  group_by(x, y) %>% 
  summarise(n = sum(n)) %>% 
  group_by(x) %>% 
  mutate(p = n / sum(n)) %>% 
  select(-n) 

risk_difference <- p_y_given_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(rd = `1` - `0`) %>% 
  pull(rd)

risk_difference
[1] -0.188613

Making \(X\) independent of \(Z\) would minimise the disrepancy between ACE and RD, which would turn the adjustment formula into the formulat for \(\mathbb P(y \mid x\). In other words, setting \(q_0 = q_1 = \mathbb P(X = 1)\) would do the trick.

Part d

Note that the desegregated causal effects

  • \(p_{1, 0} - p_{0, 0}\) is 0.04; and
  • \(p_{1, 1} - p_{0, 1}\) is 0.05,

are both consisent with our calculation for the overall causal effect, ACE = 4.37%. The generated data are an illustration of Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.