CIS Primer Question 3.4.1

Posted on 15 February, 2019 by Brian
Tags: CISP chapter 3, solutions, front door criteria, front door adjustment
Category: causal_inference_in_statistics_primer

Here are my solutions to question 3.4.1 of Causal Inference in Statistics: a Primer (CISP).

If we can only measure one additional variable to estimate the causal effect of on in figure 3.8, then we should measure . From question 3.3.1 we see that no single variable satisfies the backdoor criteria. Moreover, visual inspection of the graph verifies that satisfies the frontdoor criteria:

it intercepts all (the only) directed paths from to ;
there is no unblocked path from to ; and
all backdoor paths from to are blocked by .

To illustrate this, lets simulate the causal effect in 3 separate ways:

by intervention,
via the backdoor, and
via the frontdoor.

Here are the data. Note that we have created functions for and for use later.

N <- 100000

W <- function(x) {
  N <- length(x)
  rbinom(N, 1, inv_logit(-x))
}

Y <- function(d, w, z) {
  N <- length(d)
  rbinom(N, 1, inv_logit(-d - w + 3*z))
}

df <- tibble(id = 1:N) %>% 
  mutate(
    b = rnorm(N, 0, 1),
    a = b + rnorm(N, 0, 0.1),
    c = rnorm(N, 0, 1),
    d = rbinom(N, 1, inv_logit(-1 + c)),
    z = rbinom(N, 1, inv_logit(-2 + 2*b + c)),
    x = rbinom(N, 1, inv_logit(a + z)),
    w = W(x),
    y = Y(d, w, z)
  )

Simulated data for figure 3.8
id	b	a	c	d	z	x	w	y
1	0.3641297	0.3917626	1.0369530	1	1	0	0	1
2	0.0287563	0.0397299	0.5736271	0	0	1	1	0
3	-0.7727052	-0.5993870	-0.5179657	0	1	0	1	1
4	0.4107888	0.5737898	1.2586840	0	1	1	0	1
5	2.3512417	2.1631719	0.6746523	1	1	1	0	1

Intervention

In order to simulate an intervention, we assign values to randomly, then assign new values for all its descendents. After intervention, the causal effect of on is simply .

intervention <- df %>% 
  # intervene on x
  mutate(
    x = rbinom(n(), 1, 0.5),
    w = W(x),
    y = Y(d, w, z)
  ) %>% 
  # model P(y | do(x))
  glm(
    formula = y ~ x, 
    family = binomial(), 
    data = .
  ) %>% 
  # predict
  augment(
    newdata = tibble(x = 0:1), 
    type.predict = 'response'
  )

P(Y | do(X))
x	.fitted	.se.fit
0	0.4637566	0.0022239
1	0.5072714	0.0022422

We can compare this causal effect to the simple statistical effect to see the difference.

noncausal <- df %>% 
  # model P(y | x)
  glm(
    formula = y ~ x, 
    family = binomial(), 
    data = .
  ) %>% 
  # predict
  augment(
    newdata = tibble(x = 0:1), 
    type.predict = 'response'
  )

P(Y | X) ≠ P(Y | do(X))
x	.fitted	.se.fit
0	0.3797282	0.0022595
1	0.5759927	0.0021293

Backdoor

Since satisfies the backdoor criteria, we can use it to apply the backdoor adjustment. First we’ll need .

# P(d, z)
p_d_z <- df %>% 
  group_by(d, z) %>% 
  count() %>% 
  ungroup() %>%  
  mutate(p_d_z = n / sum(n))

Now we model , multiply it by , then take the sum for each value of .

backdoor <- formula(y ~ 1 + x + z + d) %>% 
  # model P(y | x, d, z)
  glm(
    family = binomial(),
    data = df
  ) %>%  
  # predict
  augment(
    type.predict = 'response',
    newdata = 
      crossing(
        d = c(0, 1),
        x = c(0, 1),
        z = c(0, 1)
      )
  ) %>% 
  # get P(d, z)
  mutate(p_y_given_d_x_z = .fitted) %>% 
  inner_join(p_d_z, by = c('d', 'z')) %>% 
  # backdoor adjustment over d, z
  group_by(x) %>% 
  summarise(p_y_given_do_x = sum(p_y_given_d_x_z * p_d_z))

Backdoor estimates for P(Y | do(X))
x	p_y_given_do_x
0	0.4681530
1	0.5033398

Note that the backdoor adjusted estimates are similar to the estimates from intervention.

Frontdoor

To apply the frontdoor adjustment with , we’ll need , , and .

p_w_given_x <- df %>% 
  group_by(x, w) %>% 
  count() %>% 
  group_by(x) %>% 
  mutate(p_w_given_x = n / sum(n)) %>% 
  ungroup()

p_xprime <- df %>% 
  group_by(xprime = x) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p_xprime = n / sum(n))

p_y_given_xprime_w <- formula(y ~ 1 + x + w) %>% 
  glm(
    family = binomial(),
    data = df
  ) %>% 
  augment(
    newdata = crossing(x = 0:1, w = 0:1),
    type.predict = 'response'
  ) %>% 
  transmute(
    xprime = x,
    w,
    p_y_given_xprime_w = .fitted
  )

Now we apply the frontdoor adjustment:

frontdoor <- p_w_given_x %>% 
  inner_join(p_y_given_xprime_w, by = 'w') %>% 
  inner_join(p_xprime, by = 'xprime') %>% 
  group_by(x) %>%
  summarise(sum(p_w_given_x * p_y_given_xprime_w * p_xprime))

Frontdoor estimates of P(Y | do(X))
x	sum(p_w_given_x * p_y_given_xprime_w * p_xprime)
0	0.4623710
1	0.5041105

Our frontdoor estimates of are very similar to the intervention and backdoor estimates.