CIS Primer Question 3.4.1
Here are my solutions to question 3.4.1 of Causal Inference in Statistics: a Primer (CISP).
If we can only measure one additional variable to estimate the causal effect of X on Y in figure 3.8, then we should measure W. From question 3.3.1 we see that no single variable satisfies the backdoor criteria. Moreover, visual inspection of the graph verifies that W satisfies the frontdoor criteria:
- it intercepts all (the only) directed paths from X to Y;
- there is no unblocked path from X to W; and
- all backdoor paths from W to Y are blocked by X.
To illustrate this, lets simulate the causal effect in 3 separate ways:
- by intervention,
- via the backdoor, and
- via the frontdoor.
Here are the data. Note that we have created functions for W and Y for use later.
N <- 100000
W <- function(x) {
N <- length(x)
rbinom(N, 1, inv_logit(-x))
}
Y <- function(d, w, z) {
N <- length(d)
rbinom(N, 1, inv_logit(-d - w + 3*z))
}
df <- tibble(id = 1:N) %>%
mutate(
b = rnorm(N, 0, 1),
a = b + rnorm(N, 0, 0.1),
c = rnorm(N, 0, 1),
d = rbinom(N, 1, inv_logit(-1 + c)),
z = rbinom(N, 1, inv_logit(-2 + 2*b + c)),
x = rbinom(N, 1, inv_logit(a + z)),
w = W(x),
y = Y(d, w, z)
)
id | b | a | c | d | z | x | w | y |
---|---|---|---|---|---|---|---|---|
1 | 0.3641297 | 0.3917626 | 1.0369530 | 1 | 1 | 0 | 0 | 1 |
2 | 0.0287563 | 0.0397299 | 0.5736271 | 0 | 0 | 1 | 1 | 0 |
3 | -0.7727052 | -0.5993870 | -0.5179657 | 0 | 1 | 0 | 1 | 1 |
4 | 0.4107888 | 0.5737898 | 1.2586840 | 0 | 1 | 1 | 0 | 1 |
5 | 2.3512417 | 2.1631719 | 0.6746523 | 1 | 1 | 1 | 0 | 1 |
Intervention
In order to simulate an intervention, we assign values to X randomly, then assign new values for all its descendents. After intervention, the causal effect of X on Y is simply P(Y∣X).
intervention <- df %>%
# intervene on x
mutate(
x = rbinom(n(), 1, 0.5),
w = W(x),
y = Y(d, w, z)
) %>%
# model P(y | do(x))
glm(
formula = y ~ x,
family = binomial(),
data = .
) %>%
# predict
augment(
newdata = tibble(x = 0:1),
type.predict = 'response'
)
x | .fitted | .se.fit |
---|---|---|
0 | 0.4637566 | 0.0022239 |
1 | 0.5072714 | 0.0022422 |
We can compare this causal effect to the simple statistical effect to see the difference.
noncausal <- df %>%
# model P(y | x)
glm(
formula = y ~ x,
family = binomial(),
data = .
) %>%
# predict
augment(
newdata = tibble(x = 0:1),
type.predict = 'response'
)
x | .fitted | .se.fit |
---|---|---|
0 | 0.3797282 | 0.0022595 |
1 | 0.5759927 | 0.0021293 |
Backdoor
Since {X,Z} satisfies the backdoor criteria, we can use it to apply the backdoor adjustment. First we’ll need P(D,Z).
# P(d, z)
p_d_z <- df %>%
group_by(d, z) %>%
count() %>%
ungroup() %>%
mutate(p_d_z = n / sum(n))
Now we model P(Y∣X,D,Z), multiply it by P(D,Z), then take the sum for each value of X.
backdoor <- formula(y ~ 1 + x + z + d) %>%
# model P(y | x, d, z)
glm(
family = binomial(),
data = df
) %>%
# predict
augment(
type.predict = 'response',
newdata =
crossing(
d = c(0, 1),
x = c(0, 1),
z = c(0, 1)
)
) %>%
# get P(d, z)
mutate(p_y_given_d_x_z = .fitted) %>%
inner_join(p_d_z, by = c('d', 'z')) %>%
# backdoor adjustment over d, z
group_by(x) %>%
summarise(p_y_given_do_x = sum(p_y_given_d_x_z * p_d_z))
x | p_y_given_do_x |
---|---|
0 | 0.4681530 |
1 | 0.5033398 |
Note that the backdoor adjusted estimates are similar to the estimates from intervention.
Frontdoor
To apply the frontdoor adjustment with W, we’ll need P(W∣X), P(X′), and P(Y∣X,W).
p_w_given_x <- df %>%
group_by(x, w) %>%
count() %>%
group_by(x) %>%
mutate(p_w_given_x = n / sum(n)) %>%
ungroup()
p_xprime <- df %>%
group_by(xprime = x) %>%
count() %>%
ungroup() %>%
mutate(p_xprime = n / sum(n))
p_y_given_xprime_w <- formula(y ~ 1 + x + w) %>%
glm(
family = binomial(),
data = df
) %>%
augment(
newdata = crossing(x = 0:1, w = 0:1),
type.predict = 'response'
) %>%
transmute(
xprime = x,
w,
p_y_given_xprime_w = .fitted
)
Now we apply the frontdoor adjustment:
P(Y∣do(X))=∑x′,wP(x′)⋅P(w∣x)⋅P(y∣x′,w).
frontdoor <- p_w_given_x %>%
inner_join(p_y_given_xprime_w, by = 'w') %>%
inner_join(p_xprime, by = 'xprime') %>%
group_by(x) %>%
summarise(sum(p_w_given_x * p_y_given_xprime_w * p_xprime))
x | sum(p_w_given_x * p_y_given_xprime_w * p_xprime) |
---|---|
0 | 0.4623710 |
1 | 0.5041105 |
Our frontdoor estimates of P(Y∣do(X)) are very similar to the intervention and backdoor estimates.