Causal inference

Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health

Pro tip

Learn to say no politely. The primary source of motivation in academics (and in many other places) is guilt. To prevent total insanity you need to:

  • Say no to most things and focus
  • Do it in a way that doesn't offend people
  • Do it in a way that doesn't make you feel guilty

This is an art and requires years to master. Some never do.

Paper of the day

Today's slide credits

Recall levels of analysis

In approximate order of difficulty

  • Descriptive
  • Exploratory
  • Inferential
  • Predictive
  • Causal
  • Mechanistic

About causal analysis

Goal: To find out what happens to one variable when you make another variable change.

  • Usually randomized studies are required to identify causation
  • There are approaches to inferring causation in non-randomized studies, but they are complicated and sensitive to assumptions
  • Causal relationships are usually identified as average effects, but may not apply to every individual
  • Causal models are usually the "gold standard" for data analysis

Causal analysis

About mechanistic analysis

Goal: Understand the exact changes in variables that lead to changes in other variables for individual objects.

  • Incredibly hard to infer, except in simple situations
  • Usually modeled by a deterministic set of equations (physical/engineering science)
  • Generally the random component of the data is measurement error
  • If the equations are known but the parameters are not, the parameters may be inferred with data analysis (see the sketch below)
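
A minimal sketch of that last point, assuming a known exponential-decay curve with unknown parameters and measurement-error noise (all numbers here are simulated for illustration):

# the mechanistic model is known up to its parameters: y = a * exp(-b * t) + error
set.seed(9)
tm <- seq(0, 5, by = 0.1)
y  <- 10 * exp(-1.3 * tm) + rnorm(length(tm), sd = 0.2)  # truth: a = 10, b = 1.3
nls(y ~ a * exp(-b * tm), start = list(a = 5, b = 1))    # recovers a and b from noisy data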

Mechanistic analysis

Illustrating the main problem: shoe size and literacy

Most well known, but least understood idea

An outsider's perspective on causal inference

  1. With randomized trials -> straightforwardish (we'll see why in a minute)
  2. With observational data -> much, much harder
    • Main problem is confounding
    • The key issue is the assumptions
  3. It is all about missing data
    • Missing confounders
    • Missing counterfactual observations
  4. Causal inference is not easily described in standard statistical/probabilistic notation.

This is one of the hottest topics

Representing causal relationships

  • Rubin model:
    • \(Y(0), Y(1)\) or \(Y_x(u)\)
    • The outcome \(Y\) had the individual \(u\) received treatment 0, 1, or \(x\)
  • Pearl model:
    • \(Pr(Y = y | do(X=x))\)
    • Distribution when setting X=x
  • Graphical models

What do those arrows mean?

  • Intuitively
    • an arrow \(S \rightarrow L\) means S "influences" L
  • Bayes Networks/Graphical models (requires DAG)
    • Conditional probabilities e.g. \(Pr(S,L,A) = Pr(A)Pr(S|A)Pr(L|A)\)
  • Causal networks (requires DAG)
    • Variables connected by an unblocked path are dependent; variables whose paths are all blocked are d-separated
    • Basically, a path is unblocked as long as two arrows don't run into each other head-to-head at a node on the path (a collider)
    • For all distributions represented by the graph: if sets of vertices X and Y are d-separated by a set Z in the DAG G, then X and Y are independent conditional on Z in P (this is complicated! The simulation below illustrates the simplest case)
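
A small simulated example, assuming the DAG \(S \leftarrow A \rightarrow L\) from the factorization above: S and L are marginally dependent (the path through A is unblocked), but conditioning on A blocks the path.

# A is a common cause of S and L: S <- A -> L
set.seed(1)
a <- rnorm(1000)
s <- a + rnorm(1000)
l <- a + rnorm(1000)
cor(s, l)                    # clearly nonzero: the path S <- A -> L is open
summary(lm(l ~ s + a))$coef  # coefficient on s is ~0 once we condition on A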

Remember: these variables are causally related

# simulate: x causally influences y (y is drawn with mean 0.1 * x)
x = rnorm(100); y = rnorm(100, mean = 0.1 * x)
plot(x, y, pch = 19)

Causal inference the Rubin way

Key concepts:

  • Treatments
    • The intervention we could apply or withhold
    • Think heavy drug use versus light drug use
    • Difficult with things you can't assign
  • Units
    • Who you apply the treatment to
    • Could be genes, people, schools, etc.
  • Potential outcomes
    • What would have happened in both treatment scenarios
    • Typically denoted \(Y(0), Y(1)\)

Slides on this framework via:

https://github.com/jtleek/jhsph753and4/blob/master/lectures/stuartcausal2.pdf

What we'd like

Effect of heavy adolescent drug use (T) on earnings at age 40:

What we get

Causal inference as a missing data problem:
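
A toy illustration of the missing data structure (simulated numbers, not the drug-use data): each unit reveals only the potential outcome for the treatment it actually received.

# each row shows the one potential outcome we observe; the other is missing
set.seed(7)
t  <- rbinom(5, 1, 0.5)
y0 <- round(rnorm(5), 2); y1 <- round(y0 + 1, 2)
data.frame(t,
           y0_obs = ifelse(t == 0, y0, NA),  # Y(0) missing for treated units
           y1_obs = ifelse(t == 1, y1, NA))  # Y(1) missing for control units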

We estimate average effects

  • Can’t estimate individual-level causal effects
  • ATE: average treatment effect
    • \(ATE = \frac{1}{N} \sum_{i=1}^N (Y_i(1) - Y_i(0))\)
    • the effect of drug use on everyone, if we forced everyone to use drugs
  • ATT: average treatment effect among the treated
    • \(ATT = \frac{1}{N_T} \sum_{i : T_i = 1} (Y_i(1) - Y_i(0))\), where \(N_T\) is the number of treated units
    • the effect of drug use on the people who actually use drugs (see the simulation below)
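
A simulated sketch of the ATE/ATT distinction, assuming (unrealistically) that both potential outcomes are known; the effect varies with a covariate x that also drives treatment uptake, so the two estimands differ.

set.seed(2)
n  <- 10000
x  <- rnorm(n)                    # covariate
y0 <- x + rnorm(n)                # outcome under control
y1 <- y0 + 2 + x                  # outcome under treatment (effect = 2 + x)
t  <- rbinom(n, 1, plogis(x))     # treated more often when x is large
mean(y1 - y0)                     # ATE, about 2
mean(y1[t == 1] - y0[t == 1])     # ATT, larger: treated units have high x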

Randomized experiments as ideal

  • In a randomized experiment, units randomly assigned to treatment or control groups
  • Mathematically, this means that the average of the control group outcomes is an unbiased estimate of the average outcome under control for the whole population (and the same for the treatment group)
    • \(E(\bar{y}_{T_i = 0}) = \bar{Y(0)}\), \(E(\bar{y}_{T_i = 1}) = \bar{Y(1)}\)
    • So \(E(\bar{y}_{T_i = 0} - \bar{y}_{T_i = 1}) = \bar{Y(0)} - \bar{Y(1)}\)
  • The intuition is that randomization ensures "balance" of the covariates (see the sketch below)
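
A quick simulated check of the unbiasedness claim, assuming a constant treatment effect of 1:

set.seed(3)
n  <- 10000
y0 <- rnorm(n); y1 <- y0 + 1       # true ATE = 1
t  <- rbinom(n, 1, 0.5)            # randomization: t independent of y0, y1
y  <- ifelse(t == 1, y1, y0)       # only one potential outcome is observed
mean(y[t == 1]) - mean(y[t == 0])  # close to 1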

Randomization is best but may cause problems

  • People don't do what they're told
    • Noncompliance
  • Randomization isn't always feasible
    • Random mating in human populations
  • Might have to wait a long time to see outcome
  • Randomization may not estimate effects for the group we care about

Ways of performing causal inference outside of experiments

  • Stratification/regression

    • Put people in groups with same covariates
    • But some covariates may be missing
    • Sample size becomes a limitation when there are many covariates (few units per stratum)
  • Instrumental variables

    • Find an instrument that affects the treatment of real interest, but does not directly affect the outcomes
    • Vietnam draft lottery as instrument for military service
    • physician prescribing preferences as instrument for taking drug A vs. drug B
    • Need a good instrument
    • Implies a set of other assumptions (monotonicity, exclusion restrictions, etc.)
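
A hand-rolled two-stage least squares sketch on simulated data (the variable names are made up for illustration): z is a valid instrument for the treatment t, and u is an unmeasured confounder that biases the naive regression.

set.seed(6)
n <- 5000
u <- rnorm(n)                   # unmeasured confounder
z <- rbinom(n, 1, 0.5)          # instrument (e.g. a randomized encouragement)
t <- 0.5 * z + u + rnorm(n)     # instrument shifts the treatment
y <- 2 * t + u + rnorm(n)       # true causal effect of t on y is 2
coef(lm(y ~ t))["t"]            # naive OLS: biased upward by u
that <- fitted(lm(t ~ z))       # first stage: predict t from the instrument
coef(lm(y ~ that))["that"]      # second stage: close to 2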

Propensity scores

  • Propensity scores
    • Try to replicate features of randomized experiments (create groups that look only randomly different, don't use the outcome when setting up the design)
    • Basically trying to "balance" the data set

Ideal matching scenario: exact matching

  • For each treated find a control with exact same covariates
  • Easy with 1 covariate but harder with more
  • Software for doing this:
  • Want similar covariate distributions in treatment/control after matching
  • One diagnostic is "standardized bias"
    • Difference in means, divided by a standard deviation
    • Like an effect size for the covariates
    • Should hopefully be small (a small helper below computes it)
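
A small helper for that diagnostic, assuming the common convention of dividing by the treated-group standard deviation (conventions vary; some pool the two groups):

std_bias <- function(x, t) {
  # difference in covariate means, scaled like an effect size
  (mean(x[t == 1]) - mean(x[t == 0])) / sd(x[t == 1])
}
set.seed(10)
x <- rnorm(200); t <- rbinom(200, 1, plogis(x))  # x imbalanced by construction
std_bias(x, t)                                   # noticeably > 0 before matching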

Example diagnostic plot

Propensity scores

\[e_i = P(T_i = 1 | X_i)\]

  • Key features
    • Balancing score: at each value of the propensity score, the distribution of covariates is the same in the treated and control groups.
    • Intuitively, people with similar propensity scores should differ in treatment assignment only through randomness.
    • If treatment is independent of the potential outcomes given the covariates, it is also independent of the potential outcomes given the propensity score (no unmeasured confounders)
  • Can be used to match, instead of on covariates individually
  • Central goal is to achieve balance
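
A minimal propensity-score sketch on simulated data (not the course dataset): estimate \(e(X)\) with logistic regression, then check balance among people with similar scores.

set.seed(4)
n  <- 2000
x1 <- rnorm(n); x2 <- rnorm(n)
t  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))    # treatment depends on x1, x2
ps <- fitted(glm(t ~ x1 + x2, family = binomial))  # estimated propensity scores
# within a narrow band of ps, treated and controls should look alike on x1
band <- ps > 0.4 & ps < 0.6
mean(x1[band & t == 1]) - mean(x1[band & t == 0])  # near 0 if balance holds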

Unconfoundedness assumption

  • Assumes that there are no unobserved differences between the treatment and control groups, given the observed variables
    • Other ways of saying essentially the same thing: no unobserved confounders, no hidden bias, “ignorable”
    • Could be a problem if, e.g., people start smoking marijuana because they are getting bad grades and we don’t have grades measured
    • Thinking about this assumption during data collection can help make it more realistic
    • Can also do sensitivity analyses to assess how sensitive the results are to violation of this assumption

If assumption holds we get the nice property

\[P(T | X, Y(0), Y(1)) = P(T | X) \implies P(T | X, Y(0), Y(1)) = P(T | e(X))\]

  • This is what allows us to match just on propensity score; don’t need to deal with all the covariates individually

Types of matching

  • k-nearest neighbors
    • For each treated unit select k similar controls
  • Stratification/binning
    • Group individuals with similar propensity scores
  • Weighting adjustments
    • Inverse probability of treatment weighting (IPTW); see the sketch below
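
A self-contained IPTW sketch on simulated data: a confounder x drives both treatment and outcome, the naive difference in means is biased, and the weighted comparison recovers the true effect (unconfoundedness holds here by construction).

set.seed(5)
n  <- 5000
x  <- rnorm(n)
t  <- rbinom(n, 1, plogis(x))               # treatment depends on the confounder x
y  <- x + 2 * t + rnorm(n)                  # true treatment effect = 2
ps <- fitted(glm(t ~ x, family = binomial))
w  <- ifelse(t == 1, 1 / ps, 1 / (1 - ps))  # inverse probability weights
mean(y[t == 1]) - mean(y[t == 0])           # naive: biased upward by x
weighted.mean(y[t == 1], w[t == 1]) -
  weighted.mean(y[t == 0], w[t == 0])       # IPTW: close to 2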

Double robustness

  • You can combine propensity score with regression adjustment
    • regression adjustment in 1:1 matched samples
    • weighted regression adjustment
  • The two models work together: if you get either the propensity model or the outcome model right, you are "ok"
  • This is an active area of research
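
A sketch of the second combination above (weighted regression adjustment) on the same kind of simulated data; here both models happen to be correct, but the point of double robustness is that either one alone would suffice.

set.seed(8)
n  <- 5000
x  <- rnorm(n)
t  <- rbinom(n, 1, plogis(x))
y  <- x + 2 * t + rnorm(n)                   # true effect = 2
ps <- fitted(glm(t ~ x, family = binomial))  # propensity model
w  <- ifelse(t == 1, 1 / ps, 1 / (1 - ps))
coef(lm(y ~ t + x, weights = w))["t"]        # outcome model with IPT weights, ~2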

Instrumental variables: another approach

  • Sometimes we can't randomize the thing we want
    • Gene expression levels
    • Taking a drug
  • But we can randomize something related
    • Genetic background
    • Treatment assignment
  • The thing we can't randomize is the "variable"
  • Randomization is the "instrument"
  • Good instruments
    • Easy to randomize
    • Closely related to the variable you care about

Instrumental variables: yeast example

Randomize with meiosis

Is it random?

How do we use this?

  • Try to infer connections between genes (or genes and phenotype)
  • Commonly called "Mendelian randomization"
  • Hard to do in humans because instrument (randomization) isn't great: population structure for example

Another example: Sommer and Zeger

  • Study estimating the effect of Vitamin A on child mortality
  • Carried out in Indonesia
  • Villages randomized to receive vitamin A supplements or not
  • Not all children in Vitamin A villages actually got Vitamin A
  • No children in control villages got Vitamin A (not available)
  • For simplicity, will ignore village clustering and treat as if children randomly assigned (village indicators not available anyway)

Compliance and non-compliance

  • Can think of there being two types of people:
    • Compliers: take the treatment if assigned to, don’t take it if not assigned to
    • Non-compliers: don’t take the treatment either way
  • Observe compliance status in treatment group
  • Don’t observe compliance status in control group
  • We don’t know what the controls would have done if they had been in the treatment group

The data

How to deal with noncompliance

  • Intent to treat (ITT) - ignore compliance
    • Gives unbiased estimates of causal effect of being assigned to group
    • Not what we care about
  • As treated - compare people who got the treatment to people who got the control
    • Ignores randomization
    • Not a valid estimate of causal effect
  • Per protocol analysis
    • Compares people who appeared to comply with their assigned treatment
    • Not a valid estimate of a causal effect

Intent to treat

As treated

Per protocol

How do we do this with instrumental variables

  • What we really want: the biologic effect of taking Vitamin A on mortality
  • We can use instrumental variables to get at this
  • Define the average effect among compliers (CACE) and noncompliers (NACE)
  • Then the overall average causal effect (ACE) decomposes as \[ACE = p_c \times CACE + (1-p_c) \times NACE\]
  • Here we change what we are interested in to be "complier average causal effect"

We can estimate the CACE

  • ITT provides unbiased estimate of the total causal effect
  • We can estimate the proportion of compliers (a missing data problem again!)
    • \(p_c\) is the proportion who took the treatment (0.8 in the Sommer/Zeger data)
  • Now we have an estimation problem with two unknowns \[ACE = p_c \times CACE + (1-p_c) \times NACE\] \[-0.0025 = 0.8 \times CACE + 0.2 \times NACE\]
  • Assume that being assigned to treatment doesn't affect the outcome if you don't take it (\(NACE = 0\))
  • You can now solve: \(CACE = -0.0025/0.8 \approx -0.0031\)
  • This is the IV estimate for CACE
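
Reproducing the slide's back-of-the-envelope arithmetic in R:

itt  <- -0.0025                   # ITT estimate of the ACE
p_c  <- 0.8                       # proportion of compliers
nace <- 0                         # exclusion restriction: no effect on non-takers
(itt - (1 - p_c) * nace) / p_c    # CACE = -0.003125, ~ -0.0031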

Can extend this idea to 2-sided

  • Suppose that the control group has access to treatment
    • Flu shots - encouragement
    • Diet - information
  • Now we have three kinds of "non-compliers"
    • Defiers
    • Always takers
    • Never takers
  • Main interest is still the CACE but things get more complicated
  • Fundamental idea still the same: \[ ACE = p_c \times CACE + p_d \times DACE + p_{at} \times AACE + p_{nt} \times NACE\]

Three conditions for an instrument (Hernan and Robins 2006)

  • Has a causal effect on the treatment of interest (\(p_c > 0\))
    • (there exist some compliers)
  • Affects the outcome only through the treatment received
    • (exclusion restrictions)
  • Does not share common causes with the outcome
    • instrument (encouragement) assigned randomly

Flu example

How do we estimate CACE in the more complicated case

\[ ACE = p_c \times CACE + p_d \times DACE + p_{at} \times AACE + p_{nt} \times NACE\]

  • No defiers \(\implies p_d = 0\)
  • Exclusion restrictions \(AACE = NACE = 0\)
  • People in the treatment group who don't get the flu shot must be never takers \(\implies p_{nt} = 1 - 0.307 = 0.693\)
  • People in the control group who get the flu shot must be always takers \(\implies p_{at} = 0.19\)
  • The remaining people must be compliers \(\implies p_c = 1 - 0.693 - 0.19 \approx 0.12\)
  • \(ITT\) estimates \(ACE\): \(ITT = -0.014\)
  • \(ACE = p_c \times CACE \implies CACE = ACE/p_c = -0.014/0.12 \approx -0.117\)
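
The same arithmetic in R, using the slide's numbers (note \(1 - 0.693 - 0.19 = 0.117\) before rounding):

p_nt <- 1 - 0.307        # never takers
p_at <- 0.19             # always takers
p_c  <- 1 - p_nt - p_at  # compliers: 0.117, rounded to 0.12 on the slide
itt  <- -0.014           # ITT estimate of the ACE
itt / p_c                # CACE estimate, roughly -0.12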

If you liked this, learn from people who actually know stuff