Graphical causal inference

Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health

Pro tip

In academia your goal is to "get famous". Not Bieber famous. Famous in the way that lots of people know about and respect your ideas/creativity/independence. A frustrating, but important aspect of this is that if you have a good idea you need to write about it a lot to make sure people remember it was you that came up with it.

Paper of the day

Today's slide credits

Basic idea

  • When can we use non-experimental (observational) data to tell us about the effect of an intervention?
  • More precisely: when can we get \(Pr(Y | set(X) = x, Z=z)\) from \(Pr(Y | X=x,Z=z)\)?
  • Association is symmetric \(X \not\perp Y \Leftrightarrow Y \not\perp X\)
  • Causation is asymmetric \(X \rightarrow Y \not\Leftrightarrow Y \rightarrow X\)
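The gap between \(Pr(Y | X=x)\) and \(Pr(Y | set(X)=x)\) can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: the toy model (Z causes both X and Y, X also causes Y), its probabilities, and the function names `simulate` and `prob_y` are all hypothetical and not taken from the lecture.

```python
import random

def simulate(n, set_x=None, seed=0):
    """Sample from an assumed toy model Z -> X, Z -> Y, X -> Y.

    If set_x is given, X is fixed by hand, which removes the Z -> X arrow.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        if set_x is None:
            x = rng.random() < (0.8 if z else 0.2)  # X depends on Z
        else:
            x = bool(set_x)                          # intervention on X
        y = rng.random() < 0.1 + 0.2 * x + 0.5 * z
        rows.append((x, y, z))
    return rows

def prob_y(rows, keep):
    """Fraction of rows with Y = 1 among rows where keep(x, z) is true."""
    kept = [y for (x, y, z) in rows if keep(x, z)]
    return sum(kept) / len(kept)

N = 100_000
obs = simulate(N)

# Observational: Pr(Y=1 | X=1)
p_obs = prob_y(obs, lambda x, z: x)

# Interventional: Pr(Y=1 | set(X)=1), from a separate simulated experiment
p_do = prob_y(simulate(N, set_x=True, seed=1), lambda x, z: True)

# Observational data adjusted for Z: sum_z Pr(Y=1 | X=1, Z=z) Pr(Z=z)
p_z1 = sum(z for _, _, z in obs) / N
p_adj = (prob_y(obs, lambda x, z: x and z) * p_z1
         + prob_y(obs, lambda x, z: x and not z) * (1 - p_z1))

print(f"Pr(Y|X=1)={p_obs:.3f}  Pr(Y|set(X)=1)={p_do:.3f}  adjusted={p_adj:.3f}")
```

In this toy model the raw conditional probability overstates the effect of setting X, while adjusting for the confounder Z brings the observational estimate close to the interventional one.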

Direct causation

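\(X\) is a direct cause of \(Y\) relative to a variable set \(S\) iff there exist values \(x_1 \neq x_2\) and \(z\) such that

\[P(Y | set(X)=x_1, set(Z)=z) \neq P(Y | set(X)=x_2, set(Z)=z)\]

where \(Z = S \setminus \{X, Y\}\).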

Need to know all "confounders"

Can do this with intervention

But be careful of "fat hand" interventions that don't directly act on the thing you care about

Structural equation models

Connecting probability to causal structure

  1. There is a directed acyclic graph
  2. The Causal Markov condition: The joint distribution of the variables obeys the Markov property on G.
    • Conditional on its immediate causes, a variable is independent of its remote causes
    • Conditional on their common causes, effects are independent of each other
  3. Faithfulness: The joint distribution has all of the conditional independence relations implied by the causal Markov property, and only those conditional independence relations
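The two consequences of the Causal Markov condition listed above can be checked numerically. The following is a rough sketch assuming linear Gaussian models with unit coefficients; the variable names and the helpers `corr` and `partial_corr` are illustrative choices, not part of the lecture.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly removing c from both."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(0)
n = 20_000

# Chain X -> M -> Y: M is the immediate cause of Y, X a remote cause.
x = [rng.gauss(0, 1) for _ in range(n)]
m = [xi + rng.gauss(0, 1) for xi in x]
y = [mi + rng.gauss(0, 1) for mi in m]
chain_marginal = corr(x, y)
chain_given_m = partial_corr(x, y, m)

# Fork A <- C -> B: C is a common cause of A and B.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]
fork_marginal = corr(a, b)
fork_given_c = partial_corr(a, b, c)

print(f"chain: corr(X,Y)={chain_marginal:.3f}, given M={chain_given_m:.3f}")
print(f"fork:  corr(A,B)={fork_marginal:.3f}, given C={fork_given_c:.3f}")
```

In both cases the marginal correlation is clearly nonzero, while the partial correlation given the immediate or common cause is close to zero, which for Gaussian variables corresponds to conditional independence.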

Causal Markov property

D-separation

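Two variables are d-separated conditional on a third if no path between them is open given that variable: a path is blocked by conditioning on a non-collider along it, or by a collider that is not conditioned on (and has no conditioned-on descendant). Under the Causal Markov condition, d-separated variables are conditionally independent.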

  • In SEMs you get this from assuming error terms are independent if they don't have a path connection
  • In acyclic graphs: d-separation is equivalent to Causal Markov
  • In Cyclic SEMs with uncorrelated errors
    • D-separation is still correct
    • Markov condition is incorrect
  • In Cyclic discrete variable graphs
    • If equilibrium then d-separation is correct
    • Markov is incorrect
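For the acyclic case, d-separation can be checked mechanically. The sketch below uses the standard moralization criterion (restrict to ancestors, connect co-parents, drop directions, delete the conditioning set, test reachability); it is one common way to do this and not necessarily what any particular software package uses. The graph encoding (a dict mapping each node to its list of parents) and the function names are assumptions made for illustration.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated given `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)

    # Moralize the ancestral subgraph: link each child to its parents,
    # and link every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Search for an undirected path from x to y that avoids `given`.
    seen = {x}
    stack = [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True

chain = {"M": ["X"], "Y": ["M"]}   # X -> M -> Y
fork = {"A": ["C"], "B": ["C"]}    # A <- C -> B
collider = {"C": ["X", "Y"]}       # X -> C <- Y

print(d_separated(chain, "X", "Y"), d_separated(chain, "X", "Y", ["M"]))
print(d_separated(fork, "A", "B"), d_separated(fork, "A", "B", ["C"]))
print(d_separated(collider, "X", "Y"), d_separated(collider, "X", "Y", ["C"]))
```

The collider case shows the asymmetry: X and Y are d-separated marginally but become connected once the common effect C is conditioned on. This check applies only to acyclic graphs; the cyclic cases in the list above need separate treatment.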

Why we want d-separation

How these algorithms work

Equivalence classes implied by independence assumptions

Partial ancestral graphs

Create equivalence classes from data

The "eliminate graphs" approach

The "sensitivity" approach

  1. Estimate equivalence class of possible graphs
  2. Use DAGs/Causal Markov to eliminate potential models
  3. Fit each "structural equation model"
  4. Use property of estimated model fits across models (max,min,etc.) to estimate graphical structure
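Steps 3 and 4 above can be sketched on a toy problem. This is a simplified illustration under several assumptions, none of which come from the lecture: the equivalence class is supplied by hand rather than estimated (steps 1 and 2 are skipped), the models are linear, there are no hidden variables, and the effect of X on Y in each candidate DAG is estimated by regressing Y on X and the parents of X. The names `ols_first_coef`, `effect_of_x_on_y`, and `candidates` are hypothetical.

```python
import random

def ols_first_coef(y, cols):
    """Least-squares coefficient on cols[0] when regressing y on all cols."""
    def center(v):
        mean = sum(v) / len(v)
        return [a - mean for a in v]
    y = center(y)
    cols = [center(c) for c in cols]
    k = len(cols)
    # Normal equations A b = c
    A = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    c = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        c[i], c[piv] = c[piv], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
            c[r] -= f * c[i]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b[0]

def effect_of_x_on_y(data, parents):
    """Effect of X on Y implied by one candidate DAG (dict: node -> parents)."""
    pa_x = parents.get("X", [])
    if "Y" in pa_x:
        return 0.0  # Y causes X in this graph, so X has no effect on Y
    return ols_first_coef(data["Y"], [data["X"]] + [data[p] for p in pa_x])

# Simulated data from an assumed true model Z -> X, Z -> Y, X -> Y.
rng = random.Random(0)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [0.5 * xi + 0.7 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
data = {"X": x, "Y": y, "Z": z}

# Hand-picked members of the equivalence class (all fully connected DAGs).
candidates = {
    "Z->X, Z->Y, X->Y": {"X": ["Z"], "Y": ["X", "Z"]},
    "X->Z, X->Y, Z->Y": {"Z": ["X"], "Y": ["X", "Z"]},
    "Y->X, Z->X, Z->Y": {"X": ["Y", "Z"], "Y": ["Z"]},
}

effects = {name: effect_of_x_on_y(data, g) for name, g in candidates.items()}
lower, upper = min(effects.values()), max(effects.values())
for name, est in effects.items():
    print(f"{name}: {est:.3f}")
print(f"range of effect across candidate graphs: [{lower:.3f}, {upper:.3f}]")
```

Because the data alone cannot distinguish the candidate graphs, the summary across models (here the minimum and maximum) is what gets reported, rather than a single point estimate.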

Interesting application

R software