Experimental design

Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health

Why you should care - an exciting result!

Why you should care - uh oh!

Why you should care - serious trouble

Know and care about the analysis plan!

Have a plan for data and code sharing

May I recommend?

Formulate your question in advance

Statistical inference

Variability - Scenario 1

Variability - Scenario 2

Variability - Scenario 3

Confounding

Correlation is not causation*

Randomization and blocking

  • If you can (and want to), fix a variable
    • Website always says Obama 2014 on it
  • If you don't fix a variable, stratify it
    • If you are testing sign-up phrases and have two website colors, use both phrases equally on both colors.
  • If you can't fix a variable, randomize it
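The stratify-then-randomize idea above can be sketched in a few lines. This is only an illustration under assumed names: the visitor list, the two colors ("blue", "red"), the two phrases, and the helper `assign_phrases` are all hypothetical, not part of the original slides. Within each color block, phrases are handed out in equal numbers, and the order of visitors inside the block is random.

```python
import random
from collections import Counter

def assign_phrases(visitors, phrases, seed=0):
    """Blocked randomization (hypothetical helper).

    visitors: list of (visitor_id, color) pairs.
    Within each color block, visitors are shuffled and then given
    the phrases in rotation, so each phrase is used equally often
    on each color.
    """
    rng = random.Random(seed)
    by_color = {}
    for visitor_id, color in visitors:
        by_color.setdefault(color, []).append(visitor_id)

    assignment = {}
    for color, ids in by_color.items():
        rng.shuffle(ids)  # random order inside the block
        for i, visitor_id in enumerate(ids):
            assignment[visitor_id] = (color, phrases[i % len(phrases)])
    return assignment

# Made-up example: 40 visitors, half see a blue site and half a red one.
visitors = [(i, "blue" if i % 2 == 0 else "red") for i in range(40)]
assignment = assign_phrases(visitors, ["Sign up", "Join now"])

# Each (color, phrase) combination should occur equally often.
counts = Counter(assignment.values())
for combo, n in sorted(counts.items()):
    print(combo, n)
```

Because the phrases are balanced within each color, a difference between phrases cannot be explained by one phrase appearing more often on one color.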

Why does randomization help?
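One way to see the point is a small simulation. The sketch below is an assumption-laden toy, not taken from the slides: the treatment has no effect at all, and a hidden confounder drives the outcome. When the confounder also decides who gets treated, the groups differ a lot; when a coin flip decides, the difference shrinks toward zero. The function name `simulate` and all numbers are hypothetical.

```python
import random
from statistics import mean

def simulate(n, randomized, seed=0):
    """Return mean(treated) - mean(control) for a treatment with NO true effect."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        confounder = rng.gauss(0, 1)
        if randomized:
            # Treatment assigned by coin flip, independent of the confounder.
            gets_treatment = rng.random() < 0.5
        else:
            # Treatment depends on the confounder (self-selection).
            gets_treatment = confounder > 0
        # Outcome depends only on the confounder plus noise.
        outcome = 2.0 * confounder + rng.gauss(0, 1)
        (treated if gets_treatment else control).append(outcome)
    return mean(treated) - mean(control)

observational_diff = simulate(20000, randomized=False)
randomized_diff = simulate(20000, randomized=True)
print("not randomized:", round(observational_diff, 2))
print("randomized:    ", round(randomized_diff, 2))
```

In the non-randomized run, the apparent "treatment effect" is entirely due to the confounder. Randomization does not remove the confounder; it makes the confounder balanced, on average, between the two groups.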

Prediction

Prediction versus inference

Prediction key quantities

Beware data dredging
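A rough sketch of why dredging is dangerous, under made-up settings: 200 predictors of pure noise are each correlated with a pure-noise outcome measured on 20 samples. None of them is truly related to the outcome, yet some pass a conventional 5% cutoff by chance. The cutoff 0.444 is the approximate two-sided 5% critical value of |r| for n = 20; everything else here (sizes, seed, names) is an arbitrary assumption.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)
n_samples, n_predictors = 20, 200

# Outcome and predictors are independent noise: there is nothing to find.
outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
correlations = [
    pearson([rng.gauss(0, 1) for _ in range(n_samples)], outcome)
    for _ in range(n_predictors)
]

# Approximate two-sided 5% critical value of |r| for n = 20 (18 df).
CRITICAL_R = 0.444
false_positives = [r for r in correlations if abs(r) > CRITICAL_R]

print("predictors tested:   ", n_predictors)
print("'significant' by luck:", len(false_positives))
print("largest |r|:          ", round(max(abs(r) for r in correlations), 2))
```

With a 5% cutoff and 200 tests, roughly 10 chance "discoveries" are expected, which is why reporting only the best-looking result is misleading.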

Summary

  • Good experiments
    • Have replication
    • Measure variability
    • Generalize to the problem you care about
    • Are transparent
  • Prediction is not inference
    • Both can be important
  • Beware data dredging