This class covers the foundational elements of the design-based theory of causal inference, which is built on the potential outcomes model. It also discusses in detail the most common designs: regression discontinuity, instrumental variables, difference-in-differences, comparative case studies using synthetic control and, time permitting, matching. Alongside this material, the class introduces students to basic programming practices and to good research practices more generally.
Hidden Curriculum
About
Causal inference is a design- and model-based approach to estimating causal effects, but in practice anyone estimating causal effects works with data, sometimes very large data, on a personal computer, often with cloud-based storage, a hierarchy of directories, and scripts that implement tasks (including, but not limited to, estimation itself) in one or more programming languages. So although the theory of causal inference can be taught separately from empirical workflow, students should be trained in both, because in practice you cannot have one without the other. Here I discuss my own beliefs about empirical workflow, covering topics such as missingness in data, directory hierarchy, version control and more.
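As one small illustration of the missingness topic, here is a minimal sketch of a first check you might run on a new dataset: the share of missing values in each column. The records, the column names, and the list of placeholder codes (`"NA"` and the Stata-style `"."`) are all assumptions made up for this example, not material from the slides.

```python
import math

# Placeholder strings treated as missing. This list is an assumption;
# adjust it to whatever codes your own data actually uses.
MISSING_CODES = {"", "NA", "."}

def is_missing(value):
    """Return True for None, NaN, or a placeholder string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() in MISSING_CODES:
        return True
    return False

def missing_share(rows):
    """Share of rows in which each column is missing or absent."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / total
        for col in columns
    }

# Hypothetical records, invented for illustration only.
rows = [
    {"id": 1, "age": 34, "wage": 21.5},
    {"id": 2, "age": None, "wage": "."},
    {"id": 3, "age": 51, "wage": float("nan")},
    {"id": 4, "age": 29},  # the wage key is absent altogether
]

shares = missing_share(rows)
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```

Running a check like this before any estimation makes it harder for placeholder codes to slip silently into a regression as if they were real values.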
Slides
Potential Outcomes
About
The modern theory of causality is based on a seemingly simple idea called the "counterfactual". The counterfactual is an unusual element in the arsenal of modern statistics because counterfactuals are really just thought experiments: stories we tell about alternative realities that may or may not exist, depending on whether we think the philosopher David Lewis was right. Regardless, the thought experiment usually involves considering what would have happened had a single choice gone a different way. This type of reasoning was formalized in the 1920s, both conceptually and with a notation that has persisted to this day (Neyman 1923), and is now sometimes called "potential outcomes". Potential outcomes is a theory of causality, a formalized model that comes with a complete set of terms and concepts. Without understanding it, you will not make much progress in understanding research designs, because all contemporary research designs, and increasingly the econometric estimators themselves, are built on top of the potential outcomes framework. The purpose of this lecture is to learn that language and the formalized concepts (e.g., treatment effects) used for causal parameters. I also cover randomization, selection bias and randomization inference in this lecture.
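To make these ideas concrete, here is a minimal simulation sketch. It is not the course's own code (that is listed under Code), and every number in it (sample size, effect size, noise levels) is an assumption chosen for illustration. It builds both potential outcomes for each hypothetical unit, compares the simple difference in means under random assignment and under self-selection, and approximates a randomization inference p-value by reshuffling treatment labels.

```python
import random
import statistics

def simulate_units(n, seed=0):
    """Return (y0, y1) potential outcomes for n hypothetical units."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)      # outcome if untreated
        y1 = y0 + rng.gauss(1.5, 1.0)  # outcome if treated
        units.append((y0, y1))
    return units

def diff_in_means(observed, treated):
    """Mean observed outcome of the treated minus that of the controls."""
    t = [y for y, d in zip(observed, treated) if d]
    c = [y for y, d in zip(observed, treated) if not d]
    return statistics.fmean(t) - statistics.fmean(c)

def randomization_p_value(observed, treated, draws=2000, seed=1):
    """Approximate p-value under the sharp null of no effect for any unit.

    Under that null the observed outcomes are fixed, so we reshuffle the
    treatment labels and count how often the reshuffled statistic is at
    least as extreme as the one actually observed.
    """
    rng = random.Random(seed)
    actual = abs(diff_in_means(observed, treated))
    labels = list(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(labels)
        if abs(diff_in_means(observed, labels)) >= actual:
            extreme += 1
    return extreme / draws

units = simulate_units(200)
true_ate = statistics.fmean(y1 - y0 for y0, y1 in units)

# Random assignment: exactly half of the units are treated.
assign_rng = random.Random(42)
treated = [True] * 100 + [False] * 100
assign_rng.shuffle(treated)

# Switching equation: we observe y1 for treated units and y0 otherwise.
observed = [y1 if d else y0 for (y0, y1), d in zip(units, treated)]
estimate = diff_in_means(observed, treated)

# Self-selection: units with above-median y0 take the treatment, so the
# two groups differ even in the absence of treatment (selection bias).
median_y0 = statistics.median(y0 for y0, _ in units)
selected = [y0 > median_y0 for y0, _ in units]
observed_selected = [y1 if d else y0 for (y0, y1), d in zip(units, selected)]
naive_selected = diff_in_means(observed_selected, selected)

p_value = randomization_p_value(observed, treated)

print(f"true ATE:                 {true_ate:.2f}")
print(f"randomized estimate:      {estimate:.2f}")
print(f"self-selected comparison: {naive_selected:.2f}")
print(f"randomization p-value:    {p_value:.3f}")
```

In a simulation both potential outcomes are visible, which is never true in real data. That is what lets the sketch compare each estimate with the true average treatment effect.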
Slides
Code
- Stata: ri.do, tea.do, thornton_ri.do
- R: Potential outcomes
- Python: Potential outcomes
Readings
Mixtape chapter 4: Potential Outcomes Causal Model
Software: DAGitty
Directed Acyclic Graphs
About
Model-based approaches to identification can sometimes be seen more clearly using causal graphs called directed acyclic graphs (DAGs). These modeling approaches are compatible with the design-based approach, but they tend to emphasize a priori domain knowledge rather than relying exclusively on treatment manipulation. Here we discuss the backdoor criterion, the frontdoor criterion, and collider bias.
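Collider bias is easiest to appreciate in a simulation. The sketch below is my own illustration under assumed parameters, not a copy of the course's code files. It draws two independent variables `x` and `y`, builds a collider `c` that both of them cause (x → c ← y), and shows that conditioning on high values of `c` induces a negative correlation between `x` and `y` even though neither causes the other.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
n = 5000

# x and y are generated independently: no arrow between them in the DAG.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

# The collider is caused by both variables: x -> c <- y.
c = [a + b for a, b in zip(x, y)]

# Condition on the collider by keeping only the top 15 percent of c.
cutoff = sorted(c)[int(0.85 * n)]
keep = [i for i in range(n) if c[i] > cutoff]

corr_all = pearson(x, y)
corr_selected = pearson([x[i] for i in keep], [y[i] for i in keep])

print(f"correlation in full sample:     {corr_all:.3f}")
print(f"correlation in selected sample: {corr_selected:.3f}")
```

The same logic applies when the conditioning happens through sample selection rather than an explicit control variable, which is why a collider can bias a study without ever appearing in the regression.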
Slides
Code
- Stata: moviestar.do, collider_discrimination.do
- R: DAGs
- Python: DAGs
Readings
Mixtape chapter 3: Directed Acyclic Graphs
Sharp Regression Discontinuity
About
One of the most desired quasi-experimental designs is the regression discontinuity design (RDD), desired because it is viewed as highly credible despite being based on observational data. Here I discuss the sharp RDD in detail, covering identification, estimation, specification tests and tips, as well as a replication.
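As a rough sketch of what estimation looks like in a sharp design, the example below simulates a running variable with a known jump at the cutoff and recovers it by fitting a separate line on each side within a fixed bandwidth. The data-generating process, the cutoff of 50, the bandwidth of 10, and the helper names `ols_line` and `sharp_rd_estimate` are all assumptions made for illustration. In applied work the bandwidth and kernel would be chosen with far more care than this.

```python
import random

def ols_line(xs, ys):
    """Return (intercept, slope) of a least-squares fit of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sharp_rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff.

    Fits one line on each side of the cutoff using only observations
    within the bandwidth (a uniform kernel). The running variable is
    centered at the cutoff, so each intercept is the fitted value there.
    """
    left = [(r - cutoff, v) for r, v in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, v) for r, v in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    intercept_left, _ = ols_line(*zip(*left))
    intercept_right, _ = ols_line(*zip(*right))
    return intercept_right - intercept_left

# Simulated data with an assumed true jump of 5 at a cutoff of 50.
rng = random.Random(0)
cutoff = 50.0
true_jump = 5.0
running = [rng.uniform(0.0, 100.0) for _ in range(2000)]
outcome = [
    20.0 + 0.3 * r + (true_jump if r >= cutoff else 0.0) + rng.gauss(0.0, 1.0)
    for r in running
]

estimate = sharp_rd_estimate(running, outcome, cutoff, bandwidth=10.0)
print(f"estimated jump at the cutoff: {estimate:.2f} (true value {true_jump})")
```

Treatment here switches on deterministically when the running variable crosses the cutoff, which is what makes the design sharp.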
Slides
RDD slides
Code
Readings
Mixtape chapter 6: Regression discontinuity
Instrumental Variables
About ...
Slides ...
Code ...
Readings ...
Difference-in-Differences
About ...
Slides ...
Code ...
Readings ...
Synthetic Control
About ...
Slides ...
Code ...
Readings ...
Matching
About ...
Slides ...
Code ...
Readings ...