Chapter 9: The Causal Framework
Opening Question
What does it mean to say that X causes Y—and how can we ever know?
Chapter Overview
Causation is the central concern of empirical social science. We want to know whether education raises earnings, whether democracy promotes growth, whether a policy intervention improves outcomes. These are causal questions: they ask what would happen if we changed something.
Yet causation is philosophically subtle and empirically treacherous. Correlation is easy to observe; causation is not. The fundamental problem is that we can never directly observe what would have happened under the alternative—the counterfactual. Every causal claim, no matter how sophisticated the method, rests on assumptions about these unobserved counterfactuals.
This chapter develops the conceptual foundations for causal inference. We introduce two frameworks—potential outcomes and directed acyclic graphs—that formalize causal reasoning. We show how they relate, when they agree, and when they offer different insights. We then survey the identification strategies that the subsequent chapters will develop in detail. The goal is not to make causation easy, but to make it precise: to replace vague claims about "controlling for confounders" with explicit statements about what we assume and what we can learn.
What you will learn:
The potential outcomes framework and why counterfactuals are central to causation
How to define and distinguish ATE, ATT, and LATE
The graphical approach to causal inference using DAGs
How to identify causal effects using backdoor and frontdoor criteria
The taxonomy of identification strategies used in applied research
How qualitative methods employ counterfactual reasoning
Prerequisites: Chapter 1 (The Empirical Enterprise), Chapter 3 (Statistical Foundations)
9.1 The Fundamental Problem of Causal Inference
What Is Causation?
Consider a simple question: Did taking aspirin cure my headache?
To answer, I need to compare two states of the world:
The world where I took aspirin (what actually happened)
The world where I did not take aspirin (the counterfactual)
If my headache went away in world 1 but would have persisted in world 2, then aspirin caused the cure. If my headache would have gone away anyway, aspirin was not the cause—even if I feel better now.
This is the counterfactual definition of causation: X causes Y if Y would have been different had X been different, holding all else constant.
The Problem: Counterfactuals Are Unobservable
The fundamental problem is immediate: I cannot observe both worlds. I either took the aspirin or I did not. The counterfactual outcome—what would have happened under the alternative—is forever missing.
The Fundamental Problem of Causal Inference
For any individual unit, we can observe at most one potential outcome. The causal effect for that unit—the difference between what happened and what would have happened—is inherently unobservable.
This is not a data limitation that more observations can solve. It is a logical feature of causal questions about specific units. No amount of data tells me what my headache would have done without aspirin.
From Individual to Average Effects
If individual causal effects are unobservable, how do we make progress?
The key insight: while we cannot observe individual effects, we can sometimes estimate average effects across groups. If we randomly assign some people to take aspirin and others to take placebo, the average outcome in the aspirin group estimates what the average outcome would have been for the placebo group had they taken aspirin (and vice versa).
Randomization creates comparable groups. Comparison across groups substitutes for the impossible comparison within individuals.
But randomization is not always possible. The rest of this chapter—and much of this book—develops frameworks for thinking about when and how we can estimate causal effects without randomization.
9.2 The Potential Outcomes Framework
Setup and Notation
The potential outcomes framework, developed by Donald Rubin and often called the Rubin Causal Model, formalizes causal inference using the notation of potential outcomes.
For each unit i:
Di∈{0,1} denotes treatment status (1 = treated, 0 = control)
Yi(1) denotes the potential outcome if treated
Yi(0) denotes the potential outcome if not treated
The individual treatment effect is τi=Yi(1)−Yi(0)
The observed outcome is: Yi=Di⋅Yi(1)+(1−Di)⋅Yi(0)
We observe Yi(1) for treated units and Yi(0) for control units—never both.
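To make the notation concrete, here is a minimal simulation sketch (hypothetical numbers, not from the chapter): because we generate the data ourselves, we see both potential outcomes for every unit, something no real dataset allows, and we then construct the observed outcome from the equation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: Y(0) is the baseline, Y(1) adds a heterogeneous individual effect
y0 = rng.normal(10, 2, n)
tau = rng.normal(2, 1, n)      # individual effects tau_i; visible only because we simulate
y1 = y0 + tau

# Random treatment assignment
d = rng.integers(0, 2, n)

# Consistency equation: the observed outcome reveals one potential outcome per unit
y_obs = d * y1 + (1 - d) * y0

print("true ATE (requires both potential outcomes):", tau.mean())
print("difference in observed group means (randomized):",
      y_obs[d == 1].mean() - y_obs[d == 0].mean())
```

Under random assignment the difference in observed group means is close to the true ATE, even though each individual effect remains unobservable.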

Average Treatment Effects
Since individual effects τi are unobservable, we focus on averages:
Definition 9.1: Average Treatment Effect (ATE)
ATE=E[Y(1)−Y(0)]=E[Y(1)]−E[Y(0)]
The average causal effect across the entire population.
Definition 9.2: Average Treatment Effect on the Treated (ATT)
ATT=E[Y(1)−Y(0)∣D=1]
The average effect for those who actually received treatment.
Definition 9.3: Local Average Treatment Effect (LATE)
LATE=E[Y(1)−Y(0)∣Compliers]
The average effect for units whose treatment status is changed by an instrument. (See Chapter 12 for details.)
When do these differ? When treatment effects vary across individuals and treatment assignment is related to the effect size. If people who benefit most are most likely to be treated, ATT exceeds ATE. If an instrument affects only certain types of people, LATE may differ from both.
The Selection Problem
The naive approach to estimating ATE compares average outcomes: τ^naive=E[Y∣D=1]−E[Y∣D=0]
But this generally does not equal ATE. To see why, decompose it:
E[Y∣D=1]−E[Y∣D=0] = (E[Y(1)∣D=1]−E[Y(0)∣D=1]) + (E[Y(0)∣D=1]−E[Y(0)∣D=0]), where the first term is the ATT and the second term is the selection bias.
The second term—selection bias—reflects differences between treated and control groups that would exist even without treatment. If people who select into treatment have higher Y(0), the naive comparison overstates the causal effect.
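A short simulation sketch (hypothetical data; a constant treatment effect is assumed for clarity) makes the decomposition tangible: when units with higher Y(0) select into treatment, the naive difference in means equals the ATT plus a positive selection bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

ability = rng.normal(0, 1, n)
y0 = 10 + 2 * ability + rng.normal(0, 1, n)   # baseline outcome rises with ability
y1 = y0 + 3                                   # constant effect of 3, so ATE = ATT = 3

# Self-selection: higher-ability units are more likely to take the treatment
d = rng.binomial(1, 1 / (1 + np.exp(-2 * ability)))
y = d * y1 + (1 - d) * y0

naive = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()                          # observable only in simulation
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # E[Y(0)|D=1] - E[Y(0)|D=0]

print("naive difference:", naive)
print("ATT + selection bias:", att + selection_bias)
```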

Identifying Assumptions
To estimate causal effects, we need assumptions that allow us to learn about unobserved potential outcomes. The key assumptions:
Assumption 9.1: Unconfoundedness (Ignorability)
(Y(0),Y(1))⊥D∣X
Conditional on observed covariates X, treatment assignment is independent of potential outcomes. All confounders are observed.
Assumption 9.2: Overlap (Common Support)
0<P(D=1∣X)<1
For every value of X, there is positive probability of being in both treatment and control.
Assumption 9.3: SUTVA (Stable Unit Treatment Value Assumption)
No interference: Yi depends only on Di, not on Dj for any j≠i
No hidden variations: Treatment D=1 means the same thing for everyone
Formally, let D=(D1,…,Dn) denote the treatment vector for all units. SUTVA requires Yi(D)=Yi(Di): the potential outcome depends only on a unit's own treatment, not on the full treatment vector.
Why SUTVA matters: Without SUTVA, potential outcomes are not well-defined. If your outcome depends on whether your neighbor is treated, then Yi(1) is ambiguous—it could mean many different things depending on Dj.
Box: The Two Parts of SUTVA
Part 1: No Interference (No Spillovers)
My outcome depends only on my treatment, not on whether others are treated.
When it fails:
Vaccinations: My health depends on whether my neighbors are vaccinated (herd immunity)
Job training: If everyone in my labor market gets trained, my advantage disappears
Education: Peer effects mean my learning depends on classmates' treatment
Information: If treated units share knowledge with controls, control outcomes change
What to do when it fails: Design studies to measure spillovers directly; use clustered randomization; analyze at market/network level rather than individual level.
Part 2: No Hidden Treatment Variations (Treatment Homogeneity)
"Treatment" means the same thing for everyone.
When it fails:
Dosage varies: Some get 10mg, others get 50mg—both are "treated"
Quality varies: The same program implemented well vs. poorly
Timing varies: Early vs. late adoption may have different effects
Compliance varies: Intent-to-treat differs from actual treatment received
What to do when it fails: Define treatment more precisely; stratify by treatment type; use continuous treatment methods.
The deeper issue: SUTVA lets us write Yi(1) as a single number. Without it, we'd need Yi(1,D−i,v) where D−i is everyone else's treatment and v is the treatment version—making the framework intractable.

Under unconfoundedness and overlap, we can identify ATE by adjusting for X: ATE = E_X[E[Y∣D=1,X]−E[Y∣D=0,X]]
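With a single discrete confounder, this adjustment can be computed by hand, as in the following sketch (hypothetical data): compare treated and control means within each level of X, then average the differences using the distribution of X as weights.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300_000

x = rng.integers(0, 3, n)                     # discrete confounder with three levels
d = rng.binomial(1, 0.2 + 0.15 * x)           # treatment more likely at higher x; overlap holds
y = 1.0 * d + 2.0 * x + rng.normal(0, 1, n)   # true causal effect of d is 1.0

print("naive difference:", y[d == 1].mean() - y[d == 0].mean())

# Adjustment formula: ATE = E_X[ E[Y|D=1,X] - E[Y|D=0,X] ]
ate = 0.0
for level in np.unique(x):
    at_level = x == level
    diff = y[at_level & (d == 1)].mean() - y[at_level & (d == 0)].mean()
    ate += diff * at_level.mean()             # weight by P(X = level)
print("adjusted ATE:", ate)
```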
Example: Returns to Education
Consider estimating the effect of a college degree on earnings.
Di=1 if person i has a college degree
Yi(1) = earnings with degree
Yi(0) = earnings without degree
τi=Yi(1)−Yi(0) = individual return to college
The naive comparison—average earnings of college graduates minus non-graduates—conflates the causal effect with selection. People who attend college may have higher earnings potential regardless of college (due to ability, family background, motivation).
The selection problem: E[Y(0)∣D=1]>E[Y(0)∣D=0]. Those who get degrees would have earned more even without them.
Solving this requires either:
Randomizing college attendance (generally infeasible)
Finding a source of exogenous variation (IV, Chapter 12)
Assuming all confounders are observed (Chapter 11)
Exploiting a natural experiment (Chapters 13-14)
9.3 Directed Acyclic Graphs (DAGs)
A Different Language for Causation
The potential outcomes framework asks: "What if treatment had been different?"
An alternative approach, developed by Judea Pearl and colleagues, represents causal relationships graphically using directed acyclic graphs (DAGs). This framework asks: "What variables must we adjust for to isolate the causal effect?"
The two frameworks are complementary. Potential outcomes clarify what we want to estimate. DAGs clarify what confounding means and how to address it.
DAG Basics
A DAG consists of:
Nodes: Variables in the system
Directed edges: Arrows indicating direct causal effects
Acyclic: No variable causes itself (no cycles)
Example: A simple DAG for returns to education has three nodes (Ability, Education, Earnings) and three arrows: Ability → Education, Ability → Earnings, and Education → Earnings.
The arrow from Ability to Education means ability directly affects educational attainment. The arrow from Ability to Earnings means ability directly affects earnings. The arrow from Education to Earnings represents the causal effect we want to estimate.
Paths and Confounding
A path is any sequence of edges connecting two variables (ignoring arrow direction). Paths can be:
Causal paths: Follow arrow directions from cause to effect
Non-causal paths: Include at least one arrow pointing the "wrong" way
Confounding occurs when there is an open non-causal path between treatment and outcome. In the example above:
Causal path: Education → Earnings
Non-causal (confounding) path: Education ← Ability → Earnings
The backdoor path through Ability creates confounding: ability affects both education and earnings, creating correlation even if education has no causal effect.
Blocking Paths: Conditioning and Colliders
To eliminate confounding, we must "block" all non-causal paths. A path is blocked if it contains:
A conditioned variable: Conditioning on a variable blocks paths through it
A collider that is not conditioned on: A collider is a variable where two arrows point in (→ • ←)
Key Rule: Colliders block paths by default. Conditioning on a collider (or its descendants) opens the path.
Example of a collider: Education → Job Quality ← Connections.
Job Quality is a collider. The path Education → Job Quality ← Connections is blocked unless we condition on Job Quality. Conditioning on a collider introduces spurious association—this is called collider bias or selection bias.
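The mechanics are easy to see in a simulation sketch (hypothetical variables matching the example): Education and Connections are independent by construction, yet restricting attention to units with good jobs, which amounts to conditioning on the collider, makes them negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

education = rng.normal(0, 1, n)
connections = rng.normal(0, 1, n)                            # independent of education
job_quality = education + connections + rng.normal(0, 1, n)  # collider

print("corr(education, connections), all units:",
      np.corrcoef(education, connections)[0, 1])

# Conditioning on the collider: restrict to units with good jobs
good_job = job_quality > 1
print("corr(education, connections), good jobs only:",
      np.corrcoef(education[good_job], connections[good_job])[0, 1])
```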

Box: M-Bias—When Controlling for "Confounders" Makes Things Worse
A common heuristic is "control for everything." This can backfire spectacularly.
Consider the "M-shaped" DAG with edges D ← U1 → Z ← U2 → Y:
Here Z is affected by two unobserved factors: U1 (which also affects treatment D) and U2 (which also affects outcome Y). There is no direct edge between D and Y.
Without controlling for Z: No backdoor path from D to Y is open. The effect of D on Y is correctly identified as zero.
Controlling for Z: Opens the backdoor path D←U1→Z←U2→Y. We now find a spurious association between D and Y.
The lesson: Z "looks like" a confounder (it's correlated with both D and Y), but conditioning on it introduces bias rather than removing it. DAGs reveal this; intuition without DAGs often misses it.
Example: Suppose D = parent's education, Y = child's earnings, and Z = child's education. Parent's ability (U1) affects both parent's education and child's education. Child's ability (U2) affects both child's education and earnings. Controlling for child's education to estimate the "direct effect" of parent's education on child's earnings opens a biasing path through the two abilities—even though abilities are unobserved.
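A simulation sketch of the M-graph (hypothetical data; the true effect of D on Y is zero by construction) shows the same point numerically: the unadjusted regression coefficient on D is approximately zero, while adding Z as a control pushes it away from zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

u1 = rng.normal(0, 1, n)              # unobserved; affects D and Z
u2 = rng.normal(0, 1, n)              # unobserved; affects Z and Y
d = u1 + rng.normal(0, 1, n)
z = u1 + u2 + rng.normal(0, 1, n)     # correlated with both D and Y, yet not a confounder
y = u2 + rng.normal(0, 1, n)          # D has no effect on Y

def coef_on_d(y, regressors):
    """OLS coefficient on the first regressor (intercept included)."""
    X = np.column_stack([np.ones(len(y))] + regressors)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print("coefficient on D, no controls:  ", coef_on_d(y, [d]))     # approximately 0
print("coefficient on D, controlling Z:", coef_on_d(y, [d, z]))  # biased away from 0
```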
The Backdoor Criterion
Pearl's backdoor criterion formalizes when conditioning identifies causal effects:
Theorem 9.1: Backdoor Criterion
A set of variables X satisfies the backdoor criterion relative to (D,Y) if:
No variable in X is a descendant of D
X blocks every path between D and Y that contains an arrow into D
If X satisfies the backdoor criterion, then: P(Y∣do(D)) = ∑_X P(Y∣D,X) P(X)
Intuition: The backdoor criterion identifies which confounders to adjust for. Adjust for variables that block backdoor paths without opening new paths through colliders.
The Frontdoor Criterion
Sometimes no set of variables satisfies the backdoor criterion (because confounders are unobserved). The frontdoor criterion offers an alternative:
Theorem 9.2: Frontdoor Criterion
If a variable M satisfies:
M intercepts all directed paths from D to Y
There is no unblocked backdoor path from D to M
All backdoor paths from M to Y are blocked by D
Then the causal effect is identified via the mediator M.
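When these conditions hold, the effect is recovered by combining the D→M and M→Y pieces. A standard statement of the resulting frontdoor adjustment formula, in the notation of the graphical-models literature, is:

```latex
P(y \mid \mathrm{do}(d)) \;=\; \sum_{m} P(m \mid d) \sum_{d'} P(y \mid m, d')\, P(d')
```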
The frontdoor criterion is elegant but rarely applicable in practice—it requires observing a complete mediator with specific properties.

Example: DAG for Smoking and Lung Cancer
Consider estimating whether smoking causes lung cancer:
Backdoor path: Smoking ← Genotype → Lung Cancer (if genotype affects both)
Blocking: Condition on Genotype to satisfy backdoor criterion
Problem: Genotype affecting both smoking propensity and cancer risk may be unobserved
If we observe Tar in Lungs (a mediator), the frontdoor criterion might help—but only under strong assumptions about the mediator.
9.4 Reconciling the Frameworks
When Do They Agree?
Potential outcomes and DAGs are not competing theories—they are different languages for the same underlying ideas. Under standard assumptions, they yield identical conclusions about identification.
Concept | Potential outcomes | DAGs
Causal effect | E[Y(1)−Y(0)] | E[Y∣do(D=1)]−E[Y∣do(D=0)]
Confounding | (Y(0),Y(1)) not independent of D | Open backdoor path between D and Y
Adjustment | Unconfoundedness given X | X satisfies backdoor criterion
Randomization | D⊥(Y(0),Y(1)) | No backdoor paths (all blocked)
The do(⋅) operator in DAG notation corresponds to intervening on a variable—setting it to a value rather than observing it. E[Y∣do(D=1)] equals E[Y(1)] under SUTVA.
Comparative Advantages
Each framework has strengths:
Potential outcomes:
Clear definition of estimands (ATE, ATT, LATE)
Natural framework for heterogeneous effects
Connects directly to experimental design
Foundation for formal econometric analysis
DAGs:
Visual representation of assumptions
Clear criteria for what to adjust for (and what not to)
Highlights collider bias (often missed in PO reasoning)
Generalizes beyond treatment effects to mediation, selection, etc.
Structural Equations as Bridge
Structural causal models (SCMs) bridge the frameworks. An SCM specifies:
A set of equations: Y=fY(X,εY), D=fD(Z,εD), etc.
A distribution over error terms
A DAG implied by the equations
The structural equations generate both the DAG (from causal relationships) and potential outcomes (by intervening on equations). This formalization shows the frameworks are mathematically equivalent under standard conditions.
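A minimal structural-model sketch (hypothetical equations) illustrates the bridge: the same set of equations generates observational data, and replacing the treatment equation with a fixed value implements the do-operation and yields the potential-outcome means.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

def scm(n, do_d=None):
    """Structural equations X -> D and (X, D) -> Y; do_d overrides the equation for D."""
    eps_x, eps_d, eps_y = rng.normal(0, 1, (3, n))
    x = eps_x
    d = (x + eps_d > 0).astype(float) if do_d is None else np.full(n, float(do_d))
    y = 2.0 * d + x + eps_y          # true structural effect of D on Y is 2.0
    return x, d, y

# Intervening on D (the do-operation) generates the potential-outcome means
_, _, y_do1 = scm(n, do_d=1)
_, _, y_do0 = scm(n, do_d=0)
print("E[Y | do(D=1)] - E[Y | do(D=0)]:", y_do1.mean() - y_do0.mean())

# Observational data from the same equations give a confounded comparison instead
x, d, y = scm(n)
print("observed difference in means:", y[d == 1].mean() - y[d == 0].mean())
```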
When Might They Diverge?
The frameworks can suggest different analyses when:
LATE vs. ATE: Potential outcomes naturally distinguish LATE from ATE when instruments create variation for only a subgroup. DAGs typically focus on average effects.
Effect heterogeneity: Potential outcomes accommodate individual-level variation in τi=Yi(1)−Yi(0). Basic DAGs assume homogeneous effects (though extensions exist).
Interference: Potential outcomes can be extended to allow Yi(Di,D−i)—outcomes depending on others' treatment. DAGs require more elaborate notation.
Practical Synthesis
For applied work, use both:
DAGs to clarify assumptions: Draw the causal structure. Identify confounders and potential colliders. Determine what to adjust for.
Potential outcomes to define estimands: What parameter do you want? ATE, ATT, or LATE? For whom?
Match design to estimand: Does your identification strategy deliver the estimand you want?
9.5 Identification Strategies: An Overview
The Identification Problem
Identification asks: Can we learn the causal effect from observable data, given our assumptions?
A parameter is identified if there is a unique mapping from the population distribution to the parameter value. Identification is about what we can learn in principle with infinite data—separate from finite-sample estimation issues.
The Taxonomy of Strategies
Applied research uses several main identification strategies, each with different assumptions:
1. Randomization (Chapter 10)
Idea: Randomly assign treatment to ensure independence
Key assumption: Successful randomization, no interference, no attrition
Identifies: ATE (or ITT with imperfect compliance)
2. Selection on Observables (Chapter 11)
Idea: Control for all confounders
Key assumption: Unconfoundedness—no unobserved confounders
Identifies: ATE or ATT depending on weighting
Methods: Regression, matching, propensity scores
3. Instrumental Variables (Chapter 12)
Idea: Use exogenous variation that affects treatment but not outcome directly
Key assumption: Exclusion restriction, relevance
Identifies: LATE for compliers
4. Difference-in-Differences (Chapter 13)
Idea: Compare changes over time across treated and control groups
Key assumption: Parallel trends—groups would have evolved similarly absent treatment
Identifies: ATT (for treated groups)
5. Regression Discontinuity (Chapter 14)
Idea: Exploit threshold rules that create local randomization
Key assumption: No manipulation at threshold, continuity
Identifies: Local ATE at the threshold
6. Synthetic Control (Chapter 15)
Idea: Construct counterfactual from weighted donors
Key assumption: Synthetic control approximates counterfactual trajectory
Identifies: Effect for the treated unit(s)
Choosing a Strategy
The choice depends on:
Data available: What variation exists? What is observed?
Credibility of assumptions: Which assumptions are defensible in context?
Estimand of interest: Do you need ATE, ATT, or LATE?
External validity: How local is the estimate?
No strategy is universally best. The credibility of any analysis depends on the credibility of its assumptions in the specific context.
Credibility and Transparency
The credibility revolution (Chapter 1) emphasized:
State assumptions explicitly: Don't hide behind technique
Defend assumptions substantively: Economic/institutional arguments, not just statistical ones
Test what you can: Pre-trends, balance, placebo tests
Show the variation: Graphical evidence of the identifying variation
Acknowledge limitations: What could go wrong?
Design Appropriateness, Not Method Ranking
There is no universal hierarchy of methods (RCT > IV > DiD > OLS). A well-executed observational study with credible assumptions beats a poorly designed experiment. A thoughtful OLS with clear institutional knowledge can be more convincing than a questionable instrument.
Features of credible evidence:
Clear source of identifying variation (why is treatment as-good-as-random conditional on design?)
Assumptions have testable implications (pre-trends, balance, placebo tests)
Multiple approaches yield similar answers (triangulation)
Mechanism is understood (not just "it works")
Warning signs:
Variation is endogenous and hand-waved away
Key assumptions are untestable and implausible
Results are highly sensitive to specification choices
Method chosen for "credibility" rather than fit to problem
The key question: Given the specific research question and available data, which design's assumptions are most defensible? A well-designed difference-in-differences study may be more credible than a weak-instrument IV analysis—even though IV appears "higher" in textbook rankings.
9.6 Qualitative Bridge: Counterfactual Reasoning in Case Studies
Causation Without Statistics
Quantitative methods do not have a monopoly on causal reasoning. Qualitative researchers also make causal claims—and when done well, they engage in rigorous counterfactual reasoning.
The Comparative Method
Mill's methods of agreement and difference formalize comparative case analysis:
Method of Agreement: If cases share outcome Y and factor X, but differ on everything else, X may cause Y.
Method of Difference: If similar cases differ only in X and Y, X likely causes Y.
These are causal logics. They identify causes by exploiting variation across cases—conceptually similar to quantitative identification.
Process Tracing
Process tracing examines the causal chain within a case:
Hypothesize a mechanism: X → M₁ → M₂ → Y
Look for observable implications of each link
Confirm or disconfirm based on evidence
Process tracing provides evidence about mechanisms that aggregate data cannot. It answers "how" and "why" questions.
Counterfactual Reasoning in Case Studies
Rigorous qualitative work explicitly engages counterfactuals:
"Would the outcome have occurred absent the treatment?"
"What would have happened under a different policy?"
Answering requires:
Deep knowledge of the case
Theoretical reasoning about mechanisms
Consideration of alternative explanations
Example: Did Leadership Matter?
Consider whether Churchill's leadership affected Britain's WWII outcome.
A quantitative approach faces a fundamental problem: n = 1. There is no comparison group of WWII Britains with different leaders.
A qualitative approach:
Establish temporal sequence: Churchill became PM; certain outcomes followed
Examine mechanisms: How specifically did Churchill's decisions affect outcomes?
Consider counterfactuals: What would Halifax or Chamberlain have done? What evidence exists about their likely choices?
Assess alternative explanations: Were outcomes determined by factors independent of leadership?
This is counterfactual reasoning—the same logical structure as potential outcomes—applied with case knowledge rather than statistical comparison.
Similar challenges arise in economics. What explains China's post-1978 economic growth? There is only one China—we cannot randomly assign reform packages and observe outcomes. As we saw in Chapter 1, this question requires combining descriptive analysis, regional quasi-experiments, time series methods, and theoretical reasoning. The n = 1 problem at the country level does not make the question unanswerable, but it does require methodological pluralism rather than reliance on a single identification strategy.
When to Combine Methods
Quantitative and qualitative approaches complement each other:
Quantitative | Qualitative
Estimates average effects | Illuminates mechanisms
Handles large samples | Provides depth on cases
Formal uncertainty quantification | Discovers unexpected patterns
External validity through aggregation | Internal validity through process
The best causal evidence often combines both: quantitative estimates of effects with qualitative understanding of how and why they occur.
9.7 Causal Inference Across Disciplines
Parallel Developments
The frameworks presented in this chapter—potential outcomes and DAGs—emerged from distinct intellectual traditions that developed largely in parallel before converging in recent decades. Understanding this history illuminates both the unity of causal reasoning and the different emphases each tradition brings.
Four major traditions:
Tradition | Key figures | Core contributions | Home disciplines
Potential outcomes | Neyman (1923), Rubin (1974), Holland (1986), Heckman | Counterfactual framework, randomization inference, selection models | Statistics, Economics
Graphical models | Pearl (1988, 2000), Spirtes, Glymour & Scheines (1993) | DAGs, d-separation, do-calculus, structural causal models | Computer Science, Philosophy
G-methods | Robins (1986, 1997), Greenland, Hernán | G-computation, marginal structural models, target trial emulation | Epidemiology, Biostatistics
Structural equation modeling | Wright (1921), Blalock, Duncan, Baron & Kenny (1986) | Path analysis, latent variables, mediation analysis | Psychology, Sociology
These traditions asked similar questions—When can we infer causation from data?—but approached them with different tools and emphases.
What Each Tradition Emphasizes
Each tradition brings distinctive emphases worth appreciating:
Economics emphasizes:
Clear identification strategies tied to institutional knowledge
Heterogeneous treatment effects (LATE vs. ATE)
Economic theory as guide to selection and mechanisms
External validity and policy relevance
Epidemiology emphasizes:
Well-defined interventions ("no causation without manipulation")
Time-varying treatments and dynamic regimes
Target trial emulation as design framework
Positivity (overlap) and practical violations
Computer Science emphasizes:
Graphical representation of causal assumptions
Completeness results (do-calculus)
Causal discovery from observational data
Transportability across settings
Psychology/SEM emphasizes:
Latent constructs and measurement
Mediation and indirect effects
Model fit and specification testing
Standardized effects for comparison
The methodologically pluralist researcher draws on all traditions. Use DAGs to clarify assumptions (CS). Ground identification in institutional knowledge (economics). Think carefully about the target trial (epidemiology). Model measurement error appropriately (psychology). No single tradition has a monopoly on insight.
The Epidemiological Contribution
Epidemiology developed causal inference methods to address questions where randomized trials are often impossible: Does smoking cause cancer? Do environmental exposures cause disease? Does a treatment work in real-world populations?
James Robins and collaborators developed a family of methods—collectively called "g-methods"—that handle complications common in medical research:
G-computation (also called the g-formula or standardization) estimates causal effects by:
Modeling the outcome as a function of treatment and confounders
Predicting outcomes under each treatment level for each unit
Averaging these predictions over the population
This is essentially the outcome-modeling approach to causal inference (Section 11.1), but the epidemiological formulation emphasizes the importance of modeling the full data-generating process, especially for time-varying treatments.
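The three steps can be written in a few lines. The sketch below (hypothetical data; a linear outcome model is assumed only for brevity) fits the outcome model, predicts every unit's outcome under each treatment level, and averages the contrast.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

x = rng.normal(0, 1, n)                        # confounder
d = rng.binomial(1, 1 / (1 + np.exp(-x)))      # treatment depends on x
y = 2.0 * d + 1.5 * x + rng.normal(0, 1, n)    # true effect = 2.0

# Step 1: model the outcome as a function of treatment and confounders
X = np.column_stack([np.ones(n), d, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: predict each unit's outcome under D = 1 and under D = 0
y1_hat = np.column_stack([np.ones(n), np.ones(n), x]) @ beta
y0_hat = np.column_stack([np.ones(n), np.zeros(n), x]) @ beta

# Step 3: average the predicted contrasts over the population
print("g-computation estimate of the ATE:", (y1_hat - y0_hat).mean())
```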
Marginal structural models (MSMs) handle time-varying treatments with time-varying confounding—situations where a confounder at time t is affected by treatment at time t−1. Standard regression fails here because conditioning on the confounder blocks part of the causal effect. MSMs use inverse probability weighting to create a pseudo-population where confounding is eliminated without conditioning.
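The weighting logic is easiest to see in the simplest point-treatment case; full MSMs apply the same idea to entire treatment histories. The sketch below (hypothetical data, one discrete confounder) weights each unit by the inverse probability of the treatment it actually received and compares weighted means.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300_000

x = rng.integers(0, 2, n)                      # binary confounder
d = rng.binomial(1, np.where(x == 1, 0.7, 0.3))
y = 1.0 * d + 2.0 * x + rng.normal(0, 1, n)    # true effect = 1.0

# Propensity score: probability of treatment within each level of x
p_hat = np.array([d[x == 0].mean(), d[x == 1].mean()])[x]

# Inverse probability weights build a pseudo-population in which x no longer predicts d
w = np.where(d == 1, 1 / p_hat, 1 / (1 - p_hat))
ipw = (np.average(y[d == 1], weights=w[d == 1])
       - np.average(y[d == 0], weights=w[d == 0]))

print("naive difference:", y[d == 1].mean() - y[d == 0].mean())
print("IPW estimate:", ipw)
```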
Target trial emulation is a conceptual framework that asks: What randomized trial would we ideally run to answer this question? Observational analysis then attempts to emulate that trial, making explicit where emulation is imperfect. This disciplined approach prevents common observational study pitfalls like immortal time bias.
For Further Study: The Epidemiological Tradition
Hernán & Robins (2020), Causal Inference: What If, is freely available online and provides a comprehensive treatment of g-methods from an epidemiological perspective. It emphasizes time-varying treatments, which are common in medical contexts but less central to most economic applications.
Hernán, Hsu & Healy (2019), "A Second Chance to Get Causal Inference Right," Chance, offers a short accessible introduction to the epidemiological perspective on causal inference.
The Psychology Tradition: Structural Equation Modeling
While the first three traditions have increasingly converged around shared concepts, the structural equation modeling (SEM) tradition in psychology and sociology developed somewhat independently.
The roots trace to Sewall Wright's path analysis (1920s-1930s), which used diagrams with arrows to represent causal relationships among variables. Sociologists Hubert Blalock and Otis Dudley Duncan developed these ideas in the 1960s-70s. By the 1980s, SEM had become the dominant quantitative method in psychology, incorporating:
Latent variables: Unobserved constructs measured by multiple indicators
Measurement models: Relating latent variables to observed measures
Structural models: Causal relationships among latent and observed variables
Path coefficients: Standardized regression weights with causal interpretation
The Baron-Kenny approach to mediation analysis (1986), hugely influential in psychology, emerged from this tradition.
Tensions with modern causal inference: Pearl has been critical of how SEM is often used in practice. The concern is that SEM software estimates associations, but researchers interpret output as if it established causation. Key issues:
Identification is assumed, not established: SEM estimates parameters of a specified model, but the model's causal structure must be justified externally
Fit statistics are not validation: A model can fit well yet be causally wrong
Mediation requires interventionist assumptions: The "indirect effect" of X through M is only meaningful if intervening on M is well-defined
When SEM is valuable: Despite these concerns, SEM offers genuine contributions:
Measurement error: Latent variable models handle measurement error more rigorously than observed-variable methods
Construct validity: Multiple indicators for a construct provide validity evidence
Complex causal structures: SEM naturally represents systems with many variables and paths
Simultaneous estimation: All paths are estimated jointly, propagating uncertainty appropriately
The key is distinguishing what SEM can and cannot do. It can estimate parameters of a hypothesized causal model and test whether the model is consistent with observed covariances. It cannot verify that the causal model is correct. As with all methods in this book, causal interpretation requires substantive justification—SEM is a tool for estimation, not a machine for discovering causation.
For Further Study: SEM and Causal Inference
Bollen & Pearl (2013), "Eight Myths About Causality and Structural Equation Models," clarifies what SEM can and cannot establish causally.
Kline (2023), Principles and Practice of Structural Equation Modeling, 5th ed., is the standard textbook, with recent editions incorporating modern causal inference concepts.
Convergence and Translation
These traditions have increasingly converged. Pearl's work explicitly bridges graphical models and potential outcomes. Epidemiologists now routinely use DAGs. Economists have adopted propensity scores and doubly robust methods from biostatistics. Psychologists are incorporating counterfactual definitions of mediation.
Yet terminological differences persist. The following table maps equivalent concepts across traditions:
Economics | Epidemiology | Statistics | Graphical models
Selection on observables | Exchangeability given L | Ignorability | Backdoor criterion
Propensity score weighting | IPTW (inverse probability of treatment weighting) | IPW | —
Regression adjustment | G-computation / Standardization | Outcome modeling | Adjustment formula
Instrumental variables | Mendelian randomization | IV | Instrumental inequality
ATE, ATT | ATE, ATT | PATE, SATE | ACE
Parallel trends (DiD) | — | — | —
External validity | Transportability | Generalizability | Transportability
Bad controls | Adjusting for a collider | Collider stratification | Collider bias
When reading across disciplines, this translation table helps. A paper in epidemiology discussing "exchangeability conditional on L" is invoking the same assumption economists call "selection on observables" or statisticians call "ignorability."
Practical Guidance
When to Use Each Framework
Task | Potential outcomes | DAGs
Defining treatment effects | ✓ |
Clarifying what to control for | | ✓
Heterogeneous effects | ✓ |
Complex confounder structures | | ✓
Avoiding collider bias | | ✓
Connecting to experiments | ✓ |
Common Pitfalls
Pitfall 1: Confusing Correlation with Causation
The oldest pitfall. Observational associations—even strong, robust ones—do not establish causation without identifying assumptions.
How to avoid: Always ask: What is the source of identifying variation? What assumptions justify a causal interpretation?
Pitfall 2: Controlling for Everything
Including all available covariates can introduce bias (collider bias, post-treatment control) rather than reduce it.
How to avoid: Draw a DAG. Identify confounders. Control for them—and only them.
Pitfall 3: Confusing Estimands
ATE, ATT, and LATE are different parameters. Your method may identify one but not others.
How to avoid: Define your estimand explicitly. Match your method to your estimand. Interpret accordingly.
Pitfall 4: Assuming SUTVA Without Justification
If units interact (students in classrooms, firms in markets), potential outcomes depend on others' treatment. Standard methods fail.
How to avoid: Consider interference. Design studies to detect spillovers. Use appropriate methods when interference is likely.
Drawing DAGs: A Checklist
List all relevant variables: Treatment, outcome, potential confounders, mediators
Draw arrows for direct causal relationships: X → Y means X directly causes Y
Identify backdoor paths: Non-causal paths from treatment to outcome
Identify colliders: Variables with two arrows pointing in
Determine adjustment set: Block backdoor paths without opening colliders
Check for post-treatment variables: Don't control for mediators or consequences of treatment
Integration Note
Connections to Other Chapters
Chapter | Connection
Ch. 1 (Empirical Enterprise) | Philosophical foundations, limits of knowledge
Ch. 10 (Experiments) | Randomization as gold standard for eliminating confounding
Ch. 11 (Selection on Observables) | Implementing the backdoor criterion with regression/matching
Ch. 12 (IV) | LATE and compliers
Ch. 13-15 (DiD, RD, SCM) | Panel-based identification strategies
Ch. 19 (Mechanisms) | Frontdoor criterion, mediation analysis
Ch. 20 (Heterogeneity) | Treatment effect heterogeneity and its implications
How This Chapter Sets Up Part III
This chapter provides the conceptual foundation for all subsequent causal inference chapters:
Chapter 10 shows how randomization eliminates confounding
Chapters 11-15 each address confounding through different identifying assumptions
Chapter 16 extends causal logic to time series
Chapter 17 asks what we can learn when identification fails
Each method can be understood as a different solution to the fundamental problem: finding credible ways to learn about counterfactuals from observable data.
Summary
Key takeaways:
Causation is about counterfactuals: X causes Y if Y would have been different had X been different. The fundamental problem is that counterfactuals are unobservable.
Potential outcomes formalize causal effects: τi=Yi(1)−Yi(0). We estimate averages (ATE, ATT, LATE) since individual effects are unobservable.
DAGs represent causal structure graphically: They clarify what to adjust for (confounders) and what not to (colliders, mediators).
The frameworks are complementary: Use potential outcomes to define estimands, DAGs to clarify assumptions.
Identification requires assumptions: Every causal claim rests on assumptions about the data-generating process. The credibility of the claim depends on the credibility of the assumptions.
Multiple strategies exist: Randomization, selection on observables, IV, DiD, RD, synthetic control—each with different assumptions and different estimands.
Returning to the opening question: To say X causes Y means that Y would be different if X were different. We can sometimes learn this from data—but only by combining observations with assumptions about how the world works. The art of causal inference is finding credible assumptions and transparent methods to learn from the data we have.
Further Reading
Essential
Holland (1986). "Statistics and Causal Inference." JASA. Classic statement of the fundamental problem.
Angrist & Pischke (2009). Mostly Harmless Econometrics, Chapters 1-2. The potential outcomes framework for economists.
For Deeper Understanding
Imbens & Rubin (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Comprehensive treatment of potential outcomes.
Pearl (2009). Causality, 2nd ed. The definitive treatment of DAGs and structural causal models.
Morgan & Winship (2015). Counterfactuals and Causal Inference, 3rd ed. Excellent integration of both frameworks.
On the Debate Between Frameworks
Pearl (2009). "Causal Inference in Statistics: An Overview."
Imbens (2020). "Potential Outcome and Directed Acyclic Graph Approaches to Causality." Econometrician's perspective.
Qualitative Methods
Mahoney (2012). "The Logic of Process Tracing Tests in the Social Sciences." Process tracing formalized.
Goertz & Mahoney (2012). A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences.
Applications
Angrist & Krueger (1999). "Empirical Strategies in Labor Economics." Tour of identification strategies.
Cunningham (2021). Causal Inference: The Mixtape. Accessible introduction with examples.
Exercises
Conceptual
Explain in your own words why random assignment solves the fundamental problem of causal inference. What assumptions must still hold for a randomized experiment to identify the ATE?
Draw a DAG for the following scenario: We want to estimate the effect of class size on student achievement. Potential confounders include school budget and parental involvement. Student ability is affected by parental involvement and directly affects achievement.
Identify the backdoor paths
What should we control for?
Is there any collider we should avoid controlling for?
A researcher estimates the effect of job training on wages using OLS, controlling for age, education, prior wages, and post-training job quality. Critique this approach using both potential outcomes and DAG reasoning. What mistake might the researcher be making?
Applied
Consider the question: Does attending an elite university increase earnings?
Define the potential outcomes Y(1) and Y(0)
What is the selection problem in a naive comparison?
Propose two different identification strategies and state their key assumptions
What estimand (ATE, ATT, LATE) would each strategy identify?
Find a recent empirical paper that makes a causal claim. Draw the implicit DAG. Identify:
The assumed confounders
Any potential unobserved confounders
Any variables that might be colliders
Whether the identification strategy is convincing
Discussion
Some critics argue that DAGs are less useful in economics because economic variables are jointly determined (equilibrium) rather than having clear causal direction. Evaluate this critique. When might DAGs be more or less useful in economic applications?
Technical Appendix: Formal Results
A.1 Identification Under Unconfoundedness
Under Assumptions 9.1 (unconfoundedness) and 9.2 (overlap):
ATE=E[E[Y∣D=1,X]−E[Y∣D=0,X]]
Proof:
E[Y(1)] = E[E[Y(1)∣X]]  (law of iterated expectations)
 = E[E[Y(1)∣D=1,X]]  (by unconfoundedness)
 = E[E[Y∣D=1,X]]  (observed = potential for treated)
Similarly for E[Y(0)]. The result follows.
A.2 The d-Separation Criterion
A path p is d-separated by a set Z if:
p contains a chain A→B→C or fork A←B→C where B∈Z, OR
p contains a collider A→B←C where B∉Z and no descendant of B is in Z
Two variables X and Y are d-separated by Z if all paths between them are d-separated by Z.
Implication: If X and Y are d-separated by Z, then X⊥Y∣Z in any distribution compatible with the DAG.
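This implication can be checked numerically. In the sketch below (hypothetical linear chain A→B→C), A and C are clearly correlated marginally, but the partial correlation given B, computed by residualizing both on B, is approximately zero, as d-separation predicts.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500_000

a = rng.normal(0, 1, n)
b = a + rng.normal(0, 1, n)          # A -> B
c = b + rng.normal(0, 1, n)          # B -> C, so the only path is the chain A -> B -> C

def residualize(v, z):
    """Residuals from an OLS regression of v on z (with intercept)."""
    Z = np.column_stack([np.ones(len(z)), z])
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

print("corr(A, C):            ", np.corrcoef(a, c)[0, 1])
print("partial corr(A, C | B):",
      np.corrcoef(residualize(a, b), residualize(c, b))[0, 1])
```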
A.3 Equivalence of Frameworks
Under a nonparametric structural equation model with independent errors:
P(Y∣do(X=x)) = ∑_z P(Y∣X=x,Z=z) P(Z=z)
where Z satisfies the backdoor criterion.
This equals E[Y(x)] in potential outcomes notation when the structural model generates the potential outcomes.