Chapter 12: Instrumental Variables

Opening Question

When you cannot randomly assign a treatment and observable confounders do not explain all the selection, can you still learn something about causal effects?


Chapter Overview

The methods we have encountered so far require either controlling the treatment assignment (experiments) or observing all relevant confounders (selection on observables). But what if neither condition holds? What if people select into treatment based on factors we cannot see or measure?

Instrumental variables (IV) offer an answer: find an external source of variation that affects treatment but has no direct effect on outcomes. This is the logic behind draft lotteries, compulsory schooling laws, and weather shocks. When such variation exists, we can use it to identify causal effects even in the presence of unobserved confounding.

This chapter develops the theory and practice of IV estimation. We begin with the core logic, move through estimation and inference, confront the complications of weak instruments and heterogeneous effects, and end with practical guidance on when IV is credible. The returns to education serve as our primary running example---a question that has driven methodological innovation for three decades.

What you will learn:

  • The logic of instrumental variables as a source of exogenous variation

  • How 2SLS estimation works and when it recovers causal effects

  • What LATE means and why it matters for interpretation

  • How to detect and handle weak instruments

  • When IV estimates are (and are not) credible

Prerequisites: Chapter 9 (The Causal Framework), Chapter 3 (Statistical Foundations)


12.1 The Logic of Instruments

Why We Need Instruments

Consider the classic question: what is the causal effect of an additional year of education on earnings?

If we simply regress log wages on years of schooling, we get a correlation. But this correlation likely overstates the causal effect. People who obtain more education may differ systematically from those who do not---in motivation, ability, family background, and countless other ways that also affect earnings. These unobserved factors confound the relationship between education and wages.

Chapter 11 addressed this problem by assuming we could observe and control for all relevant confounders. But what if we cannot measure ability? What if family background is only partially captured by parental education and income?

Instrumental variables offer an alternative path forward.

The Core Idea

The logic of IV is simple in principle:

  1. Find a variable Z that affects the treatment D

  2. Verify that Z affects the outcome Y only through its effect on D

  3. Use the variation in D induced by Z to estimate the causal effect

The variable Z is called an instrument. It must satisfy two conditions:

Condition 1: Relevance. The instrument must affect the treatment: $\text{Cov}(Z, D) \neq 0$.

Condition 2: Exclusion. The instrument must affect the outcome only through treatment: $\text{Cov}(Z, \varepsilon) = 0$, where $\varepsilon$ represents all factors affecting Y other than D.

The relevance condition is testable. The exclusion restriction is not---it is an assumption about the world that requires substantive justification.

Example: Vietnam-Era Draft Lottery

Perhaps the most famous IV for education is the Vietnam-era draft lottery. In the early 1970s, the U.S. military drafted men based on randomly assigned lottery numbers. Men with low lottery numbers faced high probability of military service; men with high numbers faced almost none.

How does this help with returns to education? Men who received low lottery numbers often sought draft deferments---and one reliable deferment was college enrollment. So the draft lottery affected education. But the lottery number itself was randomly assigned, meaning it should not be correlated with ability, family background, or other confounders.

The logic of the IV strategy:

  • Z = Draft lottery number (or indicator for "high draft risk")

  • D = Years of education

  • Y = Log earnings

The draft lottery provides exogenous variation in education. By comparing outcomes for men with different lottery numbers, we can estimate how education causally affects earnings---without needing to observe ability or other confounders.

Running Example: Returns to Education

The effect of education on earnings is our primary running example for this chapter. We will see the draft lottery IV (Angrist 1990), the compulsory schooling IV (Angrist & Krueger 1991), and the geographic proximity IV (Card 1995). Each illustrates different aspects of IV methodology. By chapter's end, you will understand both the power and the limitations of IV for answering this fundamental question.

Figure 12.1: The Instrumental Variables DAG. The instrument Z affects treatment D but has no direct effect on outcome Y. Unobserved confounders U create spurious correlation between D and Y. IV uses variation in D induced by Z to identify the causal effect.

12.2 Formal Framework

The Structural Equations

Consider the standard setup with a single endogenous regressor:

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$$

where:

  • $Y_i$ is the outcome

  • $D_i$ is the treatment (endogenous: $\text{Cov}(D, \varepsilon) \neq 0$)

  • $\beta_1$ is the causal effect we want to estimate

  • $\varepsilon_i$ captures unobserved factors affecting Y

The problem is that $D$ is correlated with $\varepsilon$. People with high $\varepsilon$ (high ability, say) tend to have high $D$ (more education). OLS conflates the causal effect $\beta_1$ with the correlation between $D$ and $\varepsilon$.

Now introduce an instrument $Z$ satisfying:

Assumption 12.1 (Relevance): $\text{Cov}(Z, D) \neq 0$

Assumption 12.2 (Exclusion): $\text{Cov}(Z, \varepsilon) = 0$

The Wald Estimator

Under these assumptions, we can derive the IV estimator. The key insight is that:

$$\text{Cov}(Z, Y) = \text{Cov}(Z, \beta_0 + \beta_1 D + \varepsilon) = \beta_1 \cdot \text{Cov}(Z, D) + \text{Cov}(Z, \varepsilon)$$

If the exclusion restriction holds ($\text{Cov}(Z, \varepsilon) = 0$), then:

$$\text{Cov}(Z, Y) = \beta_1 \cdot \text{Cov}(Z, D)$$

Solving for $\beta_1$:

$$\beta_1 = \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, D)}$$

This is the Wald estimator when Z is binary. It equals the ratio of:

  • The reduced form: how Z affects Y

  • The first stage: how Z affects D

Interpretation: The effect of Z on Y comes entirely through D. Divide by how much Z moves D to get how much D moves Y.
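
To make the ratio concrete, here is a minimal sketch in R on simulated data (all names and parameter values are illustrative; the true effect is set to 0.10):

    # Wald estimator on simulated data: ratio of reduced form to first stage
    set.seed(42)
    n <- 10000
    ability <- rnorm(n)                            # unobserved confounder
    z <- rbinom(n, 1, 0.5)                         # binary instrument (e.g., a lottery)
    d <- 10 + 2 * z + 1.5 * ability + rnorm(n)     # treatment depends on z and on ability
    y <- 1 + 0.10 * d + 0.8 * ability + rnorm(n)   # true causal effect of d is 0.10

    wald <- cov(z, y) / cov(z, d)                  # reduced form over first stage
    ols  <- unname(coef(lm(y ~ d))["d"])
    round(c(ols = ols, wald = wald), 3)            # OLS is biased upward; Wald is close to 0.10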

Two-Stage Least Squares (2SLS)

The Wald estimator extends naturally to multiple instruments and continuous instruments via two-stage least squares:

First stage: Regress D on Z (and any exogenous controls X): $D_i = \pi_0 + \pi_1 Z_i + \pi_2' X_i + \nu_i$

Second stage: Regress Y on the fitted values $\hat{D}$ (and controls X): $Y_i = \beta_0 + \beta_1 \hat{D}_i + \beta_2' X_i + u_i$

The coefficient $\hat{\beta}_1$ from the second stage is the 2SLS estimator.

Intuition: The first stage isolates the variation in D that comes from Z. The second stage uses only this "clean" variation to estimate the effect on Y. Variation in D coming from $\varepsilon$ is stripped away.

Figure 12.4: The 2SLS Intuition. Left panel shows the first stage: the instrument Z predicts treatment D. Middle panel shows the reduced form: Z's effect on Y. Right panel shows the IV estimate: the ratio of reduced form to first stage gives the causal effect of D on Y. The key insight is that IV uses only the variation in D induced by Z, isolating exogenous variation from confounded variation.
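
The two stages can be run by hand to see the mechanics, reusing the simulated data from the Wald sketch above (intuition only; as noted in the next section, the manual two-step does not produce correct standard errors):

    # 2SLS mechanics, step by step (do not use the manual version for inference)
    stage1 <- lm(d ~ z)              # first stage: variation in d driven by z
    d_hat  <- fitted(stage1)         # "clean" variation in d
    stage2 <- lm(y ~ d_hat)          # second stage: y on the fitted values only
    coef(stage2)["d_hat"]            # matches the Wald/2SLS point estimate (about 0.10)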

What Could Go Wrong?

The two assumptions---relevance and exclusion---may fail:

Weak instruments: If $Z$ only weakly predicts $D$, the first stage is weak. This creates several problems:

  • Large sampling variance

  • Bias toward OLS

  • Unreliable standard errors

We address weak instruments in Section 12.4.

Exclusion violation: If $Z$ directly affects $Y$---or affects $Y$ through some channel other than $D$---the IV estimate is biased. Unlike weak instruments, exclusion violations cannot be detected from the data. They require substantive argument.


12.3 Estimation and Inference

Implementing 2SLS

Standard software makes 2SLS easy. In Stata:
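
(A minimal sketch; lwage, educ, exper, and nearc4 are illustrative variable names from a Card-style college-proximity setup.)

    * 2SLS: log wage on education, instrumented by college proximity
    ivregress 2sls lwage exper (educ = nearc4), vce(robust)
    * first-stage diagnostics (F-statistic, partial R-squared)
    estat firststage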

In R:
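
(Again a sketch with illustrative names, using ivreg from the AER package; the data frame card is an assumption.)

    library(AER)
    # formula syntax: outcome ~ regressors | instruments + exogenous controls
    fit <- ivreg(lwage ~ educ + exper | nearc4 + exper, data = card)
    summary(fit, diagnostics = TRUE)   # reports the first-stage F ("Weak instruments") and Wu-Hausman tests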

Important: Always use the built-in IV commands. Do not manually run two regressions---this produces incorrect standard errors.

First Stage Diagnostics

Before trusting IV estimates, examine the first stage:

  1. F-statistic on excluded instruments: A rule of thumb is F > 10 (Stock, Wright & Yogo 2002). Modern practice often demands F > 100 for robust inference.

  2. First stage coefficient: Is the sign correct? Is the magnitude plausible?

  3. Visual inspection: Plot $D$ against $Z$. Is there a clear relationship?

Practical Box: First Stage Checklist

Standard Errors

With a valid instrument and large samples, the 2SLS standard errors are consistent. However:

  • Cluster your standard errors if the instrument varies at a group level (e.g., state policy, lottery cohort)

  • Use robust standard errors by default to handle heteroskedasticity

  • With weak instruments, standard errors may be misleading---see Section 12.4

The Overidentification Test (J-Test)

With more instruments than endogenous variables (overidentification), we can partially test instrument validity. The Sargan-Hansen J-test checks whether all instruments give the same answer.

The idea: If all instruments are valid, they should all point to the same $\beta$. The J-test checks whether the instruments disagree more than sampling variation would explain.

Implementation:
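
A sketch in R: with two instruments the model is overidentified, and the Sargan statistic appears in the ivreg diagnostics (nearc2 and nearc4 are illustrative instrument names; card is an assumed data frame).

    library(AER)
    fit_over <- ivreg(lwage ~ educ + exper | nearc2 + nearc4 + exper, data = card)
    summary(fit_over, diagnostics = TRUE)   # the "Sargan" row reports the J-statistic and p-value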

Interpretation: Under the null (all instruments valid), the J-statistic is $\chi^2$ with degrees of freedom equal to the number of overidentifying restrictions (number of instruments minus number of endogenous variables). Rejection suggests at least one instrument is invalid.

Box: Why the J-Test Has Limited Value

The overidentification test sounds appealing—a way to test the untestable exclusion restriction. But it has severe limitations:

1. It tests consistency, not validity If all instruments are invalid in the same direction, they will "agree" and the J-test will pass. Example: If both father's and mother's education affect child's earnings directly (not just through child's education), but both biases are upward, the J-test won't detect the problem.

2. Rejection is ambiguous If the J-test rejects, you know something is wrong—but not which instrument. With three instruments, one, two, or all three could be invalid.

3. Non-rejection proves nothing Passing the J-test does not mean your instruments are valid. It only means they're consistent with each other.

4. Power is often low The test may fail to reject even when instruments are moderately invalid, especially with weak instruments.

Practical guidance: Report the J-test if you have multiple instruments, but don't treat non-rejection as validation. The J-test is necessary but far from sufficient for credibility. The real work is defending exclusion substantively.


12.4 Weak Instruments

The Problem

An instrument is "weak" if it explains little of the variation in D. Formally, the first-stage F-statistic is small.

Weak instruments cause three problems:

  1. Bias: The IV estimator is biased toward OLS in finite samples

  2. Variance: Standard errors become large and unstable

  3. Inference: t-tests and confidence intervals may be misleading

The severity depends on how weak the instrument is. Stock, Wright & Yogo (2002) showed that F = 10 is approximately the threshold below which conventional inference becomes unreliable.

Figure 12.2: The Weak Instruments Problem. With strong instruments (left), IV estimates are approximately unbiased with moderate variance. With weak instruments (right), IV estimates are biased toward OLS, have large variance, and may produce misleading confidence intervals.
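
A small Monte Carlo sketch makes the bias visible (the true effect is set to zero, and pi1 controls first-stage strength; all values are illustrative):

    # Weak vs. strong first stage: IV estimates are pulled toward the biased OLS value
    set.seed(1)
    sim_iv <- function(pi1, n = 200, reps = 2000) {
      replicate(reps, {
        u <- rnorm(n)                    # unobserved confounder
        z <- rnorm(n)
        d <- pi1 * z + u + rnorm(n)      # first-stage strength set by pi1
        y <- 0 * d + u + rnorm(n)        # true causal effect is zero
        cov(z, y) / cov(z, d)            # IV estimate
      })
    }
    strong <- sim_iv(pi1 = 1)
    weak   <- sim_iv(pi1 = 0.05)
    round(c(median_strong = median(strong), median_weak = median(weak)), 3)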

Detection

Test for weak instruments using the first-stage F-statistic:

F-statistic     Interpretation
F > 100         Strong instrument
20 < F < 100    Probably adequate
10 < F < 20     Borderline; consider robust inference
F < 10          Weak; do not trust standard 2SLS

Robust Inference

When instruments may be weak, use weak-instrument-robust methods:

  1. Anderson-Rubin test: Tests the null $\beta_1 = b_0$ for any hypothesized value $b_0$. Invert to get confidence intervals. Valid regardless of instrument strength.

  2. Conditional likelihood ratio (CLR): More efficient than AR with multiple instruments.

  3. tF adjustment: Lee et al. (2022) propose adjusting critical values based on first-stage F.

These procedures are implemented in standard software; in Stata and R they are typically available through add-on packages. The sketch below illustrates the Anderson-Rubin approach directly.
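
A minimal sketch in R, assuming a single instrument z and no controls: for each candidate value $b_0$, regress $y - b_0 d$ on $z$ and keep the values that are not rejected.

    # Anderson-Rubin confidence set by grid inversion (sketch)
    ar_confidence_set <- function(y, d, z, grid = seq(-1, 1, by = 0.01), alpha = 0.05) {
      # grid should cover the plausible range of the effect
      keep <- sapply(grid, function(b0) {
        fit <- lm(I(y - b0 * d) ~ z)                        # under H0: beta = b0, z should not enter
        summary(fit)$coefficients["z", "Pr(>|t|)"] > alpha  # fail to reject -> keep b0
      })
      grid[keep]   # retained values form the AR confidence set (can be wide or unbounded in practice)
    }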

Common Pitfall: The "Just Significant" First Stage

Some researchers accept any statistically significant first stage as adequate. This is wrong. An instrument can be significant but still weak. Focus on the F-statistic, not the p-value.

How to avoid: Report F-statistics. If F < 20, seriously consider weak-instrument-robust methods.

Alternative Estimators: LIML and Fuller

With weak instruments, 2SLS is biased toward OLS. Alternative estimators can reduce this bias:

Limited Information Maximum Likelihood (LIML)

LIML is an alternative to 2SLS that is median-unbiased—its median equals the true parameter even with weak instruments. The bias of 2SLS is proportional to the number of instruments; LIML's bias does not depend on the number of instruments.

$$\hat{\beta}_{LIML} = \left(X'Z(Z'Z)^{-1}Z'X - \hat{\kappa}\, X'M_Z X\right)^{-1}\left(X'Z(Z'Z)^{-1}Z'Y - \hat{\kappa}\, X'M_Z Y\right)$$

where $\hat{\kappa}$ is the smallest root of a generalized eigenvalue problem.

Intuition: LIML can be understood as 2SLS applied to a transformed model where the first-stage residual variance is scaled appropriately. This scaling corrects the finite-sample bias.

Fuller's Modified LIML

Fuller (1977) proposed a modification that reduces LIML's variance at the cost of slightly more bias. The Fuller estimator replaces $\hat{\kappa}$ with $\hat{\kappa} - c/(n - K)$, where $c = 1$ or $c = 4$ are common choices.

  • Fuller(1): Less biased than 2SLS, lower variance than LIML

  • Fuller(4): Even lower variance, slightly more bias

Estimator    Bias                               Variance                 When to use
2SLS         Toward OLS with weak instruments   Lowest (if F > 100)      Strong instruments only
LIML         Median-unbiased                    Higher than 2SLS         Weak instruments, inference focus
Fuller(1)    Less than 2SLS                     Between 2SLS and LIML    Weak instruments, MSE focus

Practical guidance: When the first-stage F-statistic is between 10 and 50, report both 2SLS and LIML. If they differ substantially, the instruments are likely too weak for reliable inference under any method.
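
A minimal sketch of the k-class family in R, assuming a single endogenous regressor and no exogenous controls beyond a constant, and using the standard parameterization in which k = 1 reproduces 2SLS:

    # k-class estimators: k = 1 is 2SLS, k = kappa_LIML is LIML, k = kappa_LIML - c/(n - K) is Fuller(c)
    k_class <- function(y, d, Z, k) {
      Z <- scale(as.matrix(Z), scale = FALSE); y <- y - mean(y); d <- d - mean(d)  # demean (absorbs constant)
      proj <- function(v) Z %*% solve(crossprod(Z), crossprod(Z, v))               # P_Z v
      My <- y - proj(y); Md <- d - proj(d)                                         # M_Z y, M_Z d
      (sum(d * y) - k * sum(d * My)) / (sum(d * d) - k * sum(d * Md))
    }
    liml_kappa <- function(y, d, Z) {
      Z <- scale(as.matrix(Z), scale = FALSE); W <- cbind(y - mean(y), d - mean(d))
      MW <- W - Z %*% solve(crossprod(Z), crossprod(Z, W))                         # M_Z W
      min(Re(eigen(solve(crossprod(W, MW), crossprod(W)))$values))                 # smallest root, >= 1
    }
    # usage (illustrative): kappa <- liml_kappa(y, d, Z)
    #   b_2sls    <- k_class(y, d, Z, 1); b_liml <- k_class(y, d, Z, kappa)
    #   b_fuller1 <- k_class(y, d, Z, kappa - 1 / (length(y) - ncol(as.matrix(Z)) - 1))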


12.5 LATE: What Are We Actually Estimating?

Heterogeneous Treatment Effects

So far we assumed a constant effect $\beta_1$. But treatment effects may vary across people. Some people gain a lot from education; others gain little.

When effects are heterogeneous, what does IV estimate?

Compliers, Always-Takers, Never-Takers

Consider a binary instrument $Z$ and binary treatment $D$. Define four groups:

Type            D when Z=0    D when Z=1    Description
Compliers       0             1             Take treatment only when induced by Z
Always-takers   1             1             Always take treatment
Never-takers    0             0             Never take treatment
Defiers         1             0             Do the opposite of Z

The draft lottery example:

  • Compliers: Men who enrolled in college because of draft risk

  • Always-takers: Men who would have attended college regardless

  • Never-takers: Men who would not have attended college regardless

  • Defiers: Men who dropped out because of draft risk (assumed rare)

Figure 12.3: Complier Types in the LATE Framework. The population divides into four principal strata based on how treatment responds to the instrument. IV estimates the treatment effect only for compliers—those induced to change treatment status by the instrument.

Box: Why Monotonicity Matters—The Problem of Defiers

The monotonicity assumption ($D_i(1) \geq D_i(0)$ for all $i$) rules out defiers. This assumption often receives less attention than exclusion, but violations can be equally devastating.

What goes wrong with defiers: The Wald estimator divides the reduced-form effect by the first stage: $$\hat{\beta}_{IV} = \frac{E[Y|Z=1] - E[Y|Z=0]}{E[D|Z=1] - E[D|Z=0]}$$

With defiers present, the denominator includes offsetting effects:

  • Compliers: $D$ goes from 0 to 1 when $Z=1$ (positive contribution)

  • Defiers: $D$ goes from 1 to 0 when $Z=1$ (negative contribution)

The first stage is now the net effect. If compliers and defiers have different treatment effects, IV estimates a weighted average where defiers receive negative weight. The result can fall outside the range of any individual's treatment effect.

Example: Suppose a job training instrument encourages most workers to enroll (compliers), but makes a few workers suspicious and refuse (defiers). If defiers are high-ability workers who would have benefited most from training, the IV estimate will be biased downward—it subtracts the defiers' large effects from the compliers' smaller effects.

When to worry:

  • Instruments that induce both positive and negative responses in subgroups

  • Policies with both "nudge" and "reactance" effects

  • Price changes where some consumers increase and others decrease consumption

What to do:

  1. Argue substantively that defiers are implausible or rare

  2. Look for subgroups where defiers might exist and test for sign reversals

  3. Consider partial identification approaches that allow for defiers (Huber and Mellace 2015)

The Local Average Treatment Effect

Theorem 12.1 (LATE; Imbens & Angrist 1994)

Under the assumptions of relevance, exclusion, and monotonicity (no defiers), IV identifies:

$$\beta_{IV} = E[Y_1 - Y_0 | \text{Compliers}]$$

The local average treatment effect---the average effect for those whose treatment status is changed by the instrument.

Intuition: IV uses variation induced by $Z$. This variation only affects compliers. Always-takers and never-takers contribute no variation. So IV estimates the effect for compliers only.
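
A simulation sketch makes the point (the group shares and effects are illustrative; the instrument shifts only compliers):

    # With heterogeneous effects, the Wald estimand recovers the complier-average effect
    set.seed(7)
    n <- 100000
    type   <- sample(c("complier", "always", "never"), n, replace = TRUE, prob = c(0.4, 0.3, 0.3))
    z      <- rbinom(n, 1, 0.5)
    d      <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))      # monotonicity: no defiers
    effect <- ifelse(type == "complier", 2, ifelse(type == "always", 5, 1))   # heterogeneous effects
    y      <- effect * d + rnorm(n)

    wald <- (mean(y[z == 1]) - mean(y[z == 0])) / (mean(d[z == 1]) - mean(d[z == 0]))
    round(c(wald = wald, complier_avg = 2, population_avg = mean(effect)), 2) # wald is ~2, not ~2.6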

Implications

LATE has profound implications:

  1. External validity: IV estimates apply to compliers, who may not be representative of the population. Draft lottery compliers were probably marginal college-goers. Their returns may differ from average returns.

  2. Different instruments, different estimates: Two valid instruments can give different estimates if they identify different groups of compliers. This is not a contradiction---it is a feature.

  3. Policy relevance: LATE may or may not be policy-relevant, depending on whether the policy affects the same people as the instrument.

Figure 12.5: LATE as a Weighted Average. When treatment effects are heterogeneous, LATE is a weighted average of effects across complier subgroups. The left panel shows different complier types with varying treatment effects. The right panel illustrates how LATE weights these effects by the probability of compliance—compliers more likely to be shifted by the instrument receive more weight in the LATE estimate.

Running Example: Who Are the Compliers?

In the draft lottery study, compliers were men whose college enrollment depended on draft risk. These were likely:

  • Men from families where college was possible but not certain

  • Men for whom military service was particularly unappealing

Their returns to education may exceed the population average if marginal college-goers benefit more from the credential signal.


12.6 Dose-Response with IV

Beyond Binary Treatment

Many treatments are continuous: years of education, dosage of medication, hours of training. Can IV handle continuous treatments?

Yes, with additional assumptions. The standard approach:

  1. Assume a linear relationship between D and Y

  2. Use IV to estimate the slope

This gives the effect of a one-unit increase in treatment, averaged across whatever shifts the instrument induces.

Control Function Approach

An alternative is the control function approach:

  1. First stage: $D = \pi_0 + \pi_1 Z + \nu$

  2. Outcome equation: $Y = \beta_0 + \beta_1 D + \rho \hat{\nu} + u$

The residual $\hat{\nu}$ captures the endogenous variation in D. Including it as a control removes the bias. The coefficient $\beta_1$ on D gives the causal effect.
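
A minimal sketch in R on simulated data (the true effect is 0.10; note that standard errors from this two-step regression need a generated-regressor correction or a bootstrap):

    # Control function: include the first-stage residual as a regressor
    set.seed(42)
    n <- 10000
    ability <- rnorm(n)
    z <- rbinom(n, 1, 0.5)
    d <- 10 + 2 * z + 1.5 * ability + rnorm(n)
    y <- 1 + 0.10 * d + 0.8 * ability + rnorm(n)

    v_hat  <- resid(lm(d ~ z))           # endogenous part of d
    cf_fit <- lm(y ~ d + v_hat)
    summary(cf_fit)$coefficients         # coef on d is ~0.10; the t-test on v_hat is a test of endogeneity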

Advantages:

  • Clearer about what endogeneity you are correcting

  • Easier to test for endogeneity ($\rho = 0$?)

  • More flexible for nonlinear first stages


12.7 Shift-Share (Bartik) Instruments

Shift-share instruments have become one of the most widely used identification strategies in applied economics, particularly in labor, trade, and migration research. Understanding their structure and the ongoing debates about their validity is essential for modern empirical work.

The Basic Structure

A shift-share instrument combines:

  • Shares: Pre-determined exposure weights (e.g., initial industry employment shares in a region)

  • Shocks: Aggregate changes (e.g., national industry growth rates)

The instrument for region $r$ at time $t$ is:

$$Z_{rt} = \sum_{k} s_{rk,t_0} \times g_{kt}$$

where:

  • $s_{rk,t_0}$ = region $r$'s share of industry $k$ at baseline $t_0$

  • $g_{kt}$ = national (leave-one-out) growth in industry $k$ at time $t$

The Classic Application: Bartik (1991)

Timothy Bartik studied how local labor demand affects wages and employment. The challenge: local labor demand is endogenous to local wages.

Solution: Construct predicted labor demand growth by interacting:

  • Each region's initial industry composition (shares)

  • National industry growth rates (shocks)

Regions with more manufacturing are predicted to grow faster when national manufacturing booms—not because of anything special about that region, but because of its pre-determined exposure to national trends.

Two Views on Identification

A major debate concerns the source of identifying variation:

1. Exogenous Shares (Goldsmith-Pinkham, Sorkin & Swift 2020)

Identification comes from the shares being uncorrelated with unobserved regional characteristics. The estimator is equivalent to a GMM estimator using each industry share as a separate instrument.

  • Assumption: Baseline shares are as good as randomly assigned

  • Test: Check balance of shares against pre-trends and observables

  • Best suited for: Settings where initial specialization patterns are plausibly exogenous

2. Exogenous Shocks (Borusyak, Hull & Jaravel 2022)

Identification comes from the shocks being exogenous—the national industry trends are independent of region-specific factors.

  • Assumption: Shocks are as good as randomly assigned across industries

  • Test: Check balance of shocks; clustering at shock level

  • Best suited for: Settings with many quasi-random shocks (e.g., trade shocks from policy changes)

Practical Implementation
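
A sketch of the construction in R (the data frame df and the columns region, industry, emp0, growth, and share0 are all illustrative: emp0 is baseline employment, growth the industry growth rate, and share0 the baseline industry share of the region's employment):

    library(dplyr)
    # Z_r = sum_k share0_rk * g_k(-r), with a leave-one-out national shock for each industry
    bartik <- df %>%
      group_by(industry) %>%
      mutate(
        shock_loo = (sum(emp0 * growth) - emp0 * growth) / (sum(emp0) - emp0)  # exclude the focal region
      ) %>%
      group_by(region) %>%
      summarise(z_bartik = sum(share0 * shock_loo))
    # z_bartik then serves as the instrument for local labor demand in the 2SLS first stage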

Key Considerations

Inference: Standard errors must reflect the structure:

  • If relying on shock exogeneity: cluster at the shock level (industry)

  • If relying on share exogeneity: cluster at the unit level (region)

  • Exposure-robust standard errors (Adão, Kolesár & Morales 2019) are often appropriate

Leave-one-out: Always construct national shocks excluding the focal region to avoid mechanical correlation.

Many weak shocks: With many small shocks, aggregation helps. With few dominant shocks, the instrument may be weak.

Examples in the Literature

Paper                          Shares                              Shocks                        Finding
Autor, Dorn & Hanson (2013)    Initial industry shares             Chinese import growth         China trade shock reduced manufacturing employment
Card (2001)                    Initial immigrant shares            Immigrant inflows by origin   Immigration affects local wages
Nakamura & Steinsson (2014)    Regional military spending shares   National defense spending     Fiscal multiplier estimation

When to Use Shift-Share

Condition                                        Assessment
Clear shock source (policy, trade, technology)   Favors shock-based identification
Pre-period shares are quasi-random               Favors share-based identification
Many small industries/shocks                     Aggregation provides power
Few dominant shocks                              May have weak instrument problems
Shares and shocks both questionable              Consider alternative strategies

Warning: Shift-share instruments are not a "free lunch." The identifying assumptions—whether on shares or shocks—must be defended. The popularity of this approach has led to applications where neither shares nor shocks are plausibly exogenous.


Practical Guidance

When to Use IV

Situation                                        Use IV?   Notes
Strong, clearly exogenous instrument available   Yes       The ideal case
Instrument's exclusion is debatable              Maybe     Depends on quality of argument
First stage F < 10                               Caution   Use robust methods, report wide bounds
Multiple weak instruments available              Maybe     Consider LIML or regularization
No plausible instrument exists                   No        Do not invent one; consider bounds

Common Pitfalls

Pitfall 1: Assuming Any Correlation Is Causal

The fact that Z predicts D does not make Z a valid instrument. Many predictors are themselves endogenous.

How to avoid: Articulate why Z is exogenous. What is the source of randomness?

Pitfall 2: Ignoring the Exclusion Restriction

The exclusion restriction cannot be tested. Researchers sometimes treat it as automatically satisfied because Z is "random."

How to avoid: Think hard about channels. Could Z affect Y directly? Through other variables?

Pitfall 3: Overinterpreting LATE

LATE applies to compliers, not the population. Policy effects may differ.

How to avoid: Characterize compliers when possible. Discuss external validity.

Implementation Checklist


Qualitative Bridge

How Qualitative Methods Complement IV

IV provides a point estimate for compliers---but leaves much unknown:

  1. Who are the compliers? IV identifies a causal effect for an unknown subgroup. Qualitative research can characterize this group through interviews or case studies.

  2. Is the exclusion restriction plausible? The exclusion restriction is an assumption about mechanisms. Qualitative investigation of how the instrument operates can strengthen or weaken the case.

  3. What mechanisms drive the effect? IV tells us that D causes Y, not how. Process tracing and qualitative case studies can illuminate mechanisms.

Example: Understanding Draft Lottery Compliers

Card and Lemieux (2001) complemented the draft lottery IV with detailed investigation of who was affected by draft risk. They examined:

  • Enrollment patterns by socioeconomic status

  • Timing of enrollment decisions

  • Subsequent labor market behavior

This qualitative work helped interpret what the IV estimate meant and for whom it was relevant.


Integration Note

Connections to Other Methods

Method                     Relationship                                                            See Chapter
Selection on observables   IV handles unobserved confounding; SOO requires observed confounders    Ch. 11
RDD (fuzzy)                Fuzzy RDD is IV at a threshold                                          Ch. 14
Time series IV             External instruments in SVAR                                            Ch. 16
Bounds                     When IV fails, bounds may still apply                                   Ch. 17

Triangulation Strategies

IV estimates should ideally be compared with:

  1. Other instruments: Do different sources of variation give similar answers?

  2. SOO with sensitivity: How much selection would be needed to explain the IV result?

  3. Experiments: When available, do experiments confirm the IV magnitude?

The returns to education literature illustrates this well: draft lottery, compulsory schooling, and twins studies all give broadly similar estimates, strengthening confidence in the finding.


Summary

Key takeaways:

  1. IV uses exogenous variation in an instrument to identify causal effects when confounding is unobserved

  2. Validity requires relevance (testable) and exclusion (assumed)

  3. Weak instruments bias IV toward OLS and distort inference

  4. With heterogeneous effects, IV estimates LATE---the effect for compliers

  5. Different instruments can give different answers because they identify different compliers

Returning to the opening question: Yes, we can learn about causal effects even with unobserved confounding---if we can find an external source of variation that shifts treatment without directly affecting outcomes. The challenge is finding such instruments and defending their validity.


Further Reading

Essential

  • Angrist, Imbens & Rubin (1996). "Identification of Causal Effects Using Instrumental Variables." JASA.

  • Imbens (2014). "Instrumental Variables: An Econometrician's Perspective."

For Deeper Understanding

  • Stock, Wright & Yogo (2002). "A Survey of Weak Instruments." Journal of Business & Economic Statistics.

  • Angrist & Krueger (2001). "Instrumental Variables and the Search for Identification." JEP.

Advanced/Specialized

  • Andrews, Stock & Sun (2019). "Weak Instruments in IV Regression: Theory and Practice."

  • Mogstad, Torgovitsky & Walters (2021). "The Causal Interpretation of Two-Stage Least Squares with Multiple Instruments."

Applications

  • Angrist (1990). "Lifetime Earnings and the Vietnam Era Draft Lottery." AER.

  • Card (1995). "Using Geographic Variation in College Proximity to Estimate the Return to Schooling."

  • Angrist & Krueger (1991). "Does Compulsory School Attendance Affect Schooling and Earnings?" QJE.


Exercises

Conceptual

  1. Explain in your own words why the exclusion restriction is necessary for IV to identify causal effects. Give an example where the exclusion restriction is likely violated.

  2. Two researchers use different instruments to estimate the effect of education on earnings. Researcher A gets $\hat{\beta} = 0.10$; Researcher B gets $\hat{\beta} = 0.05$. Both instruments appear valid. Can both estimates be correct? Explain.

Applied

  1. Using data from [Angrist data archive], replicate the draft lottery IV estimate. Report:

    • First stage results and F-statistic

    • 2SLS estimate and standard error

    • Interpretation: who are the compliers?

  2. Conduct a weak instrument sensitivity analysis. What happens to your estimates as you (artificially) weaken the first stage?

Discussion

  1. A researcher proposes using "distance to nearest casino" as an instrument for gambling behavior when studying gambling's effect on household finances. Evaluate this instrument: Is it relevant? Is the exclusion restriction plausible? What concerns would you raise?


Technical Appendix: Derivations

A.1 Derivation of the Wald Estimator

Starting from the structural equation $Y = \beta_0 + \beta_1 D + \varepsilon$, take the covariance with Z:

$$\text{Cov}(Z, Y) = \text{Cov}(Z, \beta_0) + \beta_1 \text{Cov}(Z, D) + \text{Cov}(Z, \varepsilon)$$

The first term is zero (covariance with constant). Assume $\text{Cov}(Z, \varepsilon) = 0$ (exclusion). Then:

$$\text{Cov}(Z, Y) = \beta_1 \text{Cov}(Z, D)$$

Solving:

$$\beta_1 = \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, D)}$$

A.2 Asymptotic Distribution

Under standard regularity conditions, the 2SLS estimator is asymptotically normal:

$$\sqrt{n}(\hat{\beta}_{2SLS} - \beta) \xrightarrow{d} N(0, V)$$

where $V = \sigma^2\, \text{plim}\left(\tfrac{1}{n} X'Z(Z'Z)^{-1}Z'X\right)^{-1}$, with $X$ the matrix of regressors (treatment and controls) and $Z$ the matrix of instruments.


Draft version. Comments welcome.

Last updated