Chapter 12: Instrumental Variables
Opening Question
When you cannot randomly assign a treatment and observable confounders do not explain all the selection, can you still learn something about causal effects?
Chapter Overview
The methods we have encountered so far require either controlling the treatment assignment (experiments) or observing all relevant confounders (selection on observables). But what if neither condition holds? What if people select into treatment based on factors we cannot see or measure?
Instrumental variables (IV) offer an answer: find an external source of variation that affects treatment but has no direct effect on outcomes. This is the logic behind draft lotteries, compulsory schooling laws, and weather shocks. When such variation exists, we can use it to identify causal effects even in the presence of unobserved confounding.
This chapter develops the theory and practice of IV estimation. We begin with the core logic, move through estimation and inference, confront the complications of weak instruments and heterogeneous effects, and end with practical guidance on when IV is credible. The returns to education serve as our primary running example---a question that has driven methodological innovation for three decades.
What you will learn:
The logic of instrumental variables as a source of exogenous variation
How 2SLS estimation works and when it recovers causal effects
What LATE means and why it matters for interpretation
How to detect and handle weak instruments
When IV estimates are (and are not) credible
Prerequisites: Chapter 9 (The Causal Framework), Chapter 3 (Statistical Foundations)
12.1 The Logic of Instruments
Why We Need Instruments
Consider the classic question: what is the causal effect of an additional year of education on earnings?
If we simply regress log wages on years of schooling, we get a correlation. But this correlation likely overstates the causal effect. People who obtain more education may differ systematically from those who do not---in motivation, ability, family background, and countless other ways that also affect earnings. These unobserved factors confound the relationship between education and wages.
Chapter 11 addressed this problem by assuming we could observe and control for all relevant confounders. But what if we cannot measure ability? What if family background is only partially captured by parental education and income?
Instrumental variables offer an alternative path forward.
The Core Idea
The logic of IV is simple in principle:
Find a variable Z that affects the treatment D
Verify that Z affects the outcome Y only through its effect on D
Use the variation in D induced by Z to estimate the causal effect
The variable Z is called an instrument. It must satisfy two conditions:
Condition 1: Relevance. The instrument must affect the treatment: Cov(Z,D) ≠ 0
Condition 2: Exclusion. The instrument must affect the outcome only through treatment: Cov(Z,ε) = 0, where ε represents all factors affecting Y other than D
The relevance condition is testable. The exclusion restriction is not---it is an assumption about the world that requires substantive justification.
Example: Vietnam-Era Draft Lottery
Perhaps the most famous IV for education is the Vietnam-era draft lottery. In the early 1970s, the U.S. military drafted men based on randomly assigned lottery numbers. Men with low lottery numbers faced high probability of military service; men with high numbers faced almost none.
How does this help with returns to education? Men who received low lottery numbers often sought draft deferments---and one reliable deferment was college enrollment. So the draft lottery affected education. But the lottery number itself was randomly assigned, meaning it should not be correlated with ability, family background, or other confounders.
The logic of the IV strategy:
Z = Draft lottery number (or indicator for "high draft risk")
D = Years of education
Y = Log earnings
The draft lottery provides exogenous variation in education. By comparing outcomes for men with different lottery numbers, we can estimate how education causally affects earnings---without needing to observe ability or other confounders.
Running Example: Returns to Education
The effect of education on earnings is our primary running example for this chapter. We will see the draft lottery IV (Angrist 1990), the compulsory schooling IV (Angrist & Krueger 1991), and the geographic proximity IV (Card 1995). Each illustrates different aspects of IV methodology. By chapter's end, you will understand both the power and the limitations of IV for answering this fundamental question.

12.2 Formal Framework
The Structural Equations
Consider the standard setup with a single endogenous regressor:
Yi=β0+β1Di+εi
where:
Yi is the outcome
Di is the treatment (endogenous: Cov(D,ε) ≠ 0)
β1 is the causal effect we want to estimate
εi captures unobserved factors affecting Y
The problem is that D is correlated with ε. People with high ε (high ability, say) tend to have high D (more education). OLS conflates the causal effect β1 with the correlation between D and ε.
Now introduce an instrument Z satisfying:
Assumption 12.1 (Relevance): Cov(Z,D) ≠ 0
Assumption 12.2 (Exclusion): Cov(Z,ε)=0
The Wald Estimator
Under these assumptions, we can derive the IV estimator. The key insight is that:
Cov(Z,Y)=Cov(Z,β0+β1D+ε)=β1⋅Cov(Z,D)+Cov(Z,ε)
If the exclusion restriction holds (Cov(Z,ε)=0), then:
Cov(Z,Y)=β1⋅Cov(Z,D)
Solving for β1:
β1 = Cov(Z,Y) / Cov(Z,D)
This is the Wald estimator when Z is binary. It equals the ratio of:
The reduced form: how Z affects Y
The first stage: how Z affects D
Interpretation: The effect of Z on Y comes entirely through D. Divide by how much Z moves D to get how much D moves Y.
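A minimal simulation sketch can make this concrete. The setup below is illustrative only (the variable names and coefficients are invented, not taken from any dataset in the chapter): an unobserved confounder raises both schooling and wages, so OLS is biased, while the covariance ratio recovers the true effect.

```r
set.seed(12)
n <- 10000
u <- rnorm(n)                            # unobserved ability (confounder)
z <- rbinom(n, 1, 0.5)                   # randomly assigned, lottery-style instrument
d <- 12 + 2 * z + 1.5 * u + rnorm(n)     # first stage: Z shifts schooling D
y <- 1 + 0.10 * d + 0.8 * u + rnorm(n)   # true effect of D on Y is 0.10

cov(z, y) / cov(z, d)        # IV (Wald) estimate: close to 0.10
coef(lm(y ~ d))["d"]         # OLS estimate: biased upward by the confounder
```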
Two-Stage Least Squares (2SLS)
The Wald estimator extends naturally to multiple instruments and continuous instruments via two-stage least squares:
First stage: Regress D on Z (and any exogenous controls X): Di=π0+π1Zi+π2′Xi+νi
Second stage: Regress Y on the fitted values D^ (and controls X): Yi=β0+β1D^i+β2′Xi+ui
The coefficient β^1 from the second stage is the 2SLS estimator.
Intuition: The first stage isolates the variation in D that comes from Z. The second stage uses only this "clean" variation to estimate the effect on Y. Variation in D coming from ε is stripped away.

What Could Go Wrong?
The two assumptions---relevance and exclusion---may fail:
Weak instruments: If Z only weakly predicts D, the first stage is weak. This creates several problems:
Large sampling variance
Bias toward OLS
Unreliable standard errors
We address weak instruments in Section 12.4.
Exclusion violation: If Z directly affects Y---or affects Y through some channel other than D---the IV estimate is biased. Unlike weak instruments, exclusion violations cannot be detected from the data. They require substantive argument.
12.3 Estimation and Inference
Implementing 2SLS
Standard software makes 2SLS easy. In Stata, the built-in command is ivregress 2sls. In R, the AER package provides ivreg:
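A minimal sketch of the R call, assuming a hypothetical data frame wages with variables log_wage, educ, exper, and an instrument lottery (these names are placeholders, not a specific dataset from the chapter):

```r
library(AER)  # provides ivreg(); install.packages("AER") if needed

# Formula: outcome ~ endogenous + exogenous controls | instruments + exogenous controls
fit <- ivreg(log_wage ~ educ + exper | lottery + exper, data = wages)

# diagnostics = TRUE adds first-stage ("Weak instruments"), Wu-Hausman, and Sargan tests
summary(fit, diagnostics = TRUE)
```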
Important: Always use the built-in IV commands. Do not manually run two regressions---this produces incorrect standard errors.
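To see why, the sketch below (simulated, illustrative data) runs the two stages by hand: the point estimate matches ivreg exactly, but the manual second-stage standard error does not, because lm() treats the fitted values as data rather than as estimates.

```r
library(AER)
set.seed(7)
n <- 5000
u <- rnorm(n)                  # unobserved confounder
z <- rnorm(n)                  # instrument
d <- 0.5 * z + u + rnorm(n)    # endogenous treatment
y <- 0.10 * d + u + rnorm(n)   # true effect is 0.10

d_hat  <- fitted(lm(d ~ z))    # manual first stage
manual <- lm(y ~ d_hat)        # manual second stage
proper <- ivreg(y ~ d | z)     # built-in 2SLS

coef(manual)["d_hat"]; coef(proper)["d"]   # identical point estimates
sqrt(vcov(manual)["d_hat", "d_hat"])       # manual standard error: wrong
sqrt(vcov(proper)["d", "d"])               # 2SLS standard error: correct
```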
First Stage Diagnostics
Before trusting IV estimates, examine the first stage:
F-statistic on excluded instruments: A rule of thumb is F > 10 (Stock, Wright & Yogo 2002). Modern practice often demands F > 100 for robust inference.
First stage coefficient: Is the sign correct? Is the magnitude plausible?
Visual inspection: Plot D against Z. Is there a clear relationship?
Practical Box: First Stage Checklist
Report the first-stage regression itself, not only the 2SLS estimate
Report the F-statistic on the excluded instruments, not just its p-value
Check that the sign and magnitude of the first-stage coefficient are plausible
Plot D against Z where feasible
Cluster the first-stage standard errors the same way as the second stage
If F < 20, plan to report weak-instrument-robust inference (Section 12.4)
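A sketch of these checks in R, reusing the hypothetical wages data frame and lottery instrument from above:

```r
first_stage <- lm(educ ~ lottery + exper, data = wages)
summary(first_stage)               # sign and magnitude of the instrument's coefficient

# F-statistic on the excluded instrument: compare the first stage with and without it
restricted <- lm(educ ~ exper, data = wages)
anova(restricted, first_stage)     # the F reported here is the first-stage F on lottery

plot(wages$lottery, wages$educ)    # visual check of the Z-D relationship
```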
Standard Errors
With a valid instrument and large samples, the 2SLS standard errors are consistent. However:
Cluster your standard errors if the instrument varies at a group level (e.g., state policy, lottery cohort)
Use robust standard errors by default to handle heteroskedasticity
With weak instruments, standard errors may be misleading---see Section 12.4
The Overidentification Test (J-Test)
With more instruments than endogenous variables (overidentification), we can partially test instrument validity. The Sargan-Hansen J-test checks whether all instruments give the same answer.
The idea: If all instruments are valid, they should all point to the same β. The J-test checks whether the instruments disagree more than sampling variation would explain.
Implementation:
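A minimal sketch in R: with two instruments for one endogenous regressor (here the hypothetical lottery and near_college variables), there is one overidentifying restriction, and summary(..., diagnostics = TRUE) reports the Sargan statistic.

```r
library(AER)
fit_over <- ivreg(log_wage ~ educ + exper | lottery + near_college + exper,
                  data = wages)
summary(fit_over, diagnostics = TRUE)  # the "Sargan" row is the J-test
```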
Interpretation: Under the null (all instruments valid), the J-statistic is χ2 with degrees of freedom equal to the number of overidentifying restrictions (number of instruments minus number of endogenous variables). Rejection suggests at least one instrument is invalid.
Box: Why the J-Test Has Limited Value
The overidentification test sounds appealing—a way to test the untestable exclusion restriction. But it has severe limitations:
1. It tests consistency, not validity If all instruments are invalid in the same direction, they will "agree" and the J-test will pass. Example: If both father's and mother's education affect child's earnings directly (not just through child's education), but both biases are upward, the J-test won't detect the problem.
2. Rejection is ambiguous If the J-test rejects, you know something is wrong—but not which instrument. With three instruments, one, two, or all three could be invalid.
3. Non-rejection proves nothing Passing the J-test does not mean your instruments are valid. It only means they're consistent with each other.
4. Power is often low The test may fail to reject even when instruments are moderately invalid, especially with weak instruments.
Practical guidance: Report the J-test if you have multiple instruments, but don't treat non-rejection as validation. The J-test is necessary but far from sufficient for credibility. The real work is defending exclusion substantively.
12.4 Weak Instruments
The Problem
An instrument is "weak" if it explains little of the variation in D. Formally, the first-stage F-statistic is small.
Weak instruments cause three problems:
Bias: The IV estimator is biased toward OLS in finite samples
Variance: Standard errors become large and unstable
Inference: t-tests and confidence intervals may be misleading
The severity depends on how weak the instrument is. Stock, Wright & Yogo (2002) showed that F = 10 is approximately the threshold below which conventional inference becomes unreliable.

Detection
Test for weak instruments using the first-stage F-statistic:
F > 100: strong instrument
20 < F < 100: probably adequate
10 < F < 20: borderline; consider robust inference
F < 10: weak; do not trust standard 2SLS
Robust Inference
When instruments may be weak, use weak-instrument-robust methods:
Anderson-Rubin (AR) test: Tests the null H0: β1 = b for any hypothesized value b. Invert to get confidence intervals. Valid regardless of instrument strength.
Conditional likelihood ratio (CLR): More efficient than AR with multiple instruments.
tF adjustment: Lee et al. (2022) propose adjusting critical values based on first-stage F.
In Stata, these tests are available through community-contributed commands. A minimal sketch of the Anderson-Rubin approach in R follows.
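This sketch hand-rolls the AR test for the just-identified, no-controls case on simulated data (illustrative only): for each hypothesized value b, regress y − b·d on the instrument and keep the values of b that are not rejected.

```r
set.seed(42)
n <- 2000
u <- rnorm(n)
z <- rnorm(n)
d <- 0.2 * z + u + rnorm(n)    # deliberately modest first stage
y <- 0.10 * d + u + rnorm(n)   # true effect is 0.10

ar_pvalue <- function(b) {
  # Under H0: beta1 = b, the adjusted outcome y - b*d should be unrelated to z
  resid0 <- y - b * d
  summary(lm(resid0 ~ z))$coefficients["z", "Pr(>|t|)"]
}

grid <- seq(-0.5, 0.7, by = 0.005)
keep <- sapply(grid, ar_pvalue) > 0.05
range(grid[keep])   # 95% AR confidence set (may be wide, unbounded, or disconnected)
```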
Common Pitfall: The "Just Significant" First Stage
Some researchers accept any statistically significant first stage as adequate. This is wrong. An instrument can be significant but still weak. Focus on the F-statistic, not the p-value.
How to avoid: Report F-statistics. If F < 20, seriously consider weak-instrument-robust methods.
Alternative Estimators: LIML and Fuller
With weak instruments, 2SLS is biased toward OLS. Alternative estimators can reduce this bias:
Limited Information Maximum Likelihood (LIML)
LIML is an alternative to 2SLS that is median-unbiased—its median equals the true parameter even with weak instruments. The bias of 2SLS is proportional to the number of instruments; LIML's bias does not depend on the number of instruments.
β^LIML = [X′Z(Z′Z)^(−1) Z′X − κ^ X′M_Z X]^(−1) [X′Z(Z′Z)^(−1) Z′Y − κ^ X′M_Z Y]
where κ^ is the smallest root of a generalized eigenvalue problem, M_Z = I − Z(Z′Z)^(−1)Z′ is the annihilator matrix for Z, and X collects the right-hand-side regressors.
Intuition: LIML can be understood as 2SLS applied to a transformed model where the first-stage residual variance is scaled appropriately. This scaling corrects the finite-sample bias.
Fuller's Modified LIML
Fuller (1977) proposed a modification that reduces LIML's variance at the cost of slightly more bias. The Fuller estimator replaces κ^ with κ^ − c/(n − K), where n is the sample size; c = 1 and c = 4 are common choices.
Fuller(1): Less biased than 2SLS, lower variance than LIML
Fuller(4): Even lower variance, slightly more bias
2SLS: bias toward OLS with weak instruments; lowest variance (if F > 100); best with strong instruments only
LIML: median-unbiased; variance higher than 2SLS; best with weak instruments when inference is the focus
Fuller(1): bias less than 2SLS; variance between 2SLS and LIML; best with weak instruments when MSE is the focus
Practical guidance: When the first-stage F-statistic is between 10 and 50, report both 2SLS and LIML. If they differ substantially, the instruments are likely too weak for reliable inference under any method.
12.5 LATE: What Are We Actually Estimating?
Heterogeneous Treatment Effects
So far we assumed a constant effect β1. But treatment effects may vary across people. Some people gain a lot from education; others gain little.
When effects are heterogeneous, what does IV estimate?
Compliers, Always-Takers, Never-Takers
Consider a binary instrument Z and binary treatment D. Define four groups:
Writing Di(z) for individual i's treatment when the instrument is set to z:
Compliers: Di(0) = 0, Di(1) = 1 --- take treatment only when induced by Z
Always-takers: Di(0) = 1, Di(1) = 1 --- always take treatment
Never-takers: Di(0) = 0, Di(1) = 0 --- never take treatment
Defiers: Di(0) = 1, Di(1) = 0 --- do the opposite of Z
The draft lottery example:
Compliers: Men who enrolled in college because of draft risk
Always-takers: Men who would have attended college regardless
Never-takers: Men who would not have attended college regardless
Defiers: Men who dropped out because of draft risk (assumed rare)

Box: Why Monotonicity Matters—The Problem of Defiers
The monotonicity assumption (Di(1)≥Di(0) for all i) rules out defiers. This assumption often receives less attention than exclusion, but violations can be equally devastating.
What goes wrong with defiers: The Wald estimator divides the reduced-form effect by the first stage: β^IV = (E[Y∣Z=1] − E[Y∣Z=0]) / (E[D∣Z=1] − E[D∣Z=0])
With defiers present, the denominator includes offsetting effects:
Compliers: D goes from 0 to 1 when Z=1 (positive contribution)
Defiers: D goes from 1 to 0 when Z=1 (negative contribution)
The first stage is now the net effect. If compliers and defiers have different treatment effects, IV estimates a weighted average where defiers receive negative weight. The result can fall outside the range of any individual's treatment effect.
Example: Suppose a job training instrument encourages most workers to enroll (compliers), but makes a few workers suspicious and refuse (defiers). If defiers are high-ability workers who would have benefited most from training, the IV estimate will be biased downward—it subtracts the defiers' large effects from the compliers' smaller effects.
When to worry:
Instruments that induce both positive and negative responses in subgroups
Policies with both "nudge" and "reactance" effects
Price changes where some consumers increase and others decrease consumption
What to do:
Argue substantively that defiers are implausible or rare
Look for subgroups where defiers might exist and test for sign reversals
Consider partial identification approaches that allow for defiers (Huber and Mellace 2015)
The Local Average Treatment Effect
Theorem 12.1 (LATE; Imbens & Angrist 1994)
Under the assumptions of relevance, exclusion, and monotonicity (no defiers), IV identifies:
βIV=E[Y1−Y0∣Compliers]
The local average treatment effect---the average effect for those whose treatment status is changed by the instrument.
Intuition: IV uses variation induced by Z. This variation only affects compliers. Always-takers and never-takers contribute no variation. So IV estimates the effect for compliers only.
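A small simulation sketch (all values invented for illustration) shows the theorem at work: with always-takers, never-takers, and compliers who have different treatment effects, the Wald ratio lands on the complier average rather than the population average.

```r
set.seed(123)
n    <- 100000
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.3, 0.3, 0.4))
z    <- rbinom(n, 1, 0.5)                                      # randomized instrument
d    <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))
tau  <- ifelse(type == "complier", 2, ifelse(type == "always", 5, 1))  # heterogeneous effects
y    <- 1 + tau * d + rnorm(n)

wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(d[z == 1]) - mean(d[z == 0]))
c(wald = wald, complier_average = 2, population_average = mean(tau))  # wald ~ 2, not ~ 2.5
```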
Implications
LATE has profound implications:
External validity: IV estimates apply to compliers, who may not be representative of the population. Draft lottery compliers were probably marginal college-goers. Their returns may differ from average returns.
Different instruments, different estimates: Two valid instruments can give different estimates if they identify different groups of compliers. This is not a contradiction---it is a feature.
Policy relevance: LATE may or may not be policy-relevant, depending on whether the policy affects the same people as the instrument.

Running Example: Who Are the Compliers?
In the draft lottery study, compliers were men whose college enrollment depended on draft risk. These were likely:
Men from families where college was possible but not certain
Men for whom military service was particularly unappealing
Their returns to education may exceed the population average if marginal college-goers benefit more from the credential signal.
12.6 Dose-Response with IV
Beyond Binary Treatment
Many treatments are continuous: years of education, dosage of medication, hours of training. Can IV handle continuous treatments?
Yes, with additional assumptions. The standard approach:
Assume a linear relationship between D and Y
Use IV to estimate the slope
This gives the effect of a one-unit increase in treatment, averaged across whatever shifts the instrument induces.
Control Function Approach
An alternative is the control function approach:
First stage: D=π0+π1Z+ν
Outcome equation: Y=β0+β1D+ρν^+u
The residual ν^ captures the endogenous variation in D. Including it as a control removes the bias. The coefficient β1 on D gives the causal effect (a minimal sketch appears after the list of advantages below).
Advantages:
Clearer about what endogeneity you are correcting
Easier to test for endogeneity (ρ=0?)
More flexible for nonlinear first stages
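A sketch of the two-step control function on simulated data (illustrative only). In this linear case the point estimate on D coincides with 2SLS; note that the second-step standard errors ignore the fact that ν^ is estimated, so in practice they should be bootstrapped.

```r
set.seed(99)
n <- 5000
u <- rnorm(n)
z <- rnorm(n)
d <- 1 + 0.6 * z + u + rnorm(n)
y <- 2 + 0.10 * d + u + rnorm(n)   # true effect is 0.10

v_hat <- resid(lm(d ~ z))     # first-stage residual
cf    <- lm(y ~ d + v_hat)    # control-function outcome equation

coef(cf)["d"]        # estimate of the causal effect beta1 (matches 2SLS here)
coef(cf)["v_hat"]    # estimate of rho; testing rho = 0 is a test of endogeneity
```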
12.7 Shift-Share (Bartik) Instruments
Shift-share instruments have become one of the most widely used identification strategies in applied economics, particularly in labor, trade, and migration research. Understanding their structure and the ongoing debates about their validity is essential for modern empirical work.
The Basic Structure
A shift-share instrument combines:
Shares: Pre-determined exposure weights (e.g., initial industry employment shares in a region)
Shocks: Aggregate changes (e.g., national industry growth rates)
The instrument for region r at time t is:
Z_rt = Σ_k s_rk,t0 × g_kt
where:
s_rk,t0 = region r's share of industry k at baseline t0
g_kt = national (leave-one-out) growth in industry k at time t
The Classic Application: Bartik (1991)
Timothy Bartik studied how local labor demand affects wages and employment. The challenge: local labor demand is endogenous to local wages.
Solution: Construct predicted labor demand growth by interacting:
Each region's initial industry composition (shares)
National industry growth rates (shocks)
Regions with more manufacturing are predicted to grow faster when national manufacturing booms—not because of anything special about that region, but because of its pre-determined exposure to national trends.
Two Views on Identification
A major debate concerns the source of identifying variation:
1. Exogenous Shares (Goldsmith-Pinkham, Sorkin & Swift 2020)
Identification comes from the shares being uncorrelated with unobserved regional characteristics. The estimator is equivalent to a GMM estimator using each industry share as a separate instrument.
Assumption: Baseline shares are as good as randomly assigned
Test: Check balance of shares against pre-trends and observables
Best suited for: Settings where initial specialization patterns are plausibly exogenous
2. Exogenous Shocks (Borusyak, Hull & Jaravel 2022)
Identification comes from the shocks being exogenous—the national industry trends are independent of region-specific factors.
Assumption: Shocks are as good as randomly assigned across industries
Test: Check balance of shocks; clustering at shock level
Best suited for: Settings with many quasi-random shocks (e.g., trade shocks from policy changes)
Practical Implementation
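A minimal construction sketch in R (all objects simulated and hypothetical): emp0 and emp1 are region-by-industry employment matrices at baseline and endline, and the instrument for each region sums baseline shares times leave-one-out national industry growth.

```r
set.seed(1)
R <- 50; K <- 20
emp0 <- matrix(rexp(R * K), nrow = R, ncol = K)             # baseline employment (regions x industries)
emp1 <- emp0 * matrix(exp(rnorm(R * K, 0.02, 0.10)), R, K)  # endline employment

shares <- emp0 / rowSums(emp0)                              # baseline shares s_rk

# Leave-one-out national growth of industry k, excluding the focal region r
loo_growth <- matrix(NA_real_, R, K)
for (r in 1:R) {
  loo_growth[r, ] <- (colSums(emp1[-r, ]) - colSums(emp0[-r, ])) / colSums(emp0[-r, ])
}

bartik <- rowSums(shares * loo_growth)                      # Z_r = sum_k s_rk * g_k(-r)
head(bartik)
```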
Key Considerations
Inference: Standard errors must reflect the structure:
If relying on shock exogeneity: cluster at the shock level (industry)
If relying on share exogeneity: cluster at the unit level (region)
Exposure-robust standard errors (Adão, Kolesár & Morales 2019) are often appropriate
Leave-one-out: Always construct national shocks excluding the focal region to avoid mechanical correlation.
Many weak shocks: With many small shocks, aggregation helps. With few dominant shocks, the instrument may be weak.
Examples in the Literature
Autor, Dorn & Hanson (2013): shares = initial industry shares; shocks = Chinese import growth; finding: the China trade shock reduced manufacturing employment
Card (2001): shares = initial immigrant shares; shocks = immigrant inflows by origin; finding: immigration affects local wages
Nakamura & Steinsson (2014): shares = regional military spending shares; shocks = national defense spending; use: fiscal multiplier estimation
When to Use Shift-Share
Clear shock source (policy, trade, technology): favors shock-based identification
Pre-period shares are quasi-random: favors share-based identification
Many small industries/shocks: aggregation provides power
Few dominant shocks: may have weak-instrument problems
Shares and shocks both questionable: consider alternative strategies
Warning: Shift-share instruments are not a "free lunch." The identifying assumptions—whether on shares or shocks—must be defended. The popularity of this approach has led to applications where neither shares nor shocks are plausibly exogenous.
Practical Guidance
When to Use IV
Strong, clearly exogenous instrument available: yes --- the ideal case
Instrument's exclusion is debatable: maybe --- depends on the quality of the argument
First stage F < 10: caution --- use robust methods, report wide bounds
Multiple weak instruments available: maybe --- consider LIML or regularization
No plausible instrument exists: no --- do not invent one; consider bounds
Common Pitfalls
Pitfall 1: Assuming Any Correlation Is Causal
The fact that Z predicts D does not make Z a valid instrument. Many predictors are themselves endogenous.
How to avoid: Articulate why Z is exogenous. What is the source of randomness?
Pitfall 2: Ignoring the Exclusion Restriction
The exclusion restriction cannot be tested. Researchers sometimes treat it as automatically satisfied because Z is "random."
How to avoid: Think hard about channels. Could Z affect Y directly? Through other variables?
Pitfall 3: Overinterpreting LATE
LATE applies to compliers, not the population. Policy effects may differ.
How to avoid: Characterize compliers when possible. Discuss external validity.
Implementation Checklist
State the instrument and the source of its exogenous variation
Defend the exclusion restriction channel by channel
Report the first stage: coefficients and the F-statistic on the excluded instruments
Use built-in IV routines with robust (and, where appropriate, clustered) standard errors
If overidentified, report the J-test without treating non-rejection as validation
If the first stage is weak, report Anderson-Rubin or CLR inference
Characterize the compliers and discuss what the LATE does and does not cover
Qualitative Bridge
How Qualitative Methods Complement IV
IV provides a point estimate for compliers---but leaves much unknown:
Who are the compliers? IV identifies a causal effect for an unknown subgroup. Qualitative research can characterize this group through interviews or case studies.
Is the exclusion restriction plausible? The exclusion restriction is an assumption about mechanisms. Qualitative investigation of how the instrument operates can strengthen or weaken the case.
What mechanisms drive the effect? IV tells us that D causes Y, not how. Process tracing and qualitative case studies can illuminate mechanisms.
Example: Understanding Draft Lottery Compliers
Card and Lemieux (2001) complemented the draft lottery IV with detailed investigation of who was affected by draft risk. They examined:
Enrollment patterns by socioeconomic status
Timing of enrollment decisions
Subsequent labor market behavior
This qualitative work helped interpret what the IV estimate meant and for whom it was relevant.
Integration Note
Connections to Other Methods
Selection on observables (Ch. 11): IV handles unobserved confounding; SOO requires observed confounders
RDD, fuzzy (Ch. 14): fuzzy RDD is IV at a threshold
Time series IV (Ch. 16): external instruments in SVARs
Bounds (Ch. 17): when IV fails, bounds may still apply
Triangulation Strategies
IV estimates should ideally be compared with:
Other instruments: Do different sources of variation give similar answers?
SOO with sensitivity: How much selection would be needed to explain the IV result?
Experiments: When available, do experiments confirm the IV magnitude?
The returns to education literature illustrates this well: draft lottery, compulsory schooling, and twins studies all give broadly similar estimates, strengthening confidence in the finding.
Summary
Key takeaways:
IV uses exogenous variation in an instrument to identify causal effects when confounding is unobserved
Validity requires relevance (testable) and exclusion (assumed)
Weak instruments bias IV toward OLS and distort inference
With heterogeneous effects, IV estimates LATE---the effect for compliers
Different instruments can give different answers because they identify different compliers
Returning to the opening question: Yes, we can learn about causal effects even with unobserved confounding---if we can find an external source of variation that shifts treatment without directly affecting outcomes. The challenge is finding such instruments and defending their validity.
Further Reading
Essential
Angrist, Imbens & Rubin (1996). "Identification of Causal Effects Using Instrumental Variables." JASA.
Imbens (2014). "Instrumental Variables: An Econometrician's Perspective."
For Deeper Understanding
Stock, Wright & Yogo (2002). "A Survey of Weak Instruments." Journal of Business & Economic Statistics.
Angrist & Krueger (2001). "Instrumental Variables and the Search for Identification." JEP.
Advanced/Specialized
Andrews, Stock & Sun (2019). "Weak Instruments in IV Regression: Theory and Practice."
Mogstad, Torgovitsky & Walters (2021). "The Causal Interpretation of Two-Stage Least Squares with Multiple Instruments."
Applications
Angrist (1990). "Lifetime Earnings and the Vietnam Era Draft Lottery." AER.
Card (1995). "Using Geographic Variation in College Proximity to Estimate the Return to Schooling."
Angrist & Krueger (1991). "Does Compulsory School Attendance Affect Schooling and Earnings?" QJE.
Exercises
Conceptual
Explain in your own words why the exclusion restriction is necessary for IV to identify causal effects. Give an example where the exclusion restriction is likely violated.
Two researchers use different instruments to estimate the effect of education on earnings. Researcher A gets β^=0.10; Researcher B gets β^=0.05. Both instruments appear valid. Can both estimates be correct? Explain.
Applied
Using data from [Angrist data archive], replicate the draft lottery IV estimate. Report:
First stage results and F-statistic
2SLS estimate and standard error
Interpretation: who are the compliers?
Conduct a weak instrument sensitivity analysis. What happens to your estimates as you (artificially) weaken the first stage?
Discussion
A researcher proposes using "distance to nearest casino" as an instrument for gambling behavior when studying gambling's effect on household finances. Evaluate this instrument: Is it relevant? Is the exclusion restriction plausible? What concerns would you raise?
Technical Appendix: Derivations
A.1 Derivation of the Wald Estimator
Starting from the structural equation Y=β0+β1D+ε, take the covariance with Z:
Cov(Z,Y)=Cov(Z,β0)+β1Cov(Z,D)+Cov(Z,ε)
The first term is zero (covariance with constant). Assume Cov(Z,ε)=0 (exclusion). Then:
Cov(Z,Y)=β1Cov(Z,D)
Solving:
β1 = Cov(Z,Y) / Cov(Z,D)
A.2 Asymptotic Distribution
Under standard regularity conditions, the 2SLS estimator is asymptotically normal:
√n (β^2SLS − β) →d N(0, V)
where V = σ² [plim n^(−1) X′Z(Z′Z)^(−1) Z′X]^(−1), X is the matrix of regressors (endogenous and exogenous), and Z is the matrix of instruments.
Draft version. Comments welcome.