Chapter 14: Regression Discontinuity
Opening Question
When treatment is assigned by crossing a threshold, can we recover causal effects by comparing units just above and just below the cutoff?
Chapter Overview
Regression discontinuity (RD) designs exploit situations where treatment is determined, wholly or partly, by whether a continuous variable crosses a threshold. Students receive scholarships if their test scores exceed a cutoff. Politicians win elections if their vote share exceeds 50%. Policies take effect when population crosses a boundary. At these thresholds, units just above and just below are nearly identical in expectation—as if treatment were randomly assigned among units near the cutoff.
This "local randomization" intuition makes RD one of the most credible quasi-experimental designs. When the assignment variable cannot be precisely manipulated, crossing the threshold is effectively random for units near the cutoff. RD combines the credibility of a randomized experiment (for units near the cutoff) with the practical advantage of using naturally occurring variation in observational data.
This chapter develops the RD framework, distinguishing sharp designs (treatment perfectly determined by the cutoff) from fuzzy designs (cutoff induces variation in treatment probability). We cover estimation, bandwidth selection, validity diagnostics, and the interpretation challenges that arise in practice.
What you will learn:
The local randomization logic underlying RD designs
How to estimate RD effects using local polynomial regression
Bandwidth selection: the bias-variance tradeoff
Distinguishing sharp vs. fuzzy RD and estimating each
Validity diagnostics: testing for manipulation and covariate balance
How to present RD evidence convincingly
Prerequisites: Chapter 9 (Causal Framework), Chapter 12 (IV, for fuzzy RD), Chapter 3 (Statistical Foundations)
14.1 The Sharp RD Design
Setup and Notation
Let Xi be a continuous running variable (also called the assignment or forcing variable) and c be the cutoff. In a sharp RD design, treatment is a deterministic function of the running variable:
$$D_i = \mathbf{1}[X_i \ge c]$$
Units with Xi≥c are treated; units with Xi<c are controls. There is no ambiguity, no exceptions, no discretion.
Examples of sharp RD:
Test score thresholds for program eligibility
Age cutoffs for policy exposure (drinking age, voting age, retirement)
Date cutoffs (fiscal year boundaries, policy implementation dates)
Geographic boundaries (different regulations on either side of a border)
The Local Randomization Intuition
Why does RD work? Consider units with Xi very close to the cutoff—say, within a narrow bandwidth h of c. For these units:
Balance: Any pre-treatment characteristic that varies smoothly with X will be nearly identical for units just above and just below c
As-if random: The small differences in X that determine treatment status are effectively random
This intuition is formalized through continuity assumptions.
Assumption 14.1 (Continuity of Potential Outcomes): The conditional expectation functions of potential outcomes are continuous in the running variable at the cutoff:
$$\lim_{x \downarrow c} E[Y_i(1) \mid X_i = x] = E[Y_i(1) \mid X_i = c]$$
$$\lim_{x \uparrow c} E[Y_i(0) \mid X_i = x] = E[Y_i(0) \mid X_i = c]$$
Intuition: There is no discontinuity in how outcomes would evolve with X absent the treatment threshold. Any jump in outcomes at c must be due to treatment.
Box: Two Frameworks for RD—Continuity vs. Local Randomization
The RD literature has developed two distinct conceptual frameworks. Understanding the difference clarifies what RD assumes and when it works.
Continuity-Based Framework (Hahn, Todd, Van der Klaauw 2001)
The classical approach assumes potential outcomes are continuous functions of the running variable at the cutoff. Identification comes from comparing limits:
$$\tau_{RD} = \lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]$$
Key assumption: Smoothness of potential outcomes at c
Estimation: Local polynomial regression with carefully chosen bandwidth
Inference: Asymptotic approximations based on boundary kernel estimation
Covariates: Not required for identification; can improve precision
Local Randomization Framework (Cattaneo, Frandsen, Titiunik 2015)
This alternative assumes that in a small window around the cutoff, treatment is as-if randomly assigned, like a local experiment:
$$D_i \perp (Y_i(0), Y_i(1)) \quad \text{for } X_i \in [c - w, c + w]$$
Key assumption: Local random assignment in window w
Estimation: Difference in means (or randomization inference)
Inference: Fisher-style permutation tests
Covariates: Can test local balance (like an RCT)
When Does Each Apply?
| Situation | Framework |
| --- | --- |
| Running variable precisely determined | Continuity (local randomization may fail) |
| Running variable has measurement error or noise | Local randomization may apply |
| Large sample, smooth relationship | Continuity (polynomial approximation works) |
| Small sample, discrete running variable | Local randomization (window-based inference) |
| Manipulation concerns | Both require no manipulation |
Practical implication: Most RD analyses use the continuity framework and local polynomial methods. But the local randomization interpretation provides clearer intuition and suggests different diagnostics (testing covariate balance within the window, as in an RCT). The two frameworks often give similar answers when the running variable has sufficient noise near the cutoff—which is precisely when RD is most credible.
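The local randomization approach described in the box can be sketched in a few lines. This is a minimal illustration on simulated data, not the procedure of any particular package: the function name `permutation_test`, the window `w = 0.1`, and the data-generating process are all assumptions made for the sketch. It computes the difference in means inside the window and a Fisher-style permutation p-value by reshuffling treatment labels.

```python
import random

def permutation_test(x, y, cutoff, w, n_perm=2000, seed=0):
    """Difference in means inside [cutoff - w, cutoff + w] with a permutation p-value."""
    window = [(xi >= cutoff, yi) for xi, yi in zip(x, y) if abs(xi - cutoff) <= w]
    labels = [treated for treated, _ in window]
    outcomes = [yi for _, yi in window]
    n_treated = sum(labels)
    n_control = len(labels) - n_treated

    def diff_in_means(assignment):
        treated_sum = sum(yi for t, yi in zip(assignment, outcomes) if t)
        control_sum = sum(yi for t, yi in zip(assignment, outcomes) if not t)
        return treated_sum / n_treated - control_sum / n_control

    observed = diff_in_means(labels)
    rng = random.Random(seed)
    shuffled = labels[:]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reshuffling keeps the number of treated units fixed
        if abs(diff_in_means(shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (extreme + 1) / (n_perm + 1)

# Simulated data (hypothetical): outcome is flat near the cutoff plus a jump of 1.0
rng = random.Random(42)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y = [1.0 * (xi >= 0) + rng.gauss(0, 1) for xi in x]
effect, p_value = permutation_test(x, y, cutoff=0.0, w=0.1)
print(f"difference in means: {effect:.3f}, permutation p-value: {p_value:.4f}")
```

The difference in means ignores any slope of the outcome in the running variable, so the window must be narrow enough for the as-if random assumption to be plausible.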
The RD Estimand
Under the continuity assumption, the RD identifies a local average treatment effect at the cutoff:
$$\tau_{RD} = E[Y_i(1) - Y_i(0) \mid X_i = c]$$
This is the treatment effect specifically for units at the threshold—those on the margin of treatment.
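To see why continuity delivers this estimand, it helps to write out the argument step by step. The derivation below uses only Assumption 14.1 and the sharp assignment rule; it is a sketch of the standard argument rather than a full proof.

```latex
\begin{aligned}
\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]
  &= \lim_{x \downarrow c} E[Y_i(1) \mid X_i = x] - \lim_{x \uparrow c} E[Y_i(0) \mid X_i = x] \\
  &= E[Y_i(1) \mid X_i = c] - E[Y_i(0) \mid X_i = c] \\
  &= E[Y_i(1) - Y_i(0) \mid X_i = c] = \tau_{RD}
\end{aligned}
```

The first equality uses the assignment rule (we observe Y(1) above the cutoff and Y(0) below it); the second uses continuity of both conditional expectation functions at c.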
Comparison to other designs:
RCT: ATE for experimental sample
IV: LATE for compliers
RD: Effect at the cutoff
The RD estimand is highly local. This is both a strength (internal validity is very credible) and a limitation (effects may not generalize away from the cutoff).
Graphical Analysis
Before any formal estimation, RD demands graphical analysis. A well-constructed RD plot shows:
Binned means: Divide X into bins and plot average Y in each bin
The cutoff: Clear vertical line at c
Fitted curves: Local polynomial fits on each side of the cutoff
The discontinuity: Visual jump (or lack thereof) at the cutoff
A clear visual discontinuity is the first line of evidence. If the jump is not visible in the raw data, sophisticated estimation won't rescue the analysis.
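The binned means behind an RD plot are simple to compute by hand. The sketch below is one way to do it, assuming equal-width bins aligned so that no bin straddles the cutoff; the function name `binned_means`, the bin width, and the simulated data are illustrative assumptions. Plotting the resulting points is left to whatever graphics library you use.

```python
import random
from collections import defaultdict

def binned_means(x, y, cutoff, bin_width):
    """Average y within equal-width bins of x; bin 0 starts exactly at the cutoff."""
    totals = defaultdict(lambda: [0.0, 0])
    for xi, yi in zip(x, y):
        k = int((xi - cutoff) // bin_width)  # floor division keeps bins on one side
        totals[k][0] += yi
        totals[k][1] += 1
    # One (bin midpoint, mean of y, count) triple per bin, ordered left to right
    return [(cutoff + (k + 0.5) * bin_width, s / n, n)
            for k, (s, n) in sorted(totals.items())]

# Simulated data (hypothetical): linear trend with a jump of 0.4 at the cutoff
rng = random.Random(1)
x = [rng.uniform(-1, 1) for _ in range(5000)]
y = [0.5 + 0.8 * xi + 0.4 * (xi >= 0) + rng.gauss(0, 0.3) for xi in x]
for midpoint, mean_y, count in binned_means(x, y, cutoff=0.0, bin_width=0.1):
    side = "right" if midpoint >= 0 else "left"
    print(f"{midpoint:+.2f}  {mean_y:.3f}  n={count}  ({side} of cutoff)")
```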

Example: Electoral RD (Lee 2008)
David Lee's (2008) study of U.S. House elections is the paradigmatic RD application.
Setting: In U.S. House elections, the Democrat wins if their vote share exceeds 50%. This creates a sharp discontinuity.
Question: What is the effect of winning a House election on future electoral success? (Incumbency advantage)
Running variable: Democratic vote share margin (vote share minus 50%)
Xi>0: Democrat won
Xi<0: Democrat lost
Key insight: In very close elections (say, decided by less than 1 percentage point), the winner is essentially random. Which candidate had slightly more votes on election night is not systematically related to candidate quality, district preferences, or other confounders.
Findings: Lee finds a large discontinuity: Democratic candidates who barely win are about 40 percentage points more likely to win the next election than Democrats who barely lose. Because barely-winners and barely-losers are comparable in expectation, this jump reflects the consequences of holding office (name recognition, constituency service, campaign finance) rather than the selection of stronger candidates into safe seats.
14.2 Estimation
Local Polynomial Regression
The standard approach estimates the conditional expectation function separately on each side of the cutoff using local polynomial regression.
Linear RD estimator: Fit linear functions on each side:
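$$Y_i = \alpha_l + \beta_l (X_i - c) + \varepsilon_i \quad \text{for } X_i < c$$
$$Y_i = \alpha_r + \beta_r (X_i - c) + \varepsilon_i \quad \text{for } X_i \ge c$$
The RD estimate is $\hat{\tau}_{RD} = \hat{\alpha}_r - \hat{\alpha}_l$.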
Why local? We only use observations within a bandwidth h of the cutoff: Xi∈[c−h,c+h].
Why polynomial? To approximate potentially curved conditional expectation functions near the cutoff.
Regression formulation: Equivalently, we can estimate:
$$Y_i = \alpha + \tau D_i + \beta_1 (X_i - c) + \beta_2 D_i \cdot (X_i - c) + \varepsilon_i$$
for observations with $|X_i - c| \le h$, where $\hat{\tau}$ is the RD estimate.
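The local linear estimator can be written out directly. The sketch below fits a separate line on each side of the cutoff within the bandwidth (a uniform kernel) and takes the difference in intercepts, which is equivalent to the interacted regression above. The helper names `ols_line` and `sharp_rd`, the bandwidth `h = 0.25`, and the simulated data are assumptions for illustration; it produces a point estimate only, with no standard errors.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rd(x, y, cutoff, h):
    """Local linear RD with a uniform kernel: right intercept minus left intercept."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left

# Simulated data (hypothetical): true jump of 2.0 at the cutoff
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(4000)]
y = [1 + 0.5 * xi + 2.0 * (xi >= 0) + rng.gauss(0, 0.5) for xi in x]
tau_hat = sharp_rd(x, y, cutoff=0.0, h=0.25)
print(f"estimated jump at the cutoff: {tau_hat:.3f}")
```

Centering the running variable at the cutoff is what makes each intercept an estimate of the conditional mean at c.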
Polynomial Order Choice
Linear (p = 1): Usually preferred. Linear approximation works well near the cutoff; higher-order terms add variance without improving bias much.
Quadratic or higher (p ≥ 2): Sometimes used for robustness checks. Higher-order polynomials can fit curved relationships better but are more sensitive to observations far from the cutoff.
Pitfall: Global Polynomial Fitting. Fitting high-order polynomials (p = 3, 4, 5...) to all the data is dangerous. Such fits can chase noise far from the cutoff and produce wildly inaccurate estimates at c. Gelman and Imbens (2019) recommend against this practice and advise local linear or local quadratic fits instead.
How to avoid: Stick to local linear or local quadratic estimation with appropriate bandwidth.
Bandwidth Selection
The bandwidth h governs the bias-variance tradeoff:
Small h: Lower bias (using only observations very similar to cutoff), but higher variance (fewer observations)
Large h: Lower variance (more observations), but higher bias (observations far from cutoff less relevant)
MSE-optimal bandwidth: Imbens and Kalyanaraman (2012) and Calonico, Cattaneo, and Titiunik (2014) derive bandwidth selectors that minimize mean squared error (MSE):
$$h_{MSE} = C \cdot n^{-1/5}$$
where C depends on curvature of the conditional expectation and density of X.
Robust inference: Conventional confidence intervals computed at the MSE-optimal bandwidth tend to undercover, because that bandwidth is wide enough that the smoothing bias of the local polynomial fit is not negligible relative to the standard error. Calonico, Cattaneo, and Titiunik (2014) provide bias-corrected estimates and robust confidence intervals that account for the estimated bias.

Implementation: The rdrobust package (R and Stata) automates:
MSE-optimal bandwidth selection
Bias-corrected estimation
Robust confidence intervals
Covariates in RD
Including pre-treatment covariates can improve precision but raises issues:
When covariates help:
If covariates predict outcomes strongly, including them reduces residual variance
In fuzzy RD, covariates can strengthen the first stage
How to include covariates:
Add covariates linearly to the RD specification
Or use covariates to residualize Y first, then run RD on residuals
Important: Only include pre-determined covariates—variables that could not be affected by treatment.
Caution: If RD is valid, covariates should be balanced at the cutoff. If including covariates changes estimates substantially, this suggests the covariates are discontinuous—which indicates a problem with the design, not something to "control for."
14.3 Fuzzy Regression Discontinuity
Setup
In many applications, the cutoff does not perfectly determine treatment. Instead, crossing the threshold increases the probability of treatment:
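$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) - \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x) > 0$$
but the jump is smaller than one: treatment probability does not move from 0 to 1 at the cutoff.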
Examples of fuzzy RD:
Scholarship eligibility thresholds where not all eligible students accept
Age thresholds for drinking where enforcement is imperfect
Income thresholds for program eligibility with measurement error in income
The IV Interpretation
Fuzzy RD has an instrumental variables interpretation. The threshold acts as an instrument:
Relevance: Crossing the threshold affects treatment probability (first stage)
Exclusion: The threshold affects outcomes only through treatment (no direct effect of being just above vs. just below the cutoff except via treatment)
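Under these conditions, together with monotonicity (crossing the threshold never makes a unit less likely to be treated), the fuzzy RD estimand is a local average treatment effect (LATE) for compliers at the cutoff: units who are treated when just above but not when just below.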
Box: The "Doubly Local" Nature of Fuzzy RD
Fuzzy RD estimates are local in two ways:
1. Local to the cutoff (like all RD) We identify effects only for units near the threshold. With test score cutoffs, we learn about students scoring around the cutoff—not high achievers or struggling students far from the threshold.
2. Local to compliers (like all IV) Among units at the cutoff, we identify effects only for compliers—those whose treatment status is changed by crossing the threshold. Always-takers (treated regardless of side) and never-takers (untreated regardless) contribute nothing to identification.
Who are the compliers?
Above cutoff: treated
Below cutoff: would not have been treated
In the drinking age example: compliers are individuals who drink legally at 21 but would not have drunk (or drunk less) at 20. Those who drink heavily regardless of legal status (always-takers) or abstain regardless (never-takers) don't inform the estimate.
Interpretation caution: The fuzzy RD effect applies to a very specific population—compliers at the margin. This may be a small and unusual group. A college admissions fuzzy RD identifies effects for marginal admits who just qualified—not typical students, and not those whose admission was determined by other factors (legacy, athletic recruitment).
Quantifying the complier population: The first-stage jump tells you the complier share. If treatment probability jumps from 40% to 70% at the cutoff, then (assuming no defiers) compliers are 30% of the population at the threshold. The smaller this share, the more specialized your estimand.
Estimation
Wald estimator: The fuzzy RD estimate is the ratio of the outcome discontinuity to the treatment discontinuity:
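$$\hat{\tau}_{FRD} = \frac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}{\lim_{x \downarrow c} E[D \mid X = x] - \lim_{x \uparrow c} E[D \mid X = x]}$$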
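When the same bandwidth and kernel are used for the numerator and the denominator, this ratio is numerically identical to a local 2SLS estimate that uses the threshold indicator as an instrument for treatment.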
Implementation: Two approaches:
Separate RD estimates: Estimate sharp RD for Y (reduced form) and sharp RD for D (first stage), then divide
2SLS: Run 2SLS with 1[X≥c] as instrument for D, restricting to observations near the cutoff
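The "separate RD estimates" approach can be sketched as follows. The code applies the same local linear estimator to the treatment (first stage) and the outcome (reduced form) and divides. The helper names, the bandwidth, and the simulated data are assumptions for illustration; the treatment probabilities of 40% and 70% echo the complier example earlier in this section, and the true effect of 2.0 is invented for the simulation.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rd(x, y, cutoff, h):
    """Local linear RD with a uniform kernel: right intercept minus left intercept."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left

def fuzzy_rd(x, d, y, cutoff, h):
    """Wald ratio: jump in the outcome divided by jump in treatment take-up."""
    first_stage = sharp_rd(x, d, cutoff, h)
    reduced_form = sharp_rd(x, y, cutoff, h)
    return reduced_form / first_stage, first_stage, reduced_form

# Simulated data (hypothetical): take-up rises from 40% to 70% at the cutoff
rng = random.Random(7)
x = [rng.uniform(-1, 1) for _ in range(20000)]
d = [1 if rng.random() < (0.7 if xi >= 0 else 0.4) else 0 for xi in x]
y = [1 + 0.5 * xi + 2.0 * di + rng.gauss(0, 0.5) for xi, di in zip(x, d)]
tau_hat, first_stage, reduced_form = fuzzy_rd(x, d, y, cutoff=0.0, h=0.5)
print(f"first stage: {first_stage:.3f}  reduced form: {reduced_form:.3f}  "
      f"fuzzy RD: {tau_hat:.3f}")
```

A small first stage makes the ratio unstable, which is the fuzzy RD version of the weak-instrument problem.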
Example: Drinking Age and Mortality (Carpenter and Dobkin)
Carpenter and Dobkin (2009) study whether legal access to alcohol increases mortality.
Setting: In the U.S., individuals can legally purchase alcohol at age 21. This creates a fuzzy discontinuity—some under-21s drink illegally, and reaching 21 doesn't make everyone drink.
Running variable: Age (centered at 21)
Treatment: Alcohol consumption
Outcome: Mortality from various causes
Findings:
First stage: Discrete jump in drinking at age 21 (but not from 0 to 100%)
Reduced form: Large jump in mortality at age 21
Fuzzy RD estimate: Legal access to alcohol increases mortality substantially, especially from motor vehicle accidents
This study illustrates fuzzy RD applied to a sharp-looking policy (legal drinking age) that is fuzzy in practice (imperfect compliance with age restrictions).
14.4 Validity and Diagnostics
Manipulation Testing
The key threat to RD validity is manipulation of the running variable. If units can precisely control their value of X to be just above or just below the cutoff, the as-if random assignment breaks down.
McCrary (2008) density test: If manipulation is occurring, we expect a discontinuity in the density of X at the cutoff—more observations just above (or below) than smooth extrapolation would predict.
Implementation: Estimate the density of X on each side of the cutoff. Test for discontinuity using local polynomial density estimation. The rddensity package implements modern tests.
Interpretation:
Significant discontinuity in density → serious concern about manipulation
No discontinuity → manipulation not detected (but not ruled out)
Example: In the U.S. House election data studied by Lee (2008), McCrary (2008) finds no discontinuity in the density of the vote share margin at the 50% threshold, supporting the claim that candidates cannot precisely control election outcomes.
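A rough first look at bunching can be had without any density estimation. The sketch below is a deliberately simpler stand-in for the McCrary and `rddensity` tests, not an implementation of either: it counts observations in a narrow window on each side of the cutoff and runs an exact two-sided binomial test of equal counts. It assumes the density is roughly flat inside the window; the function name and window width are illustrative choices.

```python
import random
from math import comb

def binomial_density_check(x, cutoff, w):
    """Exact two-sided binomial test that counts just below and just above are equal."""
    n_below = sum(1 for xi in x if cutoff - w <= xi < cutoff)
    n_above = sum(1 for xi in x if cutoff <= xi <= cutoff + w)
    n = n_below + n_above
    k = min(n_below, n_above)
    # P(count <= k) under Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return n_below, n_above, min(1.0, 2 * tail)

# Simulated data (hypothetical): running variable with no manipulation
rng = random.Random(3)
x = [rng.uniform(-1, 1) for _ in range(3000)]
below, above, p_value = binomial_density_check(x, cutoff=0.0, w=0.05)
print(f"below: {below}  above: {above}  p-value: {p_value:.3f}")
```

Because it ignores the slope of the density, this check can reject in a setting with no manipulation if the window is too wide; the local polynomial density tests address that.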

Covariate Balance at the Cutoff
If RD is valid, pre-treatment covariates should be continuous at the cutoff. Testing for discontinuities in covariates serves as a placebo test.
Implementation: Run the RD specification with each covariate as the outcome. Test whether any covariate shows a discontinuity.
What to do if covariates are discontinuous?
If covariates jump at the cutoff, RD assumptions are violated
Discontinuous covariates suggest sorting or manipulation
Including such covariates as controls is inappropriate—they are "bad controls"
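The covariate balance check amounts to reusing the RD estimator with each covariate in the outcome slot. The sketch below shows the mechanics on simulated data; the covariate names (`age`, `prior_score`), the bandwidth, and the built-in imbalance in `prior_score` are all hypothetical. It reports point estimates only, so in practice each jump should be paired with a standard error or robust confidence interval before being read as evidence.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rd(x, y, cutoff, h):
    """Local linear RD with a uniform kernel: right intercept minus left intercept."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left

def covariate_jumps(x, covariates, cutoff, h):
    """Apply the RD estimator to each pre-treatment covariate in turn."""
    return {name: sharp_rd(x, values, cutoff, h) for name, values in covariates.items()}

# Simulated data (hypothetical): 'age' is smooth, 'prior_score' jumps by 0.5
rng = random.Random(11)
x = [rng.uniform(-1, 1) for _ in range(4000)]
covariates = {
    "age": [30 + 2 * xi + rng.gauss(0, 1) for xi in x],
    "prior_score": [0.3 * xi + 0.5 * (xi >= 0) + rng.gauss(0, 0.2) for xi in x],
}
jumps = covariate_jumps(x, covariates, cutoff=0.0, h=0.25)
for name, jump in jumps.items():
    print(f"{name}: estimated jump at cutoff = {jump:+.3f}")
```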
Sensitivity to Bandwidth
RD estimates should not be hypersensitive to bandwidth choice.
Robustness checks:
Report estimates for half and double the MSE-optimal bandwidth
Plot estimates as a function of bandwidth
Estimates should be reasonably stable across bandwidths
What if estimates vary wildly with bandwidth?
May indicate violation of continuity assumption
May indicate insufficient sample size near cutoff
Report the sensitivity honestly
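The half-and-double check is easy to automate. In the sketch below, `h0` is a hand-picked baseline that stands in for the MSE-optimal bandwidth, which this code does not compute; the helper names and simulated data are likewise assumptions for illustration.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rd(x, y, cutoff, h):
    """Local linear RD with a uniform kernel: right intercept minus left intercept."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left

def bandwidth_sensitivity(x, y, cutoff, h0, multipliers=(0.5, 1.0, 2.0)):
    """Re-estimate the RD at several multiples of a baseline bandwidth h0."""
    return {m: sharp_rd(x, y, cutoff, m * h0) for m in multipliers}

# Simulated data (hypothetical): curved outcome with a true jump of 2.0
rng = random.Random(5)
x = [rng.uniform(-1, 1) for _ in range(4000)]
y = [1 + 0.5 * xi + xi ** 2 + 2.0 * (xi >= 0) + rng.gauss(0, 0.5) for xi in x]
results = bandwidth_sensitivity(x, y, cutoff=0.0, h0=0.25)
for multiplier, estimate in results.items():
    print(f"h = {multiplier * 0.25:.3f}: estimate = {estimate:.3f}")
```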
Placebo Cutoffs
As an additional check, estimate RD at "fake" cutoffs where no treatment discontinuity exists.
Implementation: Run the RD analysis at cutoffs above or below the true threshold. For example, if the real cutoff is at X=50, test at X=45 and X=55.
Interpretation: Significant effects at placebo cutoffs suggest the outcome varies discontinuously with X in general—not specifically at the policy threshold. This undermines the RD interpretation.
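The placebo exercise can be sketched using the cutoffs from the example above (true cutoff at 50, placebos at 45 and 55). One design choice in this sketch, which is an assumption rather than something the text prescribes, is to use only observations on one side of the true cutoff for each placebo, so the real treatment jump cannot contaminate the placebo estimate. Helper names, bandwidth, and data are illustrative.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rd(x, y, cutoff, h):
    """Local linear RD with a uniform kernel: right intercept minus left intercept."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left

def placebo_rd(x, y, true_cutoff, fake_cutoff, h):
    """RD at a fake cutoff, using only data on the fake cutoff's side of the true one."""
    if fake_cutoff < true_cutoff:
        kept = [(xi, yi) for xi, yi in zip(x, y) if xi < true_cutoff]
    else:
        kept = [(xi, yi) for xi, yi in zip(x, y) if xi >= true_cutoff]
    xs, ys = zip(*kept)
    return sharp_rd(xs, ys, fake_cutoff, h)

# Simulated data (hypothetical): true jump of 5.0 at X = 50 and nowhere else
rng = random.Random(9)
x = [rng.uniform(0, 100) for _ in range(5000)]
y = [10 + 0.2 * xi + 5.0 * (xi >= 50) + rng.gauss(0, 1) for xi in x]
print(f"true cutoff 50: {sharp_rd(x, y, 50, 4):.3f}")
for fake in (45, 55):
    print(f"placebo cutoff {fake}: {placebo_rd(x, y, 50, fake, 4):.3f}")
```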
Donut Hole RD
If manipulation is suspected specifically at the cutoff, one diagnostic is to exclude observations very close to the cutoff and re-estimate.
Implementation: Drop observations within a small window of the cutoff (the "donut hole") and estimate RD using remaining observations.
Interpretation:
If estimates are similar with and without the donut hole, manipulation concerns are mitigated
If estimates change substantially, manipulation may be concentrated at the cutoff
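A donut hole estimate only requires dropping a band around the cutoff before fitting. The sketch below simulates a hypothetical form of sorting (units just above the cutoff have inflated outcomes) to show how the two estimates can diverge; the function names, the radius of 0.02, and the data are assumptions made for illustration.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rd(x, y, cutoff, h):
    """Local linear RD with a uniform kernel: right intercept minus left intercept."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left

def donut_rd(x, y, cutoff, h, radius):
    """Drop observations closer than `radius` to the cutoff, then estimate the RD."""
    kept = [(xi, yi) for xi, yi in zip(x, y) if abs(xi - cutoff) >= radius]
    xs, ys = zip(*kept)
    return sharp_rd(xs, ys, cutoff, h)

# Simulated data (hypothetical): true jump 2.0, plus inflated outcomes for
# units that sorted to just above the cutoff (0 <= x < 0.02)
rng = random.Random(13)
x = [rng.uniform(-1, 1) for _ in range(8000)]
y = [1 + xi + 2.0 * (xi >= 0) + 3.0 * (0 <= xi < 0.02) + rng.gauss(0, 0.5) for xi in x]
print(f"full sample: {sharp_rd(x, y, 0.0, 0.25):.3f}")
print(f"donut (radius 0.02): {donut_rd(x, y, 0.0, 0.25, 0.02):.3f}")
```

The donut estimate extrapolates the fitted lines across the excluded band, so it leans more heavily on the linear functional form than the full-sample estimate does.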
14.5 RD Design Variants
Geographic RD (Boundary Discontinuities)
Policies often vary across geographic boundaries—state lines, district borders, regulatory zones. This creates RD opportunities using geographic location as the running variable.
Setup:
Running variable: Distance to boundary (positive on one side, negative on the other)
Cutoff: The boundary itself (distance = 0)
Examples:
School quality effects using school district boundaries
Minimum wage effects using state borders (Dube, Lester, Reich 2010)
Pollution regulation effects using county lines
Challenges:
Two-dimensional geography requires careful definition of distance
Boundaries may not be randomly placed
Spillovers across boundaries
Multi-Cutoff RD
Sometimes the same policy uses different cutoffs in different contexts.
Example: Test score thresholds vary by school or district. Each school has its own cutoff for honors course placement.
Pooling: Cattaneo et al. (2016) develop methods for combining RD estimates across multiple cutoffs, improving precision while allowing for heterogeneity.
Regression Kink Design (RKD)
When treatment is continuous and its relationship with the running variable has a kink (change in slope) rather than a jump at the cutoff, regression kink design (RKD) can identify effects.
Example: Tax benefits that phase out at a rate that changes at certain income thresholds. There's no discontinuity in benefits, but the slope changes.
Estimand: RKD identifies the effect of a marginal increase in treatment at the kink point.
Requirements:
No jump in treatment at kink (otherwise it's RD)
Change in slope of treatment (first stage)
Outcome shows corresponding change in slope (reduced form)
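The RKD estimand can be illustrated as a ratio of slope changes: the change in the slope of the outcome at the kink divided by the change in the slope of the treatment. The sketch below uses local linear fits on each side; the function names, the benefit schedule, and the true marginal effect of 1.5 are hypothetical values chosen for the simulation.

```python
import random

def ols_line(xs, ys):
    """OLS fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rkd(x, b, y, kink, h):
    """Ratio of the slope change in the outcome to the slope change in the treatment."""
    def slope_change(values):
        left = [(xi - kink, vi) for xi, vi in zip(x, values) if kink - h <= xi < kink]
        right = [(xi - kink, vi) for xi, vi in zip(x, values) if kink <= xi <= kink + h]
        _, s_left = ols_line(*zip(*left))
        _, s_right = ols_line(*zip(*right))
        return s_right - s_left
    return slope_change(y) / slope_change(b)

# Simulated data (hypothetical): the benefit is continuous at the kink, but its
# slope changes from +0.2 to -0.3; each unit of benefit raises the outcome by 1.5
rng = random.Random(21)
x = [rng.uniform(-1, 1) for _ in range(20000)]
b = [10 + 0.2 * xi if xi < 0 else 10 - 0.3 * xi for xi in x]
y = [3 + 0.1 * xi + 1.5 * bi + rng.gauss(0, 0.05) for xi, bi in zip(x, b)]
effect = rkd(x, b, y, kink=0.0, h=0.5)
print(f"estimated marginal effect of the benefit: {effect:.3f}")
```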
RD with Discrete Running Variables
What if the running variable takes only discrete values (e.g., age in years, test scores in whole numbers, class size thresholds)?
The problem: Standard RD relies on continuity—comparing the limits from just above and just below the cutoff. But with discrete values, there is no "just below." The smallest bandwidth includes entire mass points, potentially far from the cutoff in true underlying ability or characteristics.
Protocol for Discrete Running Variables
Step 1: Assess severity
How many mass points are within a reasonable bandwidth?
What fraction of the sample is at each mass point?
Is there heaping (bunching at round numbers)?
| Mass points within bandwidth | Recommendation |
| --- | --- |
| Many (10+) | Standard RD may work; treat as quasi-continuous |
| Few (3-10) | Use Lee-Card correction or local randomization |
| Very few (1-2) | Local randomization only; identification is fragile |
Step 2: Choose appropriate method
Option A: Lee-Card specification error correction
Cluster standard errors at the mass point level
This accounts for within-mass-point variation being uninformative
Widens confidence intervals appropriately
Option B: Local randomization framework
Define a window containing a small number of mass points
Assume treatment is "as-if random" within this window
Use permutation inference rather than asymptotics
Test covariate balance within the window
Option C: Fuzzy RD interpretation
Treat the running variable as a noisy measure of underlying position
Use IV with the discrete measure as instrument
Step 3: Report honestly
Show the distribution of the running variable
Report how many mass points drive the estimate
Conduct sensitivity to window width
Acknowledge reduced precision
Example: Maimonides' Rule (Angrist & Lavy 1999)
Class size is determined by enrollment thresholds at multiples of 40. Enrollment is discrete (whole numbers), creating sharp discontinuities at 41, 81, 121, etc. The authors use multiple cutoffs and account for the discrete nature by focusing on the first-stage relationship rather than pure continuity-based RD.
Lee and Card (2008) formalize the specification error problem and provide cluster-robust inference for discrete running variables.
Practical Guidance
When to Use RD
| Situation | Use RD? | Notes |
| --- | --- | --- |
| Sharp threshold determines treatment | Yes (sharp RD) | Classic case |
| Threshold affects treatment probability | Yes (fuzzy RD) | IV interpretation |
| Threshold is known and fixed | Yes | Cutoff must be known ex ante |
| Running variable can be precisely manipulated | No | Manipulation violates assumptions |
| Threshold was chosen based on outcomes | No | Endogenous cutoff |
| Need effect for units far from cutoff | No | RD is local; extrapolation unjustified |
Common Pitfalls
Pitfall 1: High-order global polynomials. Fitting quintic or higher-order polynomials to all observations can produce wildly incorrect estimates at the cutoff, as the polynomial chases noise in the tails.
How to avoid: Use local linear or local quadratic estimation with data-driven bandwidth selection.
Pitfall 2: Ignoring manipulation. It is a mistake to assume manipulation is absent simply because you do not observe it directly; manipulation can be hard to detect.
How to avoid: Always run density tests. Examine institutional details: could units plausibly manipulate X? Consider donut hole analysis.
Pitfall 3: Interpreting RD as a global effect. The RD estimate is specific to units at the cutoff. It may not apply to units far above or below.
How to avoid: Be clear about the local nature of the estimand. Discuss whether effects might differ away from the cutoff.
Pitfall 4: Endogenous bandwidth selection. Choosing the bandwidth that produces the desired result is specification searching.
How to avoid: Use data-driven bandwidth selection (MSE-optimal). Report results for a range of bandwidths. Pre-specify analysis choices when possible.
Pitfall 5: Overinterpreting non-significant density tests. Failing to reject the null of no manipulation does not prove there is no manipulation; it may reflect low power.
How to avoid: Report the density test but recognize its limitations. Examine institutional features that make manipulation more or less plausible.
Implementation Checklist
Qualitative Bridge
How Qualitative Methods Complement RD
RD provides highly credible local causal effects but leaves questions about mechanisms and external validity unanswered. Qualitative research addresses these gaps.
When to Combine
Understanding the discontinuity: Qualitative investigation can reveal what actually happens at the threshold. For a scholarship cutoff, do students just above and just below the threshold actually have similar characteristics? Do they experience the threshold as consequential?
Mechanism exploration: RD tells us winning an election increases future electoral success. Qualitative research—interviews with legislators, analysis of campaign strategies—can reveal why: Is it name recognition? Fundraising advantages? Ability to deliver constituency services?
External validity: The RD effect applies to marginal units. Qualitative understanding of who these units are and how they differ from non-marginal units helps assess generalizability.
Example: Electoral RD
Lee's (2008) finding of large incumbency advantages has been elaborated through qualitative work:
Campaign ethnographies reveal how incumbents use their office for electoral advantage
Interviews with campaign staff identify specific mechanisms (franking privilege, press access, pork-barrel spending)
Case studies of close elections show the experiences of barely-winners and barely-losers differ in observable ways that correspond to the quantitative findings
This qualitative evidence strengthens confidence that the RD captures a real incumbency advantage (not a statistical artifact) and illuminates how the effect operates.
Integration Note
Connections to Other Methods
| Method | Connection to RD | Chapter |
| --- | --- | --- |
| Instrumental Variables | Fuzzy RD is an IV design; threshold instruments for treatment | Ch. 12 |
| Difference-in-Differences | RD in time: sharp policy change at specific date | Ch. 13 |
| Selection on Observables | Both use continuity; RD requires continuity at cutoff, not globally | Ch. 11 |
| Experiments | RD = quasi-experiment with "natural" randomization at cutoff | Ch. 10 |
Triangulation Strategies
RD estimates gain credibility when combined with:
Different outcomes: Do multiple related outcomes show discontinuities consistent with the mechanism?
Different running variables: If the same treatment has multiple thresholds, do estimates agree?
Alternative specifications: Do local linear, local quadratic, and different bandwidths agree?
Qualitative evidence: Do interviews and observation confirm the mechanism?
DiD as complement: If the policy changed over time, do DiD and RD estimates align?
Running Example: Electoral RD
We develop Lee's (2008) electoral RD in more detail as a comprehensive example.
The Setting
U.S. House elections determine outcomes by plurality vote. In races between two major-party candidates, the Democrat wins if and only if their vote share exceeds 50%.
Running variable: Democratic vote share margin = (Dem votes) / (Dem + Rep votes) - 0.5
X>0: Democrat won
X<0: Republican won
Treatment: Democratic victory (incumbency)
Outcome: Democratic vote share in the next election
Why RD Works Here
Continuity: In close elections, which candidate wins is essentially a matter of chance—random sampling variability in who shows up to vote, weather effects, late-breaking news. Candidates cannot precisely manipulate vote shares to be just above 50%.
Manipulation check: The density of vote margins shows no discontinuity at the threshold (McCrary 2008 tests this in House election data); there is no bunching just above or below 50%.
Balance check: Pre-determined characteristics (district demographics, lagged outcomes) show no discontinuity at 50%.
Results
Lee finds:
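| Democratic vote share in election t | Democratic vote share in the next election |
| --- | --- |
| Just below 50% | ~45% |
| Just above 50% | ~58% |
| Discontinuity | ~13 percentage points |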
Barely winning increases the probability of winning the next election by about 40 percentage points (from ~30% to ~70%).
Interpretation and Mechanisms
The RD identifies a large incumbency advantage, but what drives it?
Possible mechanisms:
Deterrence: Incumbency deters high-quality challengers
Resources: Incumbents have advantages in fundraising, media access
Experience: Incumbents become better campaigners
Constituency service: Incumbents can provide benefits to voters
Selection: Winners are revealed as higher-quality candidates
Follow-up work (Caughey and Sekhon 2011, Eggers et al. 2015) has examined these mechanisms and investigated whether the Lee design is truly valid.
Challenges to the Lee Design
Caughey and Sekhon (2011) find covariate imbalance in very close House elections, suggesting possible manipulation or sorting.
Eggers et al. (2015) show the imbalance finding does not replicate across many electoral RD settings; House elections may be unusual.
Lesson: Even paradigmatic RD designs warrant careful validation. The density test and covariate balance checks are essential, not optional.
Summary
Key takeaways:
RD exploits threshold-based assignment: When treatment is determined by crossing a cutoff in a continuous running variable, we can compare units just above and below to identify causal effects.
The key assumption is continuity: Potential outcomes must be continuous in the running variable at the cutoff. Any jump in outcomes at the threshold is attributed to treatment.
Manipulation is the key threat: If units can precisely control their running variable to be just above or below the cutoff, continuity fails. Density tests and covariate balance checks are essential diagnostics.
Estimation uses local polynomial methods: Fit separate regressions on each side of the cutoff using observations within a bandwidth. MSE-optimal bandwidths balance bias and variance.
RD estimates are local: The effect applies specifically to units at the cutoff. Extrapolation to units far from the threshold is not justified by the design.
Returning to the opening question: When treatment is assigned by crossing a threshold, we can recover causal effects by comparing units just above and below—but only if units cannot manipulate their position relative to the cutoff. The credibility of RD depends on this local randomization, which must be investigated through density tests, covariate balance checks, and institutional analysis of whether manipulation is possible.
Further Reading
Essential
Cattaneo, Idrobo, and Titiunik (2020), A Practical Introduction to Regression Discontinuity Designs - Comprehensive modern textbook
Lee and Lemieux (2010), "Regression Discontinuity Designs in Economics" - Foundational survey
For Deeper Understanding
Imbens and Lemieux (2008), "Regression Discontinuity Designs: A Guide to Practice" - Classic methodological guide
Calonico, Cattaneo, and Titiunik (2014), "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs" - Bias-correction and robust inference
Gelman and Imbens (2019), "Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs" - Warns against global polynomial fitting
Advanced/Specialized
McCrary (2008), "Manipulation of the Running Variable in the Regression Discontinuity Design" - The density test
Cattaneo, Jansson, and Ma (2020), "Simple Local Polynomial Density Estimators" - Modern density testing
Lee and Card (2008), "Regression Discontinuity Inference with Specification Error" - Discrete running variables
Applications
Lee (2008), "Randomized Experiments from Non-random Selection in U.S. House Elections" - The paradigmatic electoral RD
Carpenter and Dobkin (2009), "The Effect of Alcohol Consumption on Mortality" - Drinking age fuzzy RD
Dell (2010), "The Persistent Effects of Peru's Mining Mita" - Geographic RD with historical treatment
Exercises
Conceptual
Explain why RD estimates are "local" and what this means for external validity. Under what circumstances might the local effect generalize to units further from the cutoff?
Why is manipulation of the running variable a threat to RD validity? Give an example of a setting where manipulation is plausible and one where it is implausible.
In fuzzy RD, what population is the treatment effect identified for? How does this relate to the IV concept of "compliers"?
Applied
Using electoral data, replicate the Lee (2008) analysis of incumbency advantage. Produce RD plots, estimate treatment effects with rdrobust, and conduct the McCrary density test.
Find a policy that uses a test score or age threshold. Implement an RD analysis including all validity checks discussed in this chapter.
Discussion
Some researchers argue that RD provides the most credible quasi-experimental evidence short of actual experiments. Others argue that its local nature limits its usefulness for policy. Which view do you find more compelling, and why?
Appendix 14A: Technical Details of Local Polynomial Estimation
Kernel Weighting
Local polynomial regression can incorporate kernel weights that give more weight to observations closer to the cutoff:
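$$(\hat{\alpha}, \hat{\tau}_{RD}, \hat{\beta}, \hat{\gamma}) = \arg\min_{\alpha, \tau, \beta, \gamma} \sum_{i: |X_i - c| \le h} K\left(\frac{X_i - c}{h}\right) \left(Y_i - \alpha - \tau D_i - \beta (X_i - c) - \gamma D_i (X_i - c)\right)^2$$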
Common kernels:
Triangular: $K(u) = (1 - |u|) \cdot \mathbf{1}[|u| \le 1]$
Uniform (no weighting): $K(u) = 0.5 \cdot \mathbf{1}[|u| \le 1]$
Epanechnikov: $K(u) = 0.75 (1 - u^2) \cdot \mathbf{1}[|u| \le 1]$
The triangular kernel is MSE-optimal for boundary estimation (the situation in RD).
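The kernel-weighted objective can be minimized in closed form. Because the specification is fully interacted, fitting a weighted line on each side of the cutoff and differencing the intercepts gives the same estimate as the joint minimization. The sketch below does this with the triangular kernel; the function names, bandwidth, and simulated data are assumptions for illustration.

```python
import random

def triangular(u):
    """Triangular kernel: weight falls linearly to zero at |u| = 1."""
    return max(0.0, 1.0 - abs(u))

def weighted_intercept(xs, ys, ws):
    """Intercept from a weighted least squares fit of ys on xs."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my - (sxy / sxx) * mx

def kernel_rd(x, y, cutoff, h, kernel=triangular):
    """Kernel-weighted local linear RD estimate at the cutoff."""
    sides = {"left": ([], [], []), "right": ([], [], [])}
    for xi, yi in zip(x, y):
        w = kernel((xi - cutoff) / h)
        if w <= 0:
            continue  # outside the bandwidth (or exactly on its edge)
        xs, ys, ws = sides["right" if xi >= cutoff else "left"]
        xs.append(xi - cutoff)
        ys.append(yi)
        ws.append(w)
    return weighted_intercept(*sides["right"]) - weighted_intercept(*sides["left"])

# Simulated data (hypothetical): true jump of 2.0 at the cutoff
rng = random.Random(17)
x = [rng.uniform(-1, 1) for _ in range(4000)]
y = [1 + 0.5 * xi + 2.0 * (xi >= 0) + rng.gauss(0, 0.5) for xi in x]
tau_hat = kernel_rd(x, y, cutoff=0.0, h=0.3)
print(f"triangular-kernel estimate: {tau_hat:.3f}")
```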
Asymptotic Properties
Under regularity conditions, the local linear RD estimator has asymptotic distribution:
$$\sqrt{nh}\left(\hat{\tau}_{RD} - \tau_{RD} - h^2 B\right) \xrightarrow{d} N(0, V)$$
where B is the bias term and V is the asymptotic variance. The bias term depends on the second derivatives of the conditional expectation functions.
MSE-Optimal Bandwidth
The MSE-optimal bandwidth balances squared bias and variance:
$$h_{MSE} = C \cdot n^{-1/5}$$
where C depends on:
Curvature of E[Y∣X] at the cutoff (more curvature → smaller bandwidth)
Density of X at the cutoff (higher density → smaller bandwidth)
Residual variance
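The rate can be recovered from the asymptotic distribution given earlier in this appendix. The sketch below treats B and V as fixed constants and keeps only leading terms, so it is a heuristic derivation rather than a formal result.

```latex
\begin{aligned}
\mathrm{MSE}(h) &\approx \underbrace{h^4 B^2}_{\text{squared bias}} + \underbrace{\frac{V}{nh}}_{\text{variance}} \\
\frac{d\,\mathrm{MSE}(h)}{dh} &= 4 h^3 B^2 - \frac{V}{n h^2} = 0 \\
h_{MSE} &= \left(\frac{V}{4 B^2}\right)^{1/5} n^{-1/5}
\end{aligned}
```

In this approximation the constant is C = (V / 4B²)^{1/5}: more curvature raises B and shrinks the bandwidth, while more residual variance raises V and widens it.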