Chapter 14: Regression Discontinuity

Opening Question

When treatment is assigned by crossing a threshold, can we recover causal effects by comparing units just above and just below the cutoff?


Chapter Overview

Regression discontinuity (RD) designs exploit situations where treatment is determined, wholly or partly, by whether a continuous variable crosses a threshold. Students receive scholarships if their test scores exceed a cutoff. Politicians win elections if their vote share exceeds 50%. Policies take effect when a jurisdiction's population crosses a threshold. At these thresholds, units just above and just below are nearly identical in expectation, as if treatment were randomly assigned among units near the cutoff.

This "local randomization" intuition makes RD one of the most credible quasi-experimental designs. When the assignment variable cannot be precisely manipulated, crossing the threshold is effectively random for units near the cutoff. RD combines the credibility of a randomized experiment (for units near the cutoff) with the practical advantage of using naturally occurring variation in observational data.

This chapter develops the RD framework, distinguishing sharp designs (treatment perfectly determined by the cutoff) from fuzzy designs (cutoff induces variation in treatment probability). We cover estimation, bandwidth selection, validity diagnostics, and the interpretation challenges that arise in practice.

What you will learn:

  • The local randomization logic underlying RD designs

  • How to estimate RD effects using local polynomial regression

  • Bandwidth selection: the bias-variance tradeoff

  • Distinguishing sharp vs. fuzzy RD and estimating each

  • Validity diagnostics: testing for manipulation and covariate balance

  • How to present RD evidence convincingly

Prerequisites: Chapter 9 (Causal Framework), Chapter 12 (IV, for fuzzy RD), Chapter 3 (Statistical Foundations)


14.1 The Sharp RD Design

Setup and Notation

Let $X_i$ be a continuous running variable (also called the assignment or forcing variable) and $c$ be the cutoff. In a sharp RD design, treatment is a deterministic function of the running variable:

$$D_i = \mathbf{1}[X_i \geq c]$$

Units with $X_i \geq c$ are treated; units with $X_i < c$ are controls. There is no ambiguity, no exceptions, no discretion.

Examples of sharp RD:

  • Test score thresholds for program eligibility

  • Age cutoffs for policy exposure (drinking age, voting age, retirement)

  • Date cutoffs (fiscal year boundaries, policy implementation dates)

  • Geographic boundaries (different regulations on either side of a border)

The Local Randomization Intuition

Why does RD work? Consider units with $X_i$ very close to the cutoff, say within a narrow bandwidth $h$ of $c$. For these units:

  1. Balance: Any pre-treatment characteristic that varies smoothly with $X$ will be nearly identical for units just above and just below $c$

  2. As-if random: The small differences in $X$ that determine treatment status are effectively random

This intuition is formalized through continuity assumptions.

Assumption 14.1 (Continuity of Potential Outcomes): The conditional expectation functions of potential outcomes are continuous in the running variable at the cutoff:

$$\lim_{x \downarrow c} E[Y_i(1) | X_i = x] = E[Y_i(1) | X_i = c]$$

$$\lim_{x \uparrow c} E[Y_i(0) | X_i = x] = E[Y_i(0) | X_i = c]$$

Intuition: There is no discontinuity in how outcomes would evolve with $X$ absent the treatment threshold. Any jump in outcomes at $c$ must be due to treatment.

Box: Two Frameworks for RD—Continuity vs. Local Randomization

The RD literature has developed two distinct conceptual frameworks. Understanding the difference clarifies what RD assumes and when it works.

Continuity-Based Framework (Hahn, Todd, Van der Klaauw 2001)

The classical approach assumes potential outcomes are continuous functions of the running variable at the cutoff. Identification comes from comparing limits:

$$\tau_{RD} = \lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x]$$

  • Key assumption: Smoothness of potential outcomes at $c$

  • Estimation: Local polynomial regression with carefully chosen bandwidth

  • Inference: Asymptotic approximations based on boundary kernel estimation

  • Covariates: Not required for identification; can improve precision

Local Randomization Framework (Cattaneo, Frandsen, Titiunik 2015)

This alternative assumes that in a small window around the cutoff, treatment is as-if randomly assigned, like a local experiment:

$$D_i \perp (Y_i(0), Y_i(1)) \quad \text{for } X_i \in [c-w, c+w]$$

  • Key assumption: Local random assignment in window $w$

  • Estimation: Difference in means (or randomization inference)

  • Inference: Fisher-style permutation tests

  • Covariates: Can test local balance (like an RCT)

When Does Each Apply?

| Situation | Better Framework |
| --- | --- |
| Running variable precisely determined | Continuity (local randomization may fail) |
| Running variable has measurement error or noise | Local randomization may apply |
| Large sample, smooth relationship | Continuity (polynomial approximation works) |
| Small sample, discrete running variable | Local randomization (window-based inference) |
| Manipulation concerns | Both require no manipulation |

Practical implication: Most RD analyses use the continuity framework and local polynomial methods. But the local randomization interpretation provides clearer intuition and suggests different diagnostics (testing covariate balance within the window, as in an RCT). The two frameworks often give similar answers when the running variable has sufficient noise near the cutoff—which is precisely when RD is most credible.

The RD Estimand

Under the continuity assumption, the sharp RD design identifies a local average treatment effect at the cutoff:

$$\tau_{RD} = E[Y_i(1) - Y_i(0) | X_i = c]$$

This is the treatment effect specifically for units at the threshold—those on the margin of treatment.
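The identification argument can be written out in three lines. This is a sketch that uses only the sharp assignment rule and Assumption 14.1; observed outcomes equal $Y_i(1)$ above the cutoff and $Y_i(0)$ below it.

```latex
\begin{align*}
\lim_{x \downarrow c} E[Y_i | X_i = x]
  &= \lim_{x \downarrow c} E[Y_i(1) | X_i = x] = E[Y_i(1) | X_i = c], \\
\lim_{x \uparrow c} E[Y_i | X_i = x]
  &= \lim_{x \uparrow c} E[Y_i(0) | X_i = x] = E[Y_i(0) | X_i = c], \\
\lim_{x \downarrow c} E[Y_i | X_i = x] - \lim_{x \uparrow c} E[Y_i | X_i = x]
  &= E[Y_i(1) - Y_i(0) | X_i = c] = \tau_{RD}.
\end{align*}
```

The first equality in each of the first two lines uses the deterministic treatment rule; the second uses continuity.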

Comparison to other designs:

  • RCT: ATE for experimental sample

  • IV: LATE for compliers

  • RD: Effect at the cutoff

The RD estimand is highly local. This is both a strength (internal validity is very credible) and a limitation (effects may not generalize away from the cutoff).

Graphical Analysis

Before any formal estimation, RD demands graphical analysis. A well-constructed RD plot shows:

  1. Binned means: Divide $X$ into bins and plot average $Y$ in each bin

  2. The cutoff: Clear vertical line at $c$

  3. Fitted curves: Local polynomial fits on each side of the cutoff

  4. The discontinuity: Visual jump (or lack thereof) at the cutoff

A clear visual discontinuity is the first line of evidence. If the jump is not visible in the raw data, sophisticated estimation won't rescue the analysis.
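The binned means behind an RD plot can be computed in a few lines. The sketch below is illustrative only: the function name `binned_means` and the equal-width binning rule are assumptions, not a prescribed method. Bin edges are aligned to the cutoff so that no bin mixes treated and control units.

```python
import numpy as np

def binned_means(x, y, c, bin_width):
    """Return (bin_centers, mean_y, counts) using bins that never straddle c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Bin index measured from the cutoff: index 0 is [c, c + bin_width)
    idx = np.floor((x - c) / bin_width).astype(int)
    bins = np.unique(idx)
    centers = c + (bins + 0.5) * bin_width
    means = np.array([y[idx == b].mean() for b in bins])
    counts = np.array([(idx == b).sum() for b in bins])
    return centers, means, counts
```

The returned centers and means can be passed to any plotting library as a scatter, with a vertical line drawn at the cutoff and fitted curves overlaid on each side.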

Figure 14.1: Sharp Regression Discontinuity Design. Units just below the cutoff (control) and just above (treated) are compared. The discontinuity at the cutoff identifies the local average treatment effect. Fitted curves on each side of the cutoff estimate the conditional expectation functions.

Example: Electoral RD (Lee 2008)

David Lee's (2008) study of U.S. House elections is the paradigmatic RD application.

Setting: In two-party U.S. House races, the Democrat wins if their share of the two-party vote exceeds 50%. This creates a sharp discontinuity.

Question: What is the effect of winning a House election on future electoral success? (Incumbency advantage)

Running variable: Democratic vote share margin (vote share minus 50%)

  • $X_i > 0$: Democrat won

  • $X_i < 0$: Democrat lost

Key insight: In very close elections (say, decided by less than 1 percentage point), the winner is essentially random. Which candidate had slightly more votes on election night is not systematically related to candidate quality, district preferences, or other confounders.

Findings: Lee finds a large discontinuity: Democratic candidates who barely win are about 40 percentage points more likely to win the next election than Democrats who barely lose. Because barely-winners and barely-losers are comparable in expectation, this gap reflects the consequences of holding office (name recognition, constituency service, campaign finance) rather than pre-existing differences in candidate or district strength.


14.2 Estimation

Local Polynomial Regression

The standard approach estimates the conditional expectation function separately on each side of the cutoff using local polynomial regression.

Linear RD estimator: Fit linear functions on each side:

$$Y_i = \alpha_l + \beta_l(X_i - c) + \varepsilon_i \quad \text{for } X_i < c$$

$$Y_i = \alpha_r + \beta_r(X_i - c) + \varepsilon_i \quad \text{for } X_i \geq c$$

The RD estimate is $\hat{\tau}_{RD} = \hat{\alpha}_r - \hat{\alpha}_l$.

Why local? We only use observations within a bandwidth $h$ of the cutoff: $X_i \in [c-h, c+h]$.

Why polynomial? To approximate potentially curved conditional expectation functions near the cutoff.

Regression formulation: Equivalently, we can estimate:

$$Y_i = \alpha + \tau D_i + \beta_1(X_i - c) + \beta_2 D_i \cdot (X_i - c) + \varepsilon_i$$

for observations with $|X_i - c| \leq h$, where $\hat{\tau}$ is the RD estimate.
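The pooled regression can be sketched with plain NumPy. This is a minimal illustration, not a substitute for rdrobust: the function name `local_linear_rd` is hypothetical, the kernel is uniform (every observation inside the bandwidth gets equal weight), and the bandwidth is supplied by the user rather than selected from the data.

```python
import numpy as np

def local_linear_rd(x, y, c=0.0, h=1.0):
    """Jump in a local linear fit at cutoff c (uniform kernel, bandwidth h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(x - c) <= h            # observations with |X_i - c| <= h
    xc = x[keep] - c                     # centered running variable
    d = (xc >= 0).astype(float)          # sharp treatment indicator D_i
    # Columns: intercept, D, (X - c), D * (X - c)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return float(coef[1])                # coefficient on D is tau-hat

# Simulated illustration (hypothetical data with a true jump of 2.0 at c = 0)
rng = np.random.default_rng(0)
x_sim = rng.uniform(-1, 1, 2000)
y_sim = 1.0 + 0.5 * x_sim + 2.0 * (x_sim >= 0) + rng.normal(0, 0.5, 2000)
print(f"RD estimate: {local_linear_rd(x_sim, y_sim, c=0.0, h=0.25):.3f}")
```

Because the slope is allowed to differ on each side, the coefficient on the treatment indicator equals the difference between the two separately fitted intercepts.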

Polynomial Order Choice

Linear (p = 1): Usually preferred. Linear approximation works well near the cutoff; higher-order terms add variance without improving bias much.

Quadratic or higher (p ≥ 2): Sometimes used for robustness checks. Higher-order polynomials can fit curved relationships better but are more sensitive to observations far from the cutoff.

Pitfall: Global Polynomial Fitting. Fitting high-order polynomials (p = 3, 4, 5...) to all the data is dangerous. Such fits can chase noise far from the cutoff and produce wildly inaccurate estimates at $c$. Gelman and Imbens (2019) recommend against polynomials higher than quadratic.

How to avoid: Stick to local linear or local quadratic estimation with appropriate bandwidth.

Bandwidth Selection

The bandwidth $h$ governs the bias-variance tradeoff:

  • Small $h$: Lower bias (using only observations very similar to cutoff), but higher variance (fewer observations)

  • Large $h$: Lower variance (more observations), but higher bias (observations far from cutoff less relevant)

MSE-optimal bandwidth: Imbens and Kalyanaraman (2012) and Calonico, Cattaneo, and Titiunik (2014) derive bandwidth selectors that minimize mean squared error (MSE):

$$h^{MSE} = C \cdot n^{-1/5}$$

where $C$ depends on the curvature of the conditional expectation function and the density of $X$.
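A short worked calculation shows what the $n^{-1/5}$ rate implies. Holding the constant $C$ fixed (an assumption made only for this illustration), the sample must grow 32-fold for the bandwidth to halve, and the number of observations inside the bandwidth grows more slowly than the sample itself:

```latex
\frac{h^{MSE}(32n)}{h^{MSE}(n)} = \left(\frac{32n}{n}\right)^{-1/5} = 32^{-1/5} = \frac{1}{2},
\qquad
n \cdot h^{MSE} \propto n^{4/5}.
```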

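Robust inference: Conventional confidence intervals computed at the MSE-optimal bandwidth tend to undercover. The MSE-optimal bandwidth shrinks too slowly for the smoothing bias to vanish from the distributional approximation, so intervals that ignore the bias are centered in the wrong place. Calonico, Cattaneo, and Titiunik (2014) provide bias-corrected estimates and robust confidence intervals that account for the additional variability introduced by estimating the bias.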

Figure 14.2: The Bandwidth-Bias Tradeoff. Narrow bandwidths (left) reduce bias but include few observations, increasing variance. Wide bandwidths (right) include more observations but may introduce bias if the relationship is non-linear. MSE-optimal bandwidths balance this tradeoff.

Implementation: The rdrobust package (R, Stata, and Python) automates:

  • MSE-optimal bandwidth selection

  • Bias-corrected estimation

  • Robust confidence intervals

Covariates in RD

Including pre-treatment covariates can improve precision but raises issues:

When covariates help:

  • If covariates predict outcomes strongly, including them reduces residual variance

  • In fuzzy RD, covariates can strengthen the first stage

How to include covariates:

  1. Add covariates linearly to the RD specification

  2. Or use covariates to residualize $Y$ first, then run RD on residuals

Important: Only include pre-determined covariates—variables that could not be affected by treatment.

Caution: If RD is valid, covariates should be balanced at the cutoff. If including covariates changes estimates substantially, this suggests the covariates are discontinuous—which indicates a problem with the design, not something to "control for."


14.3 Fuzzy Regression Discontinuity

Setup

In many applications, the cutoff does not perfectly determine treatment. Instead, crossing the threshold increases the probability of treatment:

$$\lim_{x \downarrow c} P(D_i = 1 | X_i = x) - \lim_{x \uparrow c} P(D_i = 1 | X_i = x) > 0$$

but the probability does not jump all the way from 0 to 1.

Examples of fuzzy RD:

  • Scholarship eligibility thresholds where not all eligible students accept

  • Age thresholds for drinking where enforcement is imperfect

  • Income thresholds for program eligibility with measurement error in income

The IV Interpretation

Fuzzy RD has an instrumental variables interpretation. The threshold acts as an instrument:

  1. Relevance: Crossing the threshold affects treatment probability (first stage)

  2. Exclusion: The threshold affects outcomes only through treatment (no direct effect of being just above vs. just below the cutoff except via treatment)

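Under these conditions, plus monotonicity (crossing the threshold never makes a unit less likely to be treated), the fuzzy RD estimand is a local average treatment effect (LATE) for compliers at the cutoff: units who are treated when just above but not when just below.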

Box: The "Doubly Local" Nature of Fuzzy RD

Fuzzy RD estimates are local in two ways:

1. Local to the cutoff (like all RD). We identify effects only for units near the threshold. With test score cutoffs, we learn about students scoring around the cutoff, not high achievers or struggling students far from the threshold.

2. Local to compliers (like all IV). Among units at the cutoff, we identify effects only for compliers, those whose treatment status is changed by crossing the threshold. Always-takers (treated regardless of side) and never-takers (untreated regardless) contribute nothing to identification.

Who are the compliers?

  • Above cutoff: treated

  • Below cutoff: would not have been treated

In the drinking age example: compliers are individuals who drink legally at 21 but would not have drunk (or drunk less) at 20. Those who drink heavily regardless of legal status (always-takers) or abstain regardless (never-takers) don't inform the estimate.

Interpretation caution: The fuzzy RD effect applies to a very specific population—compliers at the margin. This may be a small and unusual group. A college admissions fuzzy RD identifies effects for marginal admits who just qualified—not typical students, and not those whose admission was determined by other factors (legacy, athletic recruitment).

Quantifying the complier population: The first-stage jump tells you the complier share. If treatment probability jumps from 40% to 70% at the cutoff, compliers are 30% of the population at the threshold. The smaller this share, the more specialized your estimand.

Estimation

Wald estimand: The fuzzy RD estimand is the ratio of the outcome discontinuity to the treatment discontinuity:

$$\tau_{FRD} = \frac{\lim_{x \downarrow c} E[Y | X = x] - \lim_{x \uparrow c} E[Y | X = x]}{\lim_{x \downarrow c} E[D | X = x] - \lim_{x \uparrow c} E[D | X = x]}$$

The estimator $\hat{\tau}_{FRD}$ replaces each limit with a local polynomial estimate. When the same bandwidth, kernel, and polynomial order are used in the numerator and denominator, this is numerically identical to 2SLS using the threshold indicator as an instrument for treatment.
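A worked example of the ratio, reusing the first-stage jump from 40% to 70% mentioned in the box above. The outcome jump of 1.5 units is a hypothetical number chosen only to illustrate the arithmetic:

```latex
\text{treatment jump} = 0.70 - 0.40 = 0.30,
\qquad
\tau_{FRD} = \frac{\text{outcome jump}}{\text{treatment jump}} = \frac{1.5}{0.30} = 5.
```

The reduced-form jump is scaled up by a factor of $1/0.30$ because only 30% of units at the cutoff change treatment status when they cross it.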

Implementation: Two approaches:

  1. Separate RD estimates: Estimate sharp RD for $Y$ (reduced form) and sharp RD for $D$ (first stage), then divide

  2. 2SLS: Run 2SLS with $\mathbf{1}[X \geq c]$ as instrument for $D$, restricting to observations near the cutoff
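The first approach (separate RD estimates, then divide) can be sketched as follows. The helper names `jump_at_cutoff` and `fuzzy_rd` are hypothetical, the kernel is uniform, and the same user-supplied bandwidth is used for both stages. The sketch returns point estimates only; valid standard errors require 2SLS output or a dedicated package such as rdrobust.

```python
import numpy as np

def jump_at_cutoff(x, v, c, h):
    """Discontinuity in a local linear fit of v on x at c (uniform kernel)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    above = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    coef, *_ = np.linalg.lstsq(design, v[keep], rcond=None)
    return float(coef[1])

def fuzzy_rd(x, y, d, c, h):
    """Return (Wald ratio, reduced-form jump, first-stage jump)."""
    reduced_form = jump_at_cutoff(x, y, c, h)   # jump in the outcome
    first_stage = jump_at_cutoff(x, d, c, h)    # jump in treatment take-up
    return reduced_form / first_stage, reduced_form, first_stage
```

Reporting the first-stage and reduced-form jumps alongside the ratio makes it easy to spot a weak first stage, which would make the ratio unstable.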

Example: Drinking Age and Mortality (Carpenter and Dobkin)

Carpenter and Dobkin (2009) study whether legal access to alcohol increases mortality.

Setting: In the U.S., individuals can legally purchase alcohol at age 21. This creates a fuzzy discontinuity—some under-21s drink illegally, and reaching 21 doesn't make everyone drink.

Running variable: Age (centered at 21)

Treatment: Alcohol consumption

Outcome: Mortality from various causes

Findings:

  • First stage: Discrete jump in drinking at age 21 (but not from 0 to 100%)

  • Reduced form: Large jump in mortality at age 21

  • Fuzzy RD estimate: Scaling the mortality jump by the jump in drinking implies that alcohol consumption increases mortality substantially, especially from motor vehicle accidents

This study illustrates fuzzy RD applied to a sharp-looking policy (legal drinking age) that is fuzzy in practice (imperfect compliance with age restrictions).


14.4 Validity and Diagnostics

Manipulation Testing

The key threat to RD validity is manipulation of the running variable. If units can precisely control their value of $X$ to be just above or just below the cutoff, the as-if random assignment breaks down.

McCrary (2008) density test: If manipulation is occurring, we expect a discontinuity in the density of $X$ at the cutoff, with more observations just above (or below) than smooth extrapolation would predict.

Implementation: Estimate the density of $X$ on each side of the cutoff. Test for discontinuity using local polynomial density estimation. The rddensity package implements modern tests.

Interpretation:

  • Significant discontinuity in density → serious concern about manipulation

  • No discontinuity → manipulation not detected (but not ruled out)
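As a rough first look before running a formal density test, one can count observations in a small symmetric window and ask whether the split is consistent with a fair coin. The sketch below is a deliberately crude stand-in in the spirit of window-based checks; it is not the McCrary test and not what rddensity computes. The function name, the window half-width `w`, and the null probability of 0.5 are all assumptions.

```python
from math import exp, lgamma, log

def binomial_density_check(x, c, w):
    """Exact two-sided binomial test of P(above) = 0.5 within [c - w, c + w).

    Returns (n_below, n_above, p_value).
    """
    n_below = sum(1 for xi in x if c - w <= xi < c)
    n_above = sum(1 for xi in x if c <= xi < c + w)
    n = n_below + n_above
    # Binomial(n, 0.5) probabilities, computed in logs for numerical stability
    probs = [
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + n * log(0.5))
        for k in range(n + 1)
    ]
    observed = probs[n_above]
    # Two-sided p-value: total probability of outcomes no more likely than observed
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return n_below, n_above, min(p_value, 1.0)
```

A small p-value flags an imbalance in counts worth investigating. A large one does not rule out manipulation, and the answer depends on the window chosen.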

Example: In the U.S. House election data studied by Lee (2008), there is no bunching just above or below 50% vote share, supporting the claim that candidates cannot precisely control election outcomes.

Figure 14.3: Manipulation Testing with Density Plots. The left panel shows no manipulation—the density of the running variable is smooth through the cutoff. The right panel shows manipulation—there is a suspicious jump in density just above the cutoff, suggesting units may be able to precisely control their position. The McCrary density test formalizes this visual check.

Covariate Balance at the Cutoff

If RD is valid, pre-treatment covariates should be continuous at the cutoff. Testing for discontinuities in covariates serves as a placebo test.

Implementation: Run the RD specification with each covariate as the outcome. Test whether any covariate shows a discontinuity.
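A minimal sketch of this placebo loop is given below. The names `rd_jump` and `covariate_balance` are hypothetical, the kernel is uniform, the bandwidth is user-supplied, and the standard errors are simple heteroskedasticity-robust (HC0) values rather than the bias-corrected robust inference discussed earlier.

```python
import numpy as np

def rd_jump(x, v, c, h):
    """Local linear jump in v at c with an HC0 standard error."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ v[keep]
    resid = v[keep] - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)       # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv
    return float(beta[1]), float(np.sqrt(max(cov[1, 1], 0.0)))

def covariate_balance(x, covariates, c, h):
    """Run the RD specification with each pre-treatment covariate as outcome."""
    return {name: rd_jump(x, values, c, h) for name, values in covariates.items()}
```

Each entry of the returned dictionary is an (estimate, standard error) pair. With many covariates, some nominally significant jumps are expected by chance, so the overall pattern matters more than any single test.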

What to do if covariates are discontinuous?

  • If covariates jump at the cutoff, RD assumptions are violated

  • Discontinuous covariates suggest sorting or manipulation

  • Adding such covariates as controls does not repair the design: a jump in observed characteristics suggests that unobserved characteristics may jump as well

Sensitivity to Bandwidth

RD estimates should not be hypersensitive to bandwidth choice.

Robustness checks:

  • Report estimates for half and double the MSE-optimal bandwidth

  • Plot estimates as a function of bandwidth

  • Estimates should be reasonably stable across bandwidths
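These checks amount to re-running the same estimator over a grid of bandwidths. In the sketch below, `h_ref` stands for a reference bandwidth obtained elsewhere (for example, from a data-driven selector); the function names and the uniform-kernel local linear estimator are illustrative assumptions.

```python
import numpy as np

def local_linear_rd(x, y, c, h):
    """Jump in a local linear fit at c (uniform kernel, bandwidth h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return float(coef[1])

def bandwidth_sensitivity(x, y, c, h_ref, multipliers=(0.5, 1.0, 2.0)):
    """Map each bandwidth (multiplier * h_ref) to its RD estimate."""
    return {m * h_ref: local_linear_rd(x, y, c, m * h_ref) for m in multipliers}
```

Passing a finer grid of multipliers gives the points for a plot of estimates against bandwidth.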

What if estimates vary wildly with bandwidth?

  • May indicate violation of continuity assumption

  • May indicate insufficient sample size near cutoff

  • Report the sensitivity honestly

Placebo Cutoffs

As an additional check, estimate RD at "fake" cutoffs where no treatment discontinuity exists.

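Implementation: Run the RD analysis at cutoffs above or below the true threshold, using only observations on one side of the true cutoff so that the real discontinuity does not contaminate the placebo estimate. For example, if the real cutoff is at $X = 50$, test at $X = 45$ (using only observations below 50) and at $X = 55$ (using only observations at or above 50).

Interpretation: Significant effects at placebo cutoffs suggest the outcome varies discontinuously with $X$ in general, not specifically at the policy threshold. This undermines the RD interpretation.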
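A placebo run can be sketched by reusing a basic RD estimator at a fake cutoff. The function names are hypothetical and the estimator is a uniform-kernel local linear fit with a user-supplied bandwidth. As a design choice, the sketch keeps only observations on the placebo cutoff's side of the true cutoff, so the real jump cannot leak into the placebo window.

```python
import numpy as np

def local_linear_rd(x, y, c, h):
    """Jump in a local linear fit at c (uniform kernel, bandwidth h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return float(coef[1])

def placebo_rd(x, y, true_c, placebo_c, h):
    """RD estimate at a fake cutoff, using one side of the true cutoff only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    same_side = x >= true_c if placebo_c > true_c else x < true_c
    return local_linear_rd(x[same_side], y[same_side], placebo_c, h)
```

Estimates near zero at several placebo cutoffs are reassuring; a sizable jump at a fake cutoff suggests the outcome is not smooth in the running variable.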

Donut Hole RD

If manipulation is suspected specifically at the cutoff, one diagnostic is to exclude observations very close to the cutoff and re-estimate.

Implementation: Drop observations within a small window of the cutoff (the "donut hole") and estimate RD using remaining observations.

Interpretation:

  • If estimates are similar with and without the donut hole, manipulation concerns are mitigated

  • If estimates change substantially, manipulation may be concentrated at the cutoff
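The donut-hole re-estimation drops a band around the cutoff and fits the same specification on what remains. The names `donut_rd` and `radius` are illustrative assumptions, and the estimator is again a uniform-kernel local linear fit with a user-supplied bandwidth.

```python
import numpy as np

def local_linear_rd(x, y, c, h):
    """Jump in a local linear fit at c (uniform kernel, bandwidth h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return float(coef[1])

def donut_rd(x, y, c, h, radius):
    """RD estimate after dropping observations with |x - c| < radius."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    outside_hole = np.abs(x - c) >= radius
    return local_linear_rd(x[outside_hole], y[outside_hole], c, h)
```

Note that the donut estimate extrapolates the fitted lines across the hole to the cutoff, so it leans more heavily on the functional form than the standard estimate does.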


14.5 RD Design Variants

Geographic RD (Boundary Discontinuities)

Policies often vary across geographic boundaries—state lines, district borders, regulatory zones. This creates RD opportunities using geographic location as the running variable.

Setup:

  • Running variable: Distance to boundary (positive on one side, negative on the other)

  • Cutoff: The boundary itself (distance = 0)

Examples:

  • School quality effects using school district boundaries

  • Minimum wage effects using state borders (Dube, Lester, Reich 2010)

  • Pollution regulation effects using county lines

Challenges:

  • Two-dimensional geography requires careful definition of distance

  • Boundaries may not be randomly placed

  • Spillovers across boundaries

Multi-Cutoff RD

Sometimes the same policy uses different cutoffs in different contexts.

Example: Test score thresholds vary by school or district. Each school has its own cutoff for honors course placement.

Pooling: Cattaneo et al. (2016) develop methods for combining RD estimates across multiple cutoffs, improving precision while allowing for heterogeneity.

Regression Kink Design (RKD)

When treatment is continuous and its relationship with the running variable has a kink (change in slope) rather than a jump at the cutoff, regression kink design (RKD) can identify effects.

Example: Tax benefits that phase out at a rate that changes at certain income thresholds. There's no discontinuity in benefits, but the slope changes.

Estimand: RKD identifies the effect of a marginal increase in treatment at the kink point.

Requirements:

  • No jump in treatment at kink (otherwise it's RD)

  • Change in slope of treatment (first stage)

  • Outcome shows corresponding change in slope (reduced form)
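The requirements above suggest an estimand built from slope changes rather than level changes. As a sketch in this chapter's notation, with $D$ now denoting the continuous treatment, the RKD estimand is the ratio of the kink in the outcome to the kink in treatment:

```latex
\tau_{RKD} =
\frac{\lim_{x \downarrow c} \frac{d}{dx} E[Y | X = x] - \lim_{x \uparrow c} \frac{d}{dx} E[Y | X = x]}
     {\lim_{x \downarrow c} \frac{d}{dx} E[D | X = x] - \lim_{x \uparrow c} \frac{d}{dx} E[D | X = x]}
```

This mirrors the fuzzy RD ratio, with derivatives at the cutoff replacing levels in both the reduced form (numerator) and the first stage (denominator).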

RD with Discrete Running Variables

What if the running variable takes only discrete values (e.g., age in years, test scores in whole numbers, school enrollment counts)?

The problem: Standard RD relies on continuity—comparing the limits from just above and just below the cutoff. But with discrete values, there is no "just below." The smallest bandwidth includes entire mass points, potentially far from the cutoff in true underlying ability or characteristics.

Protocol for Discrete Running Variables

Step 1: Assess severity

  • How many mass points are within a reasonable bandwidth?

  • What fraction of the sample is at each mass point?

  • Is there heaping (bunching at round numbers)?

| Mass Points Near Cutoff | Approach |
| --- | --- |
| Many (10+) | Standard RD may work; treat as quasi-continuous |
| Few (3-10) | Use Lee-Card correction or local randomization |
| Very few (1-2) | Local randomization only; identification is fragile |

Step 2: Choose appropriate method

Option A: Lee-Card specification error correction

  • Cluster standard errors at the mass point level

  • This treats the gap between the fitted polynomial and the true conditional mean as a random specification error shared by all observations at a mass point

  • Widens confidence intervals appropriately
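Clustering at the mass-point level can be sketched with a basic sandwich estimator. The function name `rd_cluster_se` is hypothetical, the fit is a uniform-kernel local linear regression, and the variance formula is the plain cluster-robust form with no small-sample correction, which is an assumption worth revisiting when there are few mass points.

```python
import numpy as np

def rd_cluster_se(x, y, c, h):
    """Local linear RD estimate with SEs clustered on distinct values of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(x - c) <= h
    xk, yk = x[keep], y[keep]
    xc = xk - c
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ yk
    resid = yk - X @ beta
    meat = np.zeros((4, 4))
    for value in np.unique(xk):              # one cluster per mass point
        rows = xk == value
        score = X[rows].T @ resid[rows]      # summed score within the cluster
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return float(beta[1]), float(np.sqrt(max(cov[1, 1], 0.0)))
```

With only a handful of mass points the cluster count is small, so these standard errors should be read as rough and compared against the local randomization approach.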

Option B: Local randomization framework

  • Define a window containing a small number of mass points

  • Assume treatment is "as-if random" within this window

  • Use permutation inference rather than asymptotics

  • Test covariate balance within the window
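The permutation step in Option B can be sketched as a difference in means inside the window, compared against the distribution obtained by reshuffling treatment labels. The function name, the window half-width `w`, the number of permutations, and the seed are illustrative assumptions; this is a minimal version, not the procedure of any particular package.

```python
import numpy as np

def permutation_test(x, y, c, w, n_perm=999, seed=0):
    """Difference in means within [c - w, c + w] and a permutation p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.abs(x - c) <= w
    yw = y[inside]
    treated = x[inside] >= c
    observed = yw[treated].mean() - yw[~treated].mean()
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(treated)      # reassign labels at random
        diff = yw[shuffled].mean() - yw[~shuffled].mean()
        extreme += abs(diff) >= abs(observed) - 1e-12
    # Add one to numerator and denominator so the p-value is never exactly zero
    return float(observed), (extreme + 1) / (n_perm + 1)
```

The same function applied to a pre-treatment covariate in place of the outcome gives a window-based balance check.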

Option C: Fuzzy RD interpretation

  • Treat the running variable as a noisy measure of underlying position

  • Use IV with the discrete measure as instrument

Step 3: Report honestly

  • Show the distribution of the running variable

  • Report how many mass points drive the estimate

  • Conduct sensitivity to window width

  • Acknowledge reduced precision

Example: Maimonides' Rule (Angrist & Lavy 1999)

Class size is determined by enrollment thresholds at multiples of 40. Enrollment is discrete (whole numbers), creating sharp discontinuities at 41, 81, 121, etc. The authors use multiple cutoffs and account for the discrete nature by focusing on the first-stage relationship rather than pure continuity-based RD.

Lee and Card (2008) formalize the specification error problem and provide cluster-robust inference for discrete running variables.


Practical Guidance

When to Use RD

| Situation | Appropriate? | Notes |
| --- | --- | --- |
| Sharp threshold determines treatment | Yes (sharp RD) | Classic case |
| Threshold affects treatment probability | Yes (fuzzy RD) | IV interpretation |
| Threshold is known and fixed | Yes | Cutoff must be known ex ante |
| Running variable can be precisely manipulated | No | Manipulation violates assumptions |
| Threshold was chosen based on outcomes | No | Endogenous cutoff |
| Need effect for units far from cutoff | No | RD is local; extrapolation unjustified |

Common Pitfalls

Pitfall 1: High-order global polynomials. Fitting quintic or higher-order polynomials to all observations can produce wildly incorrect estimates at the cutoff, as the polynomial chases noise in the tails.

How to avoid: Use local linear or local quadratic estimation with data-driven bandwidth selection.

Pitfall 2: Ignoring manipulation. Assuming manipulation is absent because you do not observe it directly is a mistake; manipulation can be hard to detect.

How to avoid: Always run density tests. Examine institutional details: could units plausibly manipulate $X$? Consider donut hole analysis.

Pitfall 3: Interpreting RD as a global effect. The RD estimate is specific to units at the cutoff. It may not apply to units far above or below.

How to avoid: Be clear about the local nature of the estimand. Discuss whether effects might differ away from the cutoff.

Pitfall 4: Endogenous bandwidth selection. Choosing the bandwidth that produces the desired result is specification searching.

How to avoid: Use data-driven bandwidth selection (MSE-optimal). Report results for a range of bandwidths. Pre-specify analysis choices when possible.

Pitfall 5: Overinterpreting non-significant density tests. Failing to reject the null of no manipulation does not prove there is no manipulation; it may reflect low power.

How to avoid: Report the density test but recognize its limitations. Examine institutional features that make manipulation more or less plausible.

Implementation Checklist


Qualitative Bridge

How Qualitative Methods Complement RD

RD provides highly credible local causal effects but leaves questions about mechanisms and external validity unanswered. Qualitative research addresses these gaps.

When to Combine

Understanding the discontinuity: Qualitative investigation can reveal what actually happens at the threshold. For a scholarship cutoff, do students just above and just below the threshold actually have similar characteristics? Do they experience the threshold as consequential?

Mechanism exploration: RD tells us winning an election increases future electoral success. Qualitative research—interviews with legislators, analysis of campaign strategies—can reveal why: Is it name recognition? Fundraising advantages? Ability to deliver constituency services?

External validity: The RD effect applies to marginal units. Qualitative understanding of who these units are and how they differ from non-marginal units helps assess generalizability.

Example: Electoral RD

Lee's (2008) finding of large incumbency advantages has been elaborated through qualitative work:

  • Campaign ethnographies reveal how incumbents use their office for electoral advantage

  • Interviews with campaign staff identify specific mechanisms (franking privilege, press access, pork-barrel spending)

  • Case studies of close elections show the experiences of barely-winners and barely-losers differ in observable ways that correspond to the quantitative findings

This qualitative evidence strengthens confidence that the RD captures a real incumbency advantage (not a statistical artifact) and illuminates how the effect operates.


Integration Note

Connections to Other Methods

| Method | Relationship | See Chapter |
| --- | --- | --- |
| Instrumental Variables | Fuzzy RD is an IV design; threshold instruments for treatment | Ch. 12 |
| Difference-in-Differences | RD in time: sharp policy change at specific date | Ch. 13 |
| Selection on Observables | Both use continuity; RD requires continuity at cutoff, not globally | Ch. 11 |
| Experiments | RD = quasi-experiment with "natural" randomization at cutoff | Ch. 10 |

Triangulation Strategies

RD estimates gain credibility when combined with:

  1. Different outcomes: Do multiple related outcomes show discontinuities consistent with the mechanism?

  2. Different running variables: If the same treatment has multiple thresholds, do estimates agree?

  3. Alternative specifications: Do local linear, local quadratic, and different bandwidths agree?

  4. Qualitative evidence: Do interviews and observation confirm the mechanism?

  5. DiD as complement: If the policy changed over time, do DiD and RD estimates align?


Running Example: Electoral RD

We develop Lee's (2008) electoral RD in more detail as a comprehensive example.

The Setting

U.S. House elections determine outcomes by plurality vote. In races between two major-party candidates, the Democrat wins if and only if their vote share exceeds 50%.

Running variable: Democratic vote share margin = (Dem votes) / (Dem + Rep votes) - 0.5

  • $X > 0$: Democrat won

  • $X < 0$: Republican won

Treatment: Democratic victory (incumbency)

Outcome: Democratic vote share in the next election

Why RD Works Here

Continuity: In close elections, which candidate wins is essentially a matter of chance—random sampling variability in who shows up to vote, weather effects, late-breaking news. Candidates cannot precisely manipulate vote shares to be just above 50%.

Manipulation check: The density of vote margins shows no discontinuity at the 50% threshold (McCrary 2008 applies his density test to U.S. House elections); there is no bunching just above or below the threshold.

Balance check: Pre-determined characteristics (district demographics, lagged outcomes) show no discontinuity at 50%.

Results

Lee finds:

| Democratic Vote Margin | Next-Election Democratic Vote Share |
| --- | --- |
| Just below 50% | ~45% |
| Just above 50% | ~58% |
| Discontinuity | ~13 percentage points |

Barely winning increases the probability of winning the next election by about 40 percentage points (from ~30% to ~70%).

Interpretation and Mechanisms

The RD identifies a large incumbency advantage, but what drives it?

Possible mechanisms:

  1. Deterrence: Incumbency deters high-quality challengers

  2. Resources: Incumbents have advantages in fundraising, media access

  3. Experience: Incumbents become better campaigners

  4. Constituency service: Incumbents can provide benefits to voters

  5. Selection: Winners are revealed as higher-quality candidates

Follow-up work (Caughey and Sekhon 2011, Eggers et al. 2015) has examined these mechanisms and investigated whether the Lee design is truly valid.

Challenges to the Lee Design

Caughey and Sekhon (2011) find covariate imbalance in very close House elections, suggesting possible manipulation or sorting.

Eggers et al. (2015) show the imbalance finding does not replicate across many electoral RD settings; House elections may be unusual.

Lesson: Even paradigmatic RD designs warrant careful validation. The density test and covariate balance checks are essential, not optional.


Summary

Key takeaways:

  1. RD exploits threshold-based assignment: When treatment is determined by crossing a cutoff in a continuous running variable, we can compare units just above and below to identify causal effects.

  2. The key assumption is continuity: Potential outcomes must be continuous in the running variable at the cutoff. Any jump in outcomes at the threshold is attributed to treatment.

  3. Manipulation is the key threat: If units can precisely control their running variable to be just above or below the cutoff, continuity fails. Density tests and covariate balance checks are essential diagnostics.

  4. Estimation uses local polynomial methods: Fit separate regressions on each side of the cutoff using observations within a bandwidth. MSE-optimal bandwidths balance bias and variance.

  5. RD estimates are local: The effect applies specifically to units at the cutoff. Extrapolation to units far from the threshold is not justified by the design.

Returning to the opening question: When treatment is assigned by crossing a threshold, we can recover causal effects by comparing units just above and below, but only if units cannot precisely manipulate their position relative to the cutoff. The credibility of RD depends on this local randomization, which must be investigated through density tests, covariate balance checks, and institutional analysis of whether manipulation is possible.


Further Reading

Essential

  • Cattaneo, Idrobo, and Titiunik (2020), A Practical Introduction to Regression Discontinuity Designs - Comprehensive modern textbook

  • Lee and Lemieux (2010), "Regression Discontinuity Designs in Economics" - Foundational survey

For Deeper Understanding

  • Imbens and Lemieux (2008), "Regression Discontinuity Designs: A Guide to Practice" - Classic methodological guide

  • Calonico, Cattaneo, and Titiunik (2014), "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs" - Bias-correction and robust inference

  • Gelman and Imbens (2019), "Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs" - Warns against global polynomial fitting

Advanced/Specialized

  • McCrary (2008), "Manipulation of the Running Variable in the Regression Discontinuity Design" - The density test

  • Cattaneo, Jansson, and Ma (2020), "Simple Local Polynomial Density Estimators" - Modern density testing

  • Lee and Card (2008), "Regression Discontinuity Inference with Specification Error" - Discrete running variables

Applications

  • Lee (2008), "Randomized Experiments from Non-random Selection in U.S. House Elections" - The paradigmatic electoral RD

  • Carpenter and Dobkin (2009), "The Effect of Alcohol Consumption on Mortality" - Drinking age fuzzy RD

  • Dell (2010), "The Persistent Effects of Peru's Mining Mita" - Geographic RD with historical treatment


Exercises

Conceptual

  1. Explain why RD estimates are "local" and what this means for external validity. Under what circumstances might the local effect generalize to units further from the cutoff?

  2. Why is manipulation of the running variable a threat to RD validity? Give an example of a setting where manipulation is plausible and one where it is implausible.

  3. In fuzzy RD, what population is the treatment effect identified for? How does this relate to the IV concept of "compliers"?

Applied

  1. Using electoral data, replicate the Lee (2008) analysis of incumbency advantage. Produce RD plots, estimate treatment effects with rdrobust, and conduct the McCrary density test.

  2. Find a policy that uses a test score or age threshold. Implement an RD analysis including all validity checks discussed in this chapter.

Discussion

  1. Some researchers argue that RD provides the most credible quasi-experimental evidence short of actual experiments. Others argue that its local nature limits its usefulness for policy. Which view do you find more compelling, and why?


Appendix 14A: Technical Details of Local Polynomial Estimation

Kernel Weighting

Local polynomial regression can incorporate kernel weights that give more weight to observations closer to the cutoff:

$$(\hat{\alpha}, \hat{\tau}_{RD}, \hat{\beta}, \hat{\gamma}) = \arg\min_{\alpha, \tau, \beta, \gamma} \sum_{i: |X_i - c| \leq h} K\left(\frac{X_i - c}{h}\right) \left(Y_i - \alpha - \tau D_i - \beta(X_i - c) - \gamma D_i(X_i - c)\right)^2$$

The RD estimate $\hat{\tau}_{RD}$ is the fitted coefficient on $D_i$.

Common kernels:

  • Triangular: $K(u) = (1 - |u|) \cdot \mathbf{1}[|u| \leq 1]$

  • Uniform (no weighting): $K(u) = 0.5 \cdot \mathbf{1}[|u| \leq 1]$

  • Epanechnikov: $K(u) = 0.75(1 - u^2) \cdot \mathbf{1}[|u| \leq 1]$

For local linear estimation at a boundary point, which is the situation in RD, the triangular kernel is MSE-optimal.
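The weighted least-squares problem above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a substitute for rdrobust: it assumes a sharp design with treatment equal to 1 when the running variable is at or above the cutoff, takes the bandwidth `h` as given, and reports no standard errors. The function names and the simulated data (true jump of 2.0) are assumptions made for the example.

```python
import numpy as np


def triangular(u):
    """K(u) = (1 - |u|) on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.clip(1.0 - np.abs(u), 0.0, None)


def uniform(u):
    """K(u) = 0.5 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return 0.5 * (np.abs(u) <= 1.0)


def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)


def local_linear_rd(x, y, cutoff, h, kernel=triangular):
    """Kernel-weighted local linear estimate of the jump in E[Y|X] at `cutoff`."""
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    w = kernel(xc / h)
    keep = w > 0                      # drop observations with zero weight
    xc, y, w = xc[keep], y[keep], w[keep]
    d = (xc >= 0).astype(float)       # assumed sharp assignment rule
    # Columns: intercept (alpha), D (tau), X - c (beta), D * (X - c) (gamma)
    Z = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    # Weighted least squares via square-root weights on both sides.
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return coef[1]


# Simulated data with a known discontinuity of 2.0 at the cutoff.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0) + rng.normal(0.0, 0.5, n)

estimates = {
    k.__name__: local_linear_rd(x, y, cutoff=0.0, h=0.3, kernel=k)
    for k in (triangular, uniform, epanechnikov)
}
for name, est in estimates.items():
    print(f"{name:>12}: tau_hat = {est:.3f}")
```

Because the simulated conditional mean is linear on each side, all three kernels should recover a jump close to 2.0; with a curved conditional mean the choice of kernel and bandwidth would matter more.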

Asymptotic Properties

Under regularity conditions, the local linear RD estimator has asymptotic distribution:

$$\sqrt{nh}\,(\hat{\tau}_{RD} - \tau_{RD} - h^2 B) \xrightarrow{d} N(0, V)$$

where $B$ is the bias term and $V$ is the asymptotic variance. The bias term depends on the second derivatives of the conditional expectation functions.

MSE-Optimal Bandwidth

The MSE-optimal bandwidth balances squared bias and variance:

$$h^{MSE} = C \cdot n^{-1/5}$$

where $C$ depends on:

  • Curvature of $E[Y \mid X]$ at the cutoff (more curvature → smaller bandwidth)

  • Density of $X$ at the cutoff (higher density → smaller bandwidth)

  • Residual variance (more noise → larger bandwidth)
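The $n^{-1/5}$ rate can be read off the asymptotic distribution of the local linear estimator stated earlier in this appendix. As a sketch, treat $B \neq 0$ and $V$ as fixed constants and drop higher-order terms. The bias is then approximately $h^2 B$ and the variance approximately $V/(nh)$, so

$$\mathrm{MSE}(h) \approx h^4 B^2 + \frac{V}{nh}$$

Setting the derivative with respect to $h$ to zero gives

$$4 h^3 B^2 - \frac{V}{n h^2} = 0 \quad \Longrightarrow \quad h^{MSE} = \left(\frac{V}{4 B^2}\right)^{1/5} n^{-1/5}$$

so in this simplified sketch $C = \left(V / 4B^2\right)^{1/5}$. More curvature raises $|B|$ and shrinks the bandwidth. A noisier outcome or a sparser running variable near the cutoff raises $V$ and widens it, which matches the directions listed above.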
