Chapter 17: When Point Identification Fails
Opening Question
What can we learn about causal effects when our identifying assumptions are too weak to pin down a single number?
Chapter Overview
The previous chapters have emphasized identification: conditions under which data reveal a causal parameter. Instrumental variables identify effects when exclusion restrictions hold; DiD identifies them when parallel trends holds; RD identifies them when there is no manipulation of the running variable. But what if these assumptions are implausible? What if we're unwilling to assume what identification requires?
The traditional response is to abandon the question or make stronger assumptions. This chapter develops a third option: partial identification. Rather than accepting a dubious point estimate or giving up entirely, we can characterize the set of values consistent with weaker, more defensible assumptions.
Partial identification yields bounds rather than point estimates. The treatment effect might be anywhere from 0.05 to 0.25—a range that still provides useful information. Bounds are wider than point estimates but more honest about what the data can and cannot tell us.
What you will learn:
The logic of partial identification and when it's preferable to point identification
Manski bounds for missing data and selection
Lee bounds for sample selection in experiments
How sensitivity analysis connects to bounded identification
When bounds are informative and when they're too wide to be useful
The intellectual virtue of honest uncertainty
Prerequisites: Chapter 9 (Causal Framework), Chapter 11 (Selection on Observables), Chapter 12 (Instrumental Variables)
17.1 The Logic of Partial Identification
From Point to Partial
Consider estimating the effect of college on earnings. We observe:
Yi^obs: observed earnings
Di: college attendance indicator
Xi: covariates
Under selection on observables (Chapter 11), we assume: Yi(0),Yi(1)⊥Di∣Xi
This is strong: it requires that, conditional on X, college attendance is as-if random. If unmeasured ability affects both college choice and earnings, this assumption fails.
Traditional response: Either (1) assume conditional independence anyway and report the biased estimate, or (2) find an instrument and impose exclusion restrictions, or (3) abandon causal interpretation entirely.
Partial identification response: What can we learn without assuming conditional independence? What bounds on the treatment effect are logically implied by the data and minimal assumptions?
The Identification Region
Definition 17.1 (Identification Region): The identification region Θ∗ is the set of parameter values consistent with the data and maintained assumptions: Θ∗={θ:data and assumptions are compatible with θ}
Point identification means Θ∗ contains a single value. Partial identification means Θ∗ is a set (interval, union of intervals, or more complex).
Why Partial Identification?
Intellectual honesty: Point estimates convey false precision when identifying assumptions are questionable. Bounds honestly represent uncertainty.
Assumption transparency: Partial identification makes assumptions visible. The width of bounds reveals how much identifying assumptions "buy."
Robustness: A policy conclusion that holds across the entire bounds interval is robust to identification concerns.
Decision-making: Bounds can still guide decisions. If the effect is positive across the entire interval, the policy implication is clear even without point identification.
17.2 Manski Bounds
The Missing Data Problem
The fundamental problem of causal inference is a missing data problem: we never observe both Yi(0) and Yi(1) for the same unit. Charles Manski's work formalizes what this missing data implies for identification.
Bounds on the Average Treatment Effect
Consider the simplest setting: we want to know E[Y(1)−Y(0)] but observe Yi(Di) where Di is treatment status.
We can write: E[Y(1)]=E[Y(1)∣D=1]P(D=1)+E[Y(1)∣D=0]P(D=0)
We observe E[Y∣D=1] and E[Y∣D=0]. Under no assumptions:
E[Y(1)∣D=1] is identified: it equals E[Y∣D=1]
E[Y(1)∣D=0] is not identified: the counterfactual outcome for the untreated
No-Assumptions Bounds
If Y is bounded in [YL,YU], then: E[Y(1)∣D=0]∈[YL,YU]
This yields bounds on E[Y(1)]: E[Y∣D=1]P(D=1)+YL⋅P(D=0)≤E[Y(1)]≤E[Y∣D=1]P(D=1)+YU⋅P(D=0)
Similarly for E[Y(0)]. The bounds on the ATE are:
Theorem 17.1 (Manski No-Assumptions Bounds): If Y∈[YL,YU], the ATE is bounded by ΔL ≤ E[Y(1)−Y(0)] ≤ ΔU, where:
ΔL = E[Y∣D=1]·P(D=1) + YL·P(D=0) − E[Y∣D=0]·P(D=0) − YU·P(D=1)
ΔU = E[Y∣D=1]·P(D=1) + YU·P(D=0) − E[Y∣D=0]·P(D=0) − YL·P(D=1)
Intuition: The worst case for identifying a positive effect is if untreated units would have had the highest possible outcomes under treatment, and treated units would have had the lowest possible outcomes under control.
Width of No-Assumptions Bounds
The bounds width is (YU−YL): the entire range of the outcome. For many applications, this is too wide to be informative.
Example: Estimating returns to college on earnings
YL=0 (can't have negative earnings)
YU=$1,000,000 (practical maximum)
P(D=1)=0.3 (30% attend college)
The no-assumptions bounds span nearly the entire earnings range—uninformative without additional restrictions.
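To make the bound arithmetic concrete, here is a minimal sketch in Python (the function name and interface are our own, not from any particular library) that computes the Theorem 17.1 bounds from raw data:

```python
import numpy as np

def manski_bounds(y, d, y_low, y_high):
    """No-assumptions bounds on the ATE when Y is known to lie in [y_low, y_high]."""
    p = d.mean()                        # P(D = 1)
    ey1 = y[d == 1].mean()              # E[Y | D = 1], identified from data
    ey0 = y[d == 0].mean()              # E[Y | D = 0], identified from data

    # Bound E[Y(1)] by filling the unobserved E[Y(1) | D = 0] with y_low or y_high
    ey1_lo, ey1_hi = ey1 * p + y_low * (1 - p), ey1 * p + y_high * (1 - p)
    # Bound E[Y(0)] by filling the unobserved E[Y(0) | D = 1] with y_low or y_high
    ey0_lo, ey0_hi = ey0 * (1 - p) + y_low * p, ey0 * (1 - p) + y_high * p

    # Worst case pairs the lowest E[Y(1)] with the highest E[Y(0)], and vice versa
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo
```

The returned interval has width exactly y_high − y_low, matching the discussion above.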
Tightening Bounds with Assumptions
Monotone treatment response (MTR): Assume treatment doesn't hurt anyone: Yi(1)≥Yi(0) for all i
This cuts the bounds in half: negative treatment effects are ruled out.
Monotone treatment selection (MTS): Assume people who select treatment have weakly better untreated outcomes: E[Y(0)∣D=1]≥E[Y(0)∣D=0]
This rules out selection driven by low baseline outcomes.
Monotone instrumental variables (MIV): For an instrument Z, assume: E[Y(d)∣Z=z1]≥E[Y(d)∣Z=z2] when z1>z2
This imposes monotonicity in the instrument without requiring exclusion.
Box: Understanding MTR, MTS, and MIV—Returns to Education
These assumptions are subtle. A concrete example helps distinguish them.
Setting: We want to bound the returns to a college degree (D) on earnings (Y).
MTR
Yi(college)≥Yi(no college) for all i
College never hurts anyone's earnings. Plausible if education only adds skills, not if signaling crowds out experience.
MTS
E[Y(0)∣D=1] ≥ E[Y(0)∣D=0]
College-goers would earn more than non-goers even without college. Captures positive selection on ability—violated if low-ability people attend due to affirmative action.
MIV
E[Y(d)∣Z=z1] ≥ E[Y(d)∣Z=z2] when z1 > z2
Using parental education as Z: people with more-educated parents have higher potential earnings at any education level. Does not assume parents' education only affects child earnings through child's own education.
Key distinction:
MTR restricts individual treatment effects (no one is harmed)
MTS restricts selection patterns (who chooses treatment)
MIV restricts how potential outcomes vary with an observable (without requiring exclusion)
Combining assumptions: Manski and Pepper (2000) show that combining MTR + MTS + MIV with parental education tightens returns-to-schooling bounds dramatically—from nearly [−100%, +100%] to approximately [6%, 15%].
The tradeoff: Tighter bounds require stronger assumptions. A researcher uncomfortable with MTR ("maybe some people learn best on the job") gets wider bounds. The partial identification framework makes this tradeoff explicit.
Each assumption tightens bounds. The researcher chooses which assumptions are credible and reports the resulting bounds.
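As a sketch of how these assumptions tighten the identified set (our own helper, reusing manski_bounds from the earlier sketch and following the logic of Manski and Pepper 2000): MTR raises the lower bound to zero, MTS lowers the upper bound to the naive difference in means, and combining them brackets the ATE between the two.

```python
def monotonicity_bounds(y, d, y_low, y_high):
    """ATE bounds under MTR, MTS, and both combined."""
    ey1, ey0 = y[d == 1].mean(), y[d == 0].mean()
    lo, hi = manski_bounds(y, d, y_low, y_high)   # no-assumptions baseline

    return {
        "no assumptions": (lo, hi),
        # MTR: Y(1) >= Y(0) for every unit, so the ATE cannot be negative
        "MTR": (0.0, hi),
        # MTS: treated units have weakly better potential outcomes under either
        # state, so the naive difference in means can only overstate the ATE
        "MTS": (lo, ey1 - ey0),
        "MTR + MTS": (0.0, ey1 - ey0),
    }
```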
Figure 17.1: How assumptions narrow the identification region. With no assumptions, bounds span a wide range. Adding MTR, MTS, or both progressively tightens the identified set. Combining multiple assumptions with an instrumental variable can approach point identification. The width of each bar shows the remaining uncertainty under that assumption set.
17.3 Lee Bounds for Sample Selection
The Problem
Experiments often suffer from differential attrition: treated and control groups have different dropout rates. If dropouts differ systematically from completers, comparing observed outcomes is biased.
Example: A job training RCT
Treatment: job training program
Control: no training
Outcome: employment at 12 months
Problem: 80% of treatment group completes follow-up, but only 60% of control group
The 80% vs. 60% difference could reflect the program helping people stay in the study (good) or the program selecting different types into the sample (bad for identification).
Lee (2009) Bounds
David Lee's bounds address sample selection by trimming the sample to make selection rates equal.
Key assumption: Treatment affects selection only by changing who is observed, not by adding "new types."
Assumption 17.1 (Monotonicity in Selection): For all units, Si(1) ≥ Si(0) (treatment never causes exit from the sample) or Si(1) ≤ Si(0) (treatment never induces retention). The same direction must hold for every unit.
Trimming procedure: If treatment increases retention (more treated observed than control), trim the treatment group to match the control selection rate:
Calculate the excess retention proportion: p = 1 − P(S=1∣D=0)/P(S=1∣D=1)
Trim a proportion p of the treated observations from one tail of the outcome distribution
Lower bound: trim from above. Upper bound: trim from below.
Theorem 17.2 (Lee Bounds): Under monotonicity in selection, the ATE for always-observed units is bounded by: [ ȲD=1(trimmed from above) − ȲD=0 , ȲD=1(trimmed from below) − ȲD=0 ]
Intuition: We don't know which treated units are "marginal" (would have dropped out absent treatment). We bound the effect by assuming they have extreme outcomes (highest or lowest).
Example: Job Training Evaluation
Continuing the example:
Treatment completion: 80%
Control completion: 60%
Excess retention proportion: p = 1 − 0.60/0.80 = 0.25
We trim 25% of the treatment group. For the lower bound, drop the top 25% of earners among treated. For the upper bound, drop the bottom 25%.
If observed mean earnings are:
Control: $25,000
Treatment: $30,000
Treatment (trim high): $27,000
Treatment (trim low): $33,000
Lee bounds: [$27,000 - $25,000, $33,000 - $25,000] = [$2,000, $8,000]
The point estimate without bounding is $5,000, comfortably inside the bounds.
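A minimal sketch of the trimming procedure in Python (function and names are our own; it assumes treatment weakly increases retention, as in the example):

```python
import numpy as np

def lee_bounds(y, d, s):
    """Lee (2009) bounds on the effect for always-observed units.

    y: outcomes (used only where s == 1); d: treatment; s: observed at follow-up.
    Assumes P(S=1 | D=1) >= P(S=1 | D=0), i.e., treatment never causes exit.
    """
    p1, p0 = s[d == 1].mean(), s[d == 0].mean()
    trim = 1 - p0 / p1                           # excess retention proportion

    y1 = np.sort(y[(d == 1) & (s == 1)])         # observed treated outcomes
    y0_mean = y[(d == 0) & (s == 1)].mean()      # observed control mean
    k = int(round(trim * len(y1)))               # number of treated outcomes to trim

    lower = y1[: len(y1) - k].mean() - y0_mean   # drop the top k: worst case
    upper = y1[k:].mean() - y0_mean              # drop the bottom k: best case
    return lower, upper
```

With the numbers above, trim = 1 − 0.60/0.80 = 0.25, and the function reproduces the [$2,000, $8,000] interval.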
When Are Lee Bounds Informative?
Lee bounds are informative when:
Selection rate differences are small
Outcome variation is moderate
Sample sizes are large enough for precise trimmed means
They are uninformative when:
Selection rates differ dramatically
Outcome distributions have fat tails
The trimmed portions contain most of the signal
17.4 Bounds in Instrumental Variables
Relaxing Exclusion
Standard IV requires the exclusion restriction: the instrument affects outcomes only through treatment. When this is questionable, we can partially identify effects by bounding the direct effect of the instrument.
Relaxed exclusion: Instead of assuming zero direct effect, assume the direct effect is bounded: ∣γ∣≤δ
where γ is the instrument's direct effect on the outcome.
Resulting bounds: β ∈ [ Cov(Y,Z)/Cov(D,Z) − δ·Var(Z)/∣Cov(D,Z)∣ , Cov(Y,Z)/Cov(D,Z) + δ·Var(Z)/∣Cov(D,Z)∣ ]
This follows because, in the linear model Y = βD + γZ + ε, Cov(Y,Z) = β·Cov(D,Z) + γ·Var(Z), so each admissible γ in [−δ, δ] implies one value of β.
As δ→0, bounds collapse to the point estimate. As δ increases, bounds widen.
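A sketch of these bounds applied to sample moments (our own function names; this tracks the population formula and ignores sampling uncertainty, for which see Conley et al. 2012):

```python
import numpy as np

def iv_bounds_relaxed_exclusion(y, d, z, delta):
    """Bounds on beta in Y = beta*D + gamma*Z + e, assuming only |gamma| <= delta.

    Since Cov(Y,Z) = beta*Cov(D,Z) + gamma*Var(Z), each gamma in [-delta, delta]
    implies beta = Cov(Y,Z)/Cov(D,Z) - gamma*Var(Z)/Cov(D,Z).
    """
    cov_yz = np.cov(y, z)[0, 1]
    cov_dz = np.cov(d, z)[0, 1]

    iv_point = cov_yz / cov_dz                    # standard IV estimate (gamma = 0)
    slack = delta * z.var(ddof=1) / abs(cov_dz)   # largest shift a direct effect allows
    return iv_point - slack, iv_point + slack
```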
Bounds with Weak Instruments
Weak instruments create another identification problem. Rather than compute unreliable point estimates, we can report Anderson-Rubin confidence sets that remain valid regardless of instrument strength.
These confidence sets are bounds: they include all parameter values not rejected by the data.
17.5 Sensitivity Analysis as Partial Identification
The Connection
Sensitivity analysis (Chapter 11) asks: how much unmeasured confounding would be needed to explain away an estimated effect? This implicitly produces bounds.
Oster (2019) bounds: Compare how coefficients change when adding observed controls. Bound the effect of unobservables by extrapolating from observables.
Rosenbaum bounds: In matched studies, bound the treatment effect under varying levels of unmeasured confounding.
E-values: Report the minimum confounding strength needed to explain away the effect.
From Sensitivity to Bounds
A sensitivity analysis answers: "If confounding had strength Γ, what effects would be consistent with the data?"
This maps to partial identification:
For each Γ, compute the set of compatible effects
Union over plausible Γ values gives the identification region
Example: If the observed effect is β^=0.15 and sensitivity analysis shows:
Γ=1 (no confounding): β∈[0.10,0.20]
Γ=1.5: β∈[0.02,0.28]
Γ=2: β∈[−0.05,0.35]
If we're willing to assume Γ≤1.5, the bounds are [0.02,0.28]—positive throughout.
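In code, this mapping is simply a union of intervals. A minimal sketch using the numbers above (the helper name is ours):

```python
def identification_region(intervals):
    """Union of effect intervals over the confounding strengths deemed plausible."""
    lows, highs = zip(*intervals.values())
    return min(lows), max(highs)

# Gamma values the researcher is willing to entertain (here Gamma <= 1.5)
region = identification_region({1.0: (0.10, 0.20), 1.5: (0.02, 0.28)})
print(region)   # (0.02, 0.28): positive throughout
```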
17.6 Proximal Causal Inference
Using Proxies for Unobserved Confounders
When confounders are unobserved, proxies may help. Proximal causal inference (Tchetgen Tchetgen et al., 2020) formalizes conditions under which proxies for unmeasured confounding enable identification or tighten bounds.
Setup:
U: unmeasured confounder
W: proxy for U (affected by U, not by D or Y directly)
Z: proxy for U (different from W)
Identification result: Under conditions relating proxies to the confounder, treatment effects can be identified or bounded even without observing U.
Intuition
If we have two proxies for the same confounder, each provides partial information about U. Together, they may provide enough information to control for U's confounding influence.
Example: Estimating effect of air pollution (D) on health (Y)
U: socioeconomic status (unmeasured)
W: neighborhood housing values (proxy for SES)
Z: car ownership (another proxy for SES)
Neither proxy perfectly measures SES, but together they may triangulate its confounding influence.
Limitations
Proximal causal inference requires:
Multiple proxies for the same confounder
Specific independence conditions
Sufficient proxy quality
When conditions are only approximately satisfied, the method produces bounds rather than point identification.
17.7 When to Report Bounds
The Trade-off
Point estimates are precise but may be wrong if assumptions fail. Bounds are honest about uncertainty but may be too wide for policy guidance.
Factors favoring bounds:
Identifying assumptions are questionable
Bounds are informative (narrow enough to guide decisions)
Audience values honesty over precision
Stakes are high enough to warrant extra caution
Factors favoring point estimates:
Assumptions are widely accepted
Bounds are so wide they're uninformative
The estimate is understood as one of many inputs
Consumers of research prefer precise (even if wrong) numbers
Reporting Strategy
Best practice: Report both.
Main point estimate under standard assumptions
Sensitivity analysis or bounds under weaker assumptions
Discussion of what assumptions are required for each
This lets readers with different credences in assumptions draw different conclusions.
Example: Returns to Education
Point estimate: IV using compulsory schooling laws finds ~8% return per year of schooling.
Concerns: Exclusion restriction (compulsory schooling may affect outcomes through channels other than years of schooling), LATE interpretation (effect for compliers may differ from population average).
Bounds approach:
Under monotonicity alone (education doesn't hurt earnings): [0,+∞)
Under monotonicity + bounded effect heterogeneity: [0.03,0.15]
Under relaxed exclusion (∣γ∣≤0.02): [0.05,0.11]
The bounds narrow as assumptions strengthen. Readers can choose which assumptions to believe.
17.8 Inference for Partially Identified Parameters
Confidence Intervals for Bounds
Bounds are estimated, not known. We need confidence intervals for the identification region itself.
Imbens and Manski (2004): Confidence intervals for interval-identified parameters: CI = [θ̂L − c·ŝeL, θ̂U + c·ŝeU]
where c is chosen to achieve correct coverage for the identified set, not just the endpoints.
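A sketch of the computation (our own implementation, assuming asymptotically normal estimators of the two endpoints):

```python
from scipy.optimize import brentq
from scipy.stats import norm

def imbens_manski_ci(theta_l, theta_u, se_l, se_u, alpha=0.05):
    """Imbens-Manski (2004) confidence interval for an interval-identified parameter.

    The critical value c solves
        Phi(c + (theta_u - theta_l) / max(se_l, se_u)) - Phi(-c) = 1 - alpha,
    interpolating between the two-sided value (c = 1.96 when the identified set
    is a point) and the one-sided value (c = 1.64 when the set is wide).
    """
    width = (theta_u - theta_l) / max(se_l, se_u)
    c = brentq(lambda x: norm.cdf(x + width) - norm.cdf(-x) - (1 - alpha), 0.0, 10.0)
    return theta_l - c * se_l, theta_u + c * se_u

print(imbens_manski_ci(0.02, 0.28, 0.01, 0.015))  # covers the identified set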
Issues with Inference
Empty intersection: With multiple assumptions, their intersection may be empty in finite samples even if the true parameter is identified.
Conservative inference: Standard methods for bounds are conservative—actual coverage often exceeds nominal.
Bootstrap: Resampling methods can construct confidence sets for bounds, but require care with set-valued parameters.
17.9 Running Example: Returns to Education
The Identification Challenge
The returns to education question illustrates partial identification themes:
What we want: The causal effect of an additional year of schooling on earnings.
Why point identification is hard:
Selection: More able people get more schooling
Omitted variables: Family background, innate ability, motivation
Measurement: Years of schooling may not capture quality
IV approach (Chapter 12): Use compulsory schooling laws as instruments. But:
Exclusion restriction is questionable (schooling laws may affect outcomes through peer effects, credential effects)
LATE is for compliers, not the population
Bounds Analysis
No-assumptions bounds: If we only assume earnings are non-negative and bounded by $1 million, bounds span nearly the entire range—uninformative.
Monotonicity bounds: Assuming more education doesn't reduce earnings:
Lower bound: 0 (education has no positive effect)
Upper bound: Observed college premium (upper bound on the causal effect)
This tells us: the effect is non-negative but could be anywhere from zero to the observed correlation.
IV bounds with relaxed exclusion: Following Conley et al. (2012), allow the instrument to have a small direct effect:
If ∣γ∣≤0.01, bounds are approximately [0.06,0.10]
If ∣γ∣≤0.03, bounds are approximately [0.03,0.13]
Sensitivity analysis (Altonji et al. 2005, Oster 2019):
How much would selection on unobservables need to exceed selection on observables to explain away the return?
Estimates suggest selection would need to be 2-3x stronger—plausibly too strong
What We Learn
Combining approaches:
Returns to education are almost certainly positive
Plausible range: 3-15% per year
Point estimate of ~8% is within the plausible range
Uncertainty is real but doesn't reverse conclusions
This is more honest than reporting "8% ± 2%" when the true uncertainty is much larger.
Practical Guidance
When to Use Partial Identification
If point-identifying assumptions are credible: report a point estimate plus sensitivity analysis.
If some assumptions are questionable: report bounds under alternative assumption sets.
If a major assumption is clearly false: rely on bounds or the weakest defensible assumptions.
If a policy decision is required: report bounds and check whether the decision is robust across them.
If academic credibility is paramount: report both and let readers choose which assumptions to maintain.
Common Pitfalls
Pitfall 1: Dismissing bounds as uninformative
Wide bounds still provide information: they tell you what you don't know. "The effect is somewhere between -0.1 and 0.5" is more honest than "the effect is 0.2 ± 0.05 (assuming my model is correct)."
How to avoid: Report bounds even when wide. Discuss what additional assumptions or data would narrow them.
Pitfall 2: Cherry-picking bounds
Choosing assumptions to get narrow bounds defeats the purpose.
How to avoid: Report bounds under multiple assumption sets. Be transparent about which assumptions you find most credible and why.
Pitfall 3: Ignoring sampling uncertainty in bounds
Bounds are estimated quantities with their own standard errors.
How to avoid: Report confidence intervals for bounds endpoints. Use methods designed for partially identified parameters.
Pitfall 4: Confusing identification failure with estimation failure
Wide bounds may reflect weak identification (a data limitation) or weak assumptions (a modeling choice). These have different implications.
How to avoid: Distinguish between "the data are insufficient" and "I'm unwilling to assume enough."
Implementation Checklist
State the target parameter and the assumptions needed for point identification.
Compute bounds under progressively weaker assumption sets (no assumptions; MTR, MTS, or MIV; relaxed exclusion).
Address attrition or sample selection with Lee bounds where relevant.
Report confidence intervals for the bounds endpoints (e.g., Imbens and Manski 2004).
Check whether the substantive conclusion holds across the entire identified set.
Qualitative Bridge
The Value of Honest Uncertainty
Partial identification embodies a commitment to honest uncertainty—acknowledging the limits of what data can tell us. This connects to qualitative research traditions that:
Emphasize the complexity of social phenomena
Resist oversimplified causal claims
Value thick description over point estimates
When to Combine
Understanding assumptions: Qualitative knowledge helps assess which identifying assumptions are credible. Fieldwork, interviews, and institutional analysis reveal whether exclusion restrictions hold, whether selection is monotone, whether treatment effects are bounded.
Interpreting bounds: Wide bounds indicate genuine uncertainty, but don't tell us why identification is hard. Qualitative analysis can explain the sources of confounding and suggest what additional data or design would narrow bounds.
Communicating uncertainty: Policy audiences may find bounds confusing. Case studies and narrative can convey what uncertain estimates mean for real decisions.
Example: Evaluating Education Interventions
Bounds on educational intervention effects may span positive and negative. Qualitative evidence helps interpret this:
Process observation: Does the intervention seem to work in classrooms?
Teacher interviews: What are implementation challenges?
Student focus groups: How do students experience the intervention?
This evidence doesn't narrow statistical bounds but helps judge where in the bounds the true effect likely lies.
Integration Note
Connections to Other Methods
Selection on Observables (Ch. 11): sensitivity analysis produces bounds
Instrumental Variables (Ch. 12): relaxed exclusion yields bounds
RDD (Ch. 14): extrapolation away from the cutoff produces bounds
DiD (Ch. 13): sensitivity to violations of parallel trends gives bounds
Triangulation Strategies
Bounds from different methods may overlap, reinforcing conclusions:
Different bounding assumptions: Do bounds from MTR, MTS, and MIV intersect?
Different research designs: Do IV bounds overlap with selection-on-observables bounds?
Different datasets: Are bounds consistent across data sources?
Overlap of multiple bounds provides stronger evidence than any single bound.
Summary
Key takeaways:
Partial identification produces bounds instead of point estimates when identifying assumptions are too weak for point identification.
Manski bounds show what can be learned from data alone (often very little) and how monotonicity or other assumptions tighten bounds.
Lee bounds address sample selection in experiments by trimming to equalize selection rates, under monotonicity assumptions.
Sensitivity analysis is implicit partial identification—it maps assumptions about confounding strength to sets of compatible effects.
Bounds provide honest uncertainty about causal effects. They're wider than point estimates but don't rely on questionable assumptions.
Report both when possible: point estimates under standard assumptions, bounds under weaker assumptions. Let readers choose their credence.
Returning to the opening question: When identifying assumptions are too weak to pin down a single number, we can still learn from data. Partial identification characterizes the set of parameter values consistent with the data and minimal assumptions. These bounds may be wide, but they honestly represent what we know—and don't know—about causal effects.
Further Reading
Essential
Manski (2003), Partial Identification of Probability Distributions - The foundational treatment
Tamer (2010), "Partial Identification in Econometrics" - Accessible survey
For Deeper Understanding
Manski (1990), "Nonparametric Bounds on Treatment Effects" - Original treatment effect bounds
Lee (2009), "Training, Wages, and Sample Selection" - Lee bounds derivation and application
Imbens and Manski (2004), "Confidence Intervals for Partially Identified Parameters" - Inference methods
Advanced/Specialized
Conley, Hansen, and Rossi (2012), "Plausibly Exogenous" - Bounds with imperfect instruments
Tchetgen Tchetgen et al. (2020), "Introduction to Proximal Causal Inference" - Proxies for confounders
Molinari (2020), "Microeconometrics with Partial Identification" - Comprehensive treatment
Applications
Manski and Pepper (2000), "Monotone Instrumental Variables" - Returns to schooling bounds
Blundell et al. (2007), "Changes in the Distribution of Male and Female Wages" - Bounds on wage distribution changes
Kline and Santos (2012), "A Score Based Approach to Wild Bootstrap Inference" - Inference for bounds
Exercises
Conceptual
Explain the difference between point identification and partial identification. When is partial identification preferable to (a) imposing additional assumptions for point identification, or (b) abandoning causal inference entirely?
In Lee bounds, what is the monotonicity assumption? Construct an example where this assumption fails and explain why Lee bounds would be invalid.
How does sensitivity analysis relate to partial identification? Show how Oster (2019) bounds can be reframed as partial identification under assumptions about selection.
Applied
Using data from a job training program evaluation:
Calculate naive treatment effect estimates
If there is differential attrition, compute Lee bounds
Discuss what the bounds tell you about the program's effectiveness
For the returns to education question:
State Manski's no-assumptions bounds (given plausible outcome bounds)
Add monotonicity (education doesn't reduce earnings) and compute tighter bounds
Discuss whether these bounds are narrow enough to be policy-relevant
Discussion
A policymaker says: "I need a number, not a range. Bounds are useless for decision-making." How would you respond? When are bounds useful for policy decisions, and when are they genuinely uninformative?