Chapter 23: Triangulation and Multi-Method Design
Opening Question
When different methods give different answers to the same research question, what should we conclude?
Chapter Overview
Empirical research rarely produces certainty. Every method has limitations, every dataset has gaps, and every identification strategy relies on assumptions that cannot be fully tested. A natural response to this uncertainty is to ask: can we strengthen our conclusions by combining multiple approaches?
This chapter examines how researchers can strategically combine different methods to build stronger evidence than any single approach would provide. The key insight is that triangulation works not because more evidence is always better, but because different methods fail in different ways. When diverse approaches converge on similar conclusions despite different assumptions and data sources, our confidence increases. When they disagree, the disagreement itself often reveals something important about the phenomenon under study.
What you will learn:
Why combining methods can strengthen evidence even when each method is imperfect
How to design sequential and concurrent multi-method studies
How to diagnose and interpret disagreements between methods
When triangulation is most valuable and when it adds complexity without insight
Prerequisites: Familiarity with the major identification strategies (Chapters 9-16)
Historical Context: The Origins of Triangulation
The term "triangulation" comes from surveying and navigation, where determining position from a single reference point is impossible but straightforward with two or more. The methodological analogy entered social science in the 1970s, primarily through the work of Norman Denzin (1970) and Donald Campbell.
Campbell's influence is particularly important for empirical economics. His 1957 paper with Fiske introduced the "multitrait-multimethod matrix" for psychological measurement, arguing that convergent results across different measurement approaches strengthened validity claims. In his later work on quasi-experimental design, Campbell (1974) emphasized "methodological triangulation" as a way to compensate for the threats to validity inherent in any single research design.
The credibility revolution in economics initially moved away from triangulation toward single-study, design-based credibility. Angrist and Pischke (2010) famously argued that clean identification in one well-designed study trumps messy accumulation across many. But as economists confronted questions that resist clean identification---the effects of institutions, the causes of development, the consequences of major policy reforms---interest in strategic combination of methods has revived.
Today's best empirical practice often combines the rigor of the credibility revolution with older traditions of evidence accumulation. The minimum wage debate, for instance, now involves dozens of studies using different designs, with meta-analysis and specification curves attempting to synthesize the evidence (Dube 2019).
23.1 Why Combine Methods?
The Logic of Triangulation
The fundamental logic of triangulation rests on a simple observation: different methods make different assumptions. If two methods share the same critical assumption and both could be biased in the same direction, combining them adds little. But if they rely on orthogonal identifying assumptions, convergent results provide stronger evidence than either alone.
Principle 23.1: Strength Through Diversity Triangulation strengthens evidence when the methods combined have different sources of potential bias. If method A is vulnerable to bias X but robust to bias Y, and method B is vulnerable to Y but robust to X, convergent results from both suggest that neither X nor Y is driving the finding.
Intuition: Think of it as ruling out alternative explanations. Each method rules out some threats but not others. If we can assemble methods that collectively rule out all major threats, the common finding is hard to explain away.
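To make this concrete, here is a minimal simulation, not from the text and with all parameters invented, in which two estimators target the same effect but are exposed to different biases: naive OLS suffers from an omitted confounder, while an IV estimate is immune to the confounder but would break if the exclusion restriction failed.

```python
# A minimal simulation of Principle 23.1 (illustrative; all numbers invented).
# Method A (naive OLS) is vulnerable to an omitted confounder; Method B (IV)
# is robust to the confounder but would fail if the exclusion restriction did.
import numpy as np

rng = np.random.default_rng(0)
n, true_effect = 5_000, 1.0

u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument (affects y only via d)
d = 0.5 * z + 0.8 * u + rng.normal(size=n)  # treatment
y = true_effect * d + 1.0 * u + rng.normal(size=n)

# Method A: OLS slope of y on d -- biased upward by the confounder u.
ols = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# Method B: Wald/IV estimate -- unbiased here, since z satisfies exclusion.
iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"true: {true_effect:.2f}   OLS: {ols:.2f}   IV: {iv:.2f}")
```

Because the two designs agree only when neither bias operates, divergence (here, OLS around 1.4 versus IV around 1.0) flags a failing assumption, while convergence would rule out both confounding and exclusion failure as common explanations.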
Types of Triangulation
Denzin (1978) distinguished four types of triangulation, which remain useful for empirical economics:
1. Data Triangulation Using multiple data sources to examine the same phenomenon. For instance, studying the effects of education using both administrative records and survey data, or examining minimum wage effects using establishment surveys (like the CES) and household surveys (like the CPS).
2. Methodological Triangulation Combining different analytical approaches---the focus of this chapter. This includes combining quantitative methods (e.g., IV and DiD), combining quantitative and qualitative approaches, or using both experimental and observational evidence.
3. Investigator Triangulation Having multiple researchers independently analyze the data or conduct studies. This addresses researcher degrees of freedom and reduces the influence of any individual's priors.
4. Theory Triangulation Examining data from multiple theoretical perspectives. Different theoretical frameworks may generate different testable predictions, and findings that hold across frameworks are more robust.
When Triangulation Adds Value
Triangulation is most valuable when:
No single method is decisive. For questions like "what caused China's growth?" or "how does monetary policy affect the economy?", no individual study can definitively answer the question. Cumulative evidence is necessary.
Stakes are high and errors are costly. Policy decisions with large consequences warrant investment in multiple approaches. Whether to raise the minimum wage nationally deserves more scrutiny than a single study can provide.
Methods have complementary strengths. RCTs provide strong internal validity but often limited external validity. Observational methods across many contexts provide breadth but uncertain causality. Combining them leverages both strengths.
Prior evidence is contested. When the literature is mixed, systematic combination of approaches can diagnose why studies disagree.

23.2 Sequential Designs
Sequential multi-method designs use findings from one phase to inform the next. The sequence may move from qualitative to quantitative, from quantitative to qualitative, or iterate between them.
Qualitative-to-Quantitative Sequences
Phase 1: Qualitative exploration
Identify relevant variables and mechanisms
Understand institutional context
Generate hypotheses for testing
Phase 2: Quantitative testing
Test hypotheses with statistical methods
Estimate magnitudes and precision
Assess generalizability
This sequence is valuable when the research question is new or when existing theory provides little guidance. The qualitative phase helps ensure the quantitative analysis asks the right questions.
Example: Understanding Microfinance Impacts
The microfinance literature illustrates this sequence well. Early ethnographic and case study work (e.g., Yunus 1999; Morduch 1999) identified potential mechanisms---consumption smoothing, female empowerment, microenterprise growth---and raised questions about selection into borrowing.
This qualitative work shaped the design of subsequent RCTs. Banerjee et al. (2015) explicitly measured outcomes that the qualitative literature had identified as potentially important, including business investment, consumption, female decision-making power, and health. The RCTs also addressed the selection concerns raised by early critiques.
Quantitative-to-Qualitative Sequences
The reverse sequence uses quantitative findings to motivate targeted qualitative investigation:
Phase 1: Quantitative analysis
Identify patterns, estimate effects
Identify puzzles or heterogeneity
Phase 2: Qualitative investigation
Explain mechanisms behind statistical patterns
Understand heterogeneity
Identify boundary conditions
Example: Why Did the Minimum Wage Not Reduce Employment?
Card and Krueger's (1994) finding that New Jersey's minimum wage increase did not reduce fast-food employment was initially puzzling given standard economic theory. This quantitative puzzle motivated qualitative investigation:
How did employers actually respond? (Interviews revealed price increases, reduced turnover, slight productivity increases)
Why were fast-food restaurants different? (Case studies showed monopsonistic features, high turnover costs)
What margins adjusted? (Field observation showed hours, scheduling, and working conditions changed)
This qualitative follow-up enriched understanding beyond what the quasi-experimental estimates alone could provide.
Iterative Designs
More complex sequences iterate between phases:
1. Qualitative exploration → 2. Quantitative testing → 3. Qualitative follow-up on puzzles → 4. Refined quantitative analysis
This iterative approach is common in studying complex phenomena like economic development, where initial findings generate new questions that require further investigation.
Practical Guidance: Sequential Design

| Sequence | Best suited for | Main risks |
| --- | --- | --- |
| Qualitative first | Developing theory, identifying variables, understanding context | Premature quantification; confirmation bias in interpretation |
| Quantitative first | Establishing patterns, estimating effects | Missing mechanisms; misspecified outcomes |
| Iteration | Refining understanding, testing mechanisms | Study fatigue; post-hoc rationalization |
23.3 Concurrent Designs
Concurrent designs conduct different approaches simultaneously, comparing or integrating findings at the analysis stage.
Parallel Investigation
Different methods are applied independently to the same research question:
Design features:
Same question, same population, same time period
Different identification strategies
Independent analysis (to avoid contamination)
Comparison at integration phase
Example: China's Special Economic Zones
Research on China's Special Economic Zones illustrates parallel investigation. Multiple methods have been applied to assess SEZ impacts:
Method 1: Difference-in-Differences Wang (2013) uses the staggered introduction of SEZs across Chinese cities, comparing growth in newly designated SEZ cities to cities not yet designated. Parallel trends are tested using pre-SEZ periods.
Method 2: Synthetic Control Alder, Shao, and Zilibotti (2016) construct synthetic counterfactual cities for SEZ recipients using weighted combinations of non-SEZ cities matched on pre-reform characteristics (a stylized version of this weighting step is sketched after this example).
Method 3: Geographic RD Studies exploiting the geographic boundaries of SEZs compare outcomes just inside versus just outside SEZ borders, using distance to the boundary as the running variable.
Method 4: Qualitative Case Studies Detailed case studies of individual SEZs (e.g., Shenzhen) document the actual mechanisms of reform---policy experimentation, foreign investment patterns, infrastructure development.
When these methods---with quite different identifying assumptions---yield similar conclusions about substantial positive effects, confidence increases. Where they disagree (e.g., on the magnitude of spillovers to non-SEZ regions), the disagreement helps identify where uncertainty remains.
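For concreteness, here is a stylized sketch of the weighting step behind Method 2, with entirely made-up city data; this is not the Alder, Shao, and Zilibotti implementation. The idea is to choose nonnegative donor weights summing to one that reproduce the treated city's pre-reform outcome path.

```python
# Stylized synthetic-control weights via constrained least squares.
# All data are simulated; a real application would also match on
# pre-reform characteristics, not outcomes alone.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T_pre, n_donors = 12, 20                        # pre-SEZ years, non-SEZ cities
donors = rng.normal(size=(T_pre, n_donors))     # donor-city outcome paths
treated = donors[:, :3] @ np.array([0.5, 0.3, 0.2]) \
          + 0.05 * rng.normal(size=T_pre)       # treated city's pre-SEZ path

def pre_fit_loss(w):
    """Squared pre-period gap between treated city and weighted donors."""
    return np.sum((treated - donors @ w) ** 2)

res = minimize(
    pre_fit_loss,
    x0=np.full(n_donors, 1.0 / n_donors),
    bounds=[(0.0, 1.0)] * n_donors,                            # w_j in [0, 1]
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to 1
)
weights = res.x
print("pre-period RMSE:", np.sqrt(pre_fit_loss(weights) / T_pre))
# Post-reform, the estimated SEZ effect at each date is the gap between
# the treated city's outcome and the weighted donor combination.
```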
Cross-Method Validation
A specific form of concurrent design uses one method to validate another:
Definition 23.2: Cross-Method Validation Using the results from one method to check the plausibility of assumptions required by another.
Note: We use "cross-method validation" rather than "cross-validation" to avoid confusion with the machine learning technique of the same name (splitting data into folds for model selection). Here, we're validating across methods, not across data partitions.
Intuition: If a DiD study relies on parallel trends, qualitative evidence about the comparability of treatment and control groups can support or undermine that assumption. If an IV study claims a particular mechanism, direct evidence on that mechanism (qualitative or quantitative) can support or undermine the exclusion restriction.
Example: Validating IV Exclusion Restrictions
Angrist and Krueger's (1991) quarter-of-birth instrument for education relies on the assumption that birth timing affects earnings only through education. This exclusion restriction cannot be tested statistically but can be investigated:
Qualitative evidence: Are there other channels? Seasonal patterns in health, family background differences by season of birth?
Additional quantitative tests: Do quarter-of-birth effects appear in samples where compulsory schooling binds versus where it does not?
Alternative instruments: Do draft lottery instruments (which have different exclusion restrictions) yield similar estimates?
Bound, Jaeger, and Baker (1995) used this cross-method validation logic to challenge the original findings, showing that quarter-of-birth had weak predictive power for education and that similar patterns appeared for cohorts not subject to compulsory schooling laws.
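In the same spirit, a first-stage strength check is straightforward to run. The sketch below uses simulated data with hypothetical variable names, not the actual Angrist-Krueger census extracts:

```python
# First-stage diagnostic: regress the endogenous regressor (education)
# on the instruments (quarter-of-birth dummies) and inspect the joint F.
# Simulated data; the weak link mimics the Bound-Jaeger-Baker concern.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 10_000
quarter = rng.integers(1, 5, size=n)                     # quarter of birth, 1-4
qob = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])
education = 12.0 + 0.02 * qob[:, 0] + rng.normal(scale=2.0, size=n)

first_stage = sm.OLS(education, sm.add_constant(qob)).fit()
print(f"first-stage F on the instruments: {first_stage.fvalue:.2f}")
# With an F this low, even small exclusion violations are amplified in
# the IV estimate -- the core of the Bound-Jaeger-Baker critique.
# (Conventional rules of thumb want F well above 10; more recent
# weak-instrument work argues for far higher thresholds.)
```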
Embedded Designs
Some studies embed one approach within another:
Quantitative study with qualitative component:
RCT with implementation study
Survey with open-ended questions
Statistical analysis with case selection for follow-up
Qualitative study with quantitative component:
Case study with systematic coding
Ethnography with survey validation
Historical analysis with quantitative measurement
Integration Approaches

| Approach | Guiding question and action | Most useful when |
| --- | --- | --- |
| Convergence | Do methods agree? If so, report the common finding | Testing robustness |
| Complementarity | What does each method add? Combine the insights | Understanding mechanisms |
| Expansion | Does one method extend the other? Breadth plus depth | Assessing external validity |
| Contradiction | Why do the methods disagree? Diagnose the sources | The literature is contested |
23.4 When Methods Disagree
Perhaps the most intellectually productive situation in multi-method research is when different approaches yield different answers. Disagreement forces careful thinking about what each method actually identifies.
Sources of Disagreement
1. Different estimands Methods may actually be estimating different quantities. An RCT estimates the effect of treatment assignment (ITT) or complier effects (LATE). An observational study with strong selection may estimate effects for different populations. What looks like disagreement may be heterogeneity.
Worked Example: Minimum Wage Effects by Method
Consider three estimates of minimum wage effects on employment:
Card-Krueger DiD (1994): Elasticity ≈ 0.0 (null)
Cross-state time series regressions: Elasticity ≈ -0.1 to -0.3
Meta-analysis of all studies: Wide distribution centered near -0.1
Are these contradictory? The DiD compares adjacent counties, identifying local effects with limited spillovers. Time series regressions capture aggregate effects including general equilibrium adjustments. The estimates may differ because the estimands differ, not because one method is wrong.
2. Different identifying assumptions (and violations) If one method's key assumption is violated, it will be biased. Disagreement may reveal which assumptions hold. For instance, if a DiD study and a synthetic control study of the same policy disagree, this may indicate that parallel trends fails in one case or that the synthetic control match is poor.
3. Different data sources Studies using different data may capture different populations or measurement concepts. Administrative data and survey data on employment may not align because they measure different things (jobs vs. workers) or because of measurement error differences.
4. Statistical noise With realistic sample sizes and effect sizes, random sampling variation can produce seemingly conflicting results. Two well-executed studies can disagree simply due to chance.
Diagnosing Disagreement
A systematic approach to interpreting disagreement:
Step 1: Clarify estimands
What exactly does each method estimate?
Could differences reflect heterogeneity rather than bias?
Step 2: Assess identifying assumptions
What assumptions does each method require?
Is there evidence for or against each?
Could violation explain the pattern of disagreement?
Step 3: Examine data differences
Do the studies use the same data?
Are there measurement differences?
Are the populations comparable?
Step 4: Quantify sampling uncertainty
Are the results actually statistically distinguishable?
What would a formal test of equality conclude?
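Step 4 can be made operational with a simple two-sample comparison. A minimal sketch, assuming the two studies use independent samples (all numbers invented):

```python
# Are two estimates from different methods statistically distinguishable?
# Illustrative inputs only.
import numpy as np
from scipy import stats

est_a, se_a = -0.02, 0.04   # e.g., a DiD elasticity and its std. error
est_b, se_b = -0.15, 0.05   # e.g., a time-series elasticity

# With independent samples, the difference has variance se_a^2 + se_b^2;
# with overlapping data this understates the uncertainty.
z = (est_a - est_b) / np.sqrt(se_a**2 + se_b**2)
p = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
# Here z is about 2.0: the estimates are barely distinguishable at the 5%
# level, so sampling noise alone is an uncomfortable explanation and
# estimand or assumption differences deserve scrutiny.
```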
Example: Why Do SVAR and LP Give Different Monetary Policy Effects?
Structural VAR models and local projections sometimes yield different impulse response estimates for monetary policy shocks. Systematic investigation has revealed:
Estimand differences: In small samples, VAR and LP estimate slightly different weighted averages of dynamic effects.
Assumption differences: VAR requires correct lag specification for the entire system; LP is robust to misspecification but less efficient.
Treatment of nonlinearity: LP more easily accommodates state-dependence; VAR typically imposes linearity.
Identification: Same external instrument can be used in both, but LP is more robust to instrument weakness.
Plagborg-Møller and Wolf (2021) show that under correct specification the two methods estimate the same impulse responses; in practice, their differences can help diagnose which specification issues matter.
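As a concrete illustration of the LP side, here is a minimal local-projection sketch on simulated data: one regression per horizon of the future outcome on the current shock, with a lag control and HAC standard errors. This is a toy AR(1), not a monetary-policy application.

```python
# Local projections on a simulated AR(1): y_t = 0.7*y_{t-1} + 0.5*shock_t + e_t.
# The horizon-h impulse response is the coefficient on shock_t in a
# regression of y_{t+h} on shock_t (Jorda 2005).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 400
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + 0.5 * shock[t] + 0.3 * rng.normal()

irf = []
for h in range(9):
    lhs = y[1 + h:]                                         # y_{t+h}
    rhs = np.column_stack([shock[1:T - h], y[:T - 1 - h]])  # shock_t, y_{t-1}
    fit = sm.OLS(lhs, sm.add_constant(rhs)).fit(
        cov_type="HAC", cov_kwds={"maxlags": h + 1}         # overlap-robust SEs
    )
    irf.append(fit.params[1])                               # coef. on shock_t

print(np.round(irf, 2))   # should decay roughly like 0.5 * 0.7**h
```

A VAR would instead estimate the dynamics once and extrapolate the impulse response recursively; on a correctly specified model like this toy example, the two approaches coincide, consistent with the Plagborg-Møller and Wolf result.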
Productive Use of Disagreement
Principle 23.2: Disagreement as Information When methods disagree, the pattern of disagreement often conveys information. Rather than treating disagreement as a problem to resolve, treat it as data about the phenomenon.
Strategies for extracting information from disagreement:
Bound the truth: If methods with opposite biases disagree, the truth may lie between them.
Identify moderators: If methods agree for some subgroups but not others, heterogeneity is revealed.
Test mechanisms: If a proposed mechanism predicts which methods should agree/disagree, the pattern tests the mechanism.
Update beliefs: A Bayesian researcher can update a prior on each study's result in proportion to its credibility, explicitly modeling method-specific biases.
Box: A Framework for Weighting Evidence Across Methods
When multiple studies using different methods address the same question, how should we weight them? Here is a structured approach:
Step 1: Assess each method's internal validity

| Criterion | Questions to ask |
| --- | --- |
| Design quality | Is treatment plausibly exogenous? Are the assumptions testable? |
| Statistical precision | How wide are the confidence intervals? How large is the sample? |
| Robustness | Do results hold across specifications? |
| Transparency | Are methods clearly documented? Are the data accessible? |
Step 2: Consider method-specific biases

| Method | Main vulnerabilities | Reassuring signs |
| --- | --- | --- |
| RCT | Attrition, Hawthorne effects | Low attrition, blinded design |
| IV | Weak instruments, exclusion violations | Strong first-stage F-statistic, credible exclusion argument |
| DiD | Parallel-trends failure | Pre-trends pass, multiple control groups |
| RD | Manipulation of the running variable, wrong bandwidth | Dense running variable, no bunching at the threshold |
| Observational | Omitted variables | Rich controls, sensitivity analysis |
Step 3: Consider estimand differences
Methods may estimate different quantities:
RCT: ITT or LATE (for compliers)
IV: LATE (different compliers than RCT if different instrument)
DiD: ATT for treated group
RD: Local effect at threshold
Apparent disagreement may reflect genuine heterogeneity. Disagreement is most informative when the estimands align; when they do not, differing results may simply be answers to different questions.
Step 4: Apply structured weighting
Option A: Qualitative hierarchy Give most weight to designs with clearest identification, but don't dismiss others. Treat conflicting evidence as bounding.
Option B: Bayesian updating Start with a prior. Update more strongly on high-quality studies. Explicitly model the probability that each study is biased (a crude version of this is sketched after this box).
Option C: Pre-specified weighting Before seeing results, specify how you will weight methods (e.g., "RCT evidence weighted 3x observational"). Prevents post-hoc rationalization.
The key principle: Make your weighting explicit. Hidden weighting (e.g., emphasizing studies that support your prior) is a major source of bias in literature reviews.
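A crude numerical version of Option B, with all inputs hypothetical: inflate each study's variance by the prior probability that its design is biased, then pool by inverse variance.

```python
# Precision-weighted pooling where each study has some probability of
# being biased. All estimates, standard errors, and priors are invented.
import numpy as np

estimates = np.array([0.10, 0.02, 0.15])      # e.g., RCT, DiD, observational
ses       = np.array([0.05, 0.03, 0.02])
p_biased  = np.array([0.05, 0.25, 0.50])      # prior prob. each design is off

# Simple device: treat a possibly biased study as carrying extra noise of
# scale bias_sd, so its effective variance grows with p_biased.
bias_sd = 0.10
eff_var = ses**2 + p_biased * bias_sd**2

w = (1 / eff_var) / np.sum(1 / eff_var)       # inverse-variance weights
pooled = np.sum(w * estimates)
pooled_se = np.sqrt(1 / np.sum(1 / eff_var))
print(np.round(w, 2), f"pooled = {pooled:.3f} (se {pooled_se:.3f})")
# Making p_biased explicit is the point: the weights sit on the table,
# not hidden in the prose of a literature review.
```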
23.5 Examples of Successful Integration
Case Study 1: Microfinance Impacts
The microfinance literature exemplifies successful multi-method integration:
Qualitative phase (1990s-2000s)
Ethnographic work documented how borrowers used loans
Case studies identified potential mechanisms (female empowerment, consumption smoothing, business investment)
Raised concerns about selection bias in early quantitative work
Experimental phase (2005-2015)
Six-country RCTs provided experimental estimates
Modest positive effects on investment
Limited effects on consumption or poverty
No evidence of transformative impacts
Meta-analytic synthesis (2015-present)
Meager (2019) pools the RCTs using Bayesian hierarchical models (a simplified classical analogue is sketched after this case study)
Quantifies cross-site heterogeneity
Shows modest mean effects are robust
Large variance implies effect sizes vary substantially by context
Qualitative follow-up
Why modest effects? Returns to microenterprise are low
Why heterogeneity? Local market conditions, loan product design, borrower characteristics
Current consensus: Microfinance produces modest positive effects on average, with substantial heterogeneity. It is not transformative for poverty reduction but provides value as financial access. This consensus emerged from triangulation across methods.
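As noted above, a simplified classical stand-in for the hierarchical pooling is a random-effects meta-analysis with the DerSimonian-Laird moment estimator of between-site variance. The six site-level estimates below are invented for illustration, not Meager's data.

```python
# Random-effects meta-analysis (DerSimonian-Laird) on invented site data.
import numpy as np

theta = np.array([0.08, 0.01, 0.12, -0.03, 0.05, 0.07])  # site-level effects
se    = np.array([0.04, 0.05, 0.06, 0.04, 0.03, 0.05])   # their std. errors

w_fixed = 1 / se**2
theta_fe = np.sum(w_fixed * theta) / np.sum(w_fixed)      # fixed-effect mean

# Between-site variance tau^2 via the DerSimonian-Laird moment estimator.
Q = np.sum(w_fixed * (theta - theta_fe) ** 2)
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (len(theta) - 1)) / c)

w_re = 1 / (se**2 + tau2)                                 # random-effects weights
theta_re = np.sum(w_re * theta) / np.sum(w_re)
print(f"pooled mean = {theta_re:.3f}, tau = {np.sqrt(tau2):.3f}")
# A modest pooled mean with nonzero tau mirrors the substantive reading:
# effects are positive on average but vary across contexts.
```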
Case Study 2: Returns to Education
The returns to education literature demonstrates decades of productive triangulation:
OLS baseline: Mincer regressions suggest ~10% return per year of schooling
IV approaches: Draft lottery, compulsory schooling, geographic proximity instruments yield similar or higher returns (often 10-15%), addressing selection bias
Bounds analysis: Manski bounds without instruments are wide; adding weak assumptions narrows them to ranges consistent with IV estimates
Heterogeneity analysis: Causal forests and ML methods show returns vary by ability, family background, field of study
Qualitative work: Case studies of educational decisions, labor market matching, credential effects
Convergence: Despite very different assumptions, methods converge on positive returns in the 8-15% range, with substantial heterogeneity. This convergence across orthogonal approaches is powerful evidence that education causally increases earnings.
Case Study 3: Understanding China's Growth
China's post-1978 growth remains debated, but multi-method research has clarified key facts:
Growth accounting: Documents extraordinary productivity growth, though measurement debates continue
Quasi-experimental: SEZ studies show positive effects of liberalization; DiD and synthetic control estimates are broadly consistent
Macro time series: Structural breaks, regime-switching models document changing growth dynamics
Comparative case analysis: Comparison with other transition economies highlights what's distinctive about China
Institutional analysis: Qualitative work documents specific reform mechanisms (township-village enterprises, dual-track pricing, experimentation)
Integration: No single study answers "what caused China's growth?" but triangulation reveals: (1) productivity growth was real; (2) market-oriented reforms had large positive effects; (3) gradualism and experimentation mattered; (4) initial conditions were favorable; (5) simple explanations (all institutions, all geography, all policy) are inadequate.

Practical Guidance
When to Use Triangulation
| Situation | Value of triangulation | Rationale |
| --- | --- | --- |
| High-stakes policy question | High | Worth the investment in multiple approaches |
| No single method is decisive | High | Cumulative evidence is necessary |
| Literature is contested | High | Combination can diagnose why studies disagree |
| Clear experimental answer available | Low | A single well-identified study may suffice |
| Resources constrained | Low | Better to do one method well than two poorly |
| Methods share assumptions | Low | Adds complexity without robustness |
Common Pitfalls
Pitfall 1: Triangulation by Accumulation Piling up studies without attention to their identifying assumptions provides false confidence. Ten studies with the same bias are not better than one.
How to avoid: Explicitly map each method's assumptions and biases. Ensure methods are complementary, not redundant.
Pitfall 2: Selective Weighting Giving more weight to methods that support preferred conclusions. Researchers unconsciously favor confirmatory evidence.
How to avoid: Pre-specify how methods will be weighted. Have others independently assess method quality.
Pitfall 3: Forced Consistency Interpreting away disagreement rather than taking it seriously. "Well, if we adjust for X, the studies agree" can be post-hoc rationalization.
How to avoid: Report disagreements prominently. Investigate their sources systematically. Accept that uncertainty may remain.
Pitfall 4: Complexity Without Insight Multi-method research is costly. If integration doesn't yield insights beyond single methods, the complexity isn't justified.
How to avoid: Ask at design stage: what will we learn from combining that we couldn't learn from each method alone?
Implementation Checklist
Map each method's identifying assumptions and likely biases before combining; confirm the methods are complementary rather than redundant
Clarify each method's estimand and whether differences across methods could reflect heterogeneity rather than bias
Pre-specify how evidence from different methods will be weighted
Plan the integration stage in advance: convergence, complementarity, expansion, or contradiction
Report disagreements prominently and diagnose their sources systematically
Ask what the combination will teach that no single method could
Qualitative Bridge
Mixed Methods in Practice
The combination of qualitative and quantitative approaches deserves particular attention. Each tradition has strengths the other lacks:
Quantitative strengths: Precise measurement, large samples, formal inference, generalizability
Qualitative strengths: Rich description, mechanism discovery, context sensitivity, unexpected findings
Productive Combinations
Qualitative for mechanism, quantitative for magnitude: Use case studies to understand how an intervention works, RCTs or quasi-experiments to measure how much.
Quantitative for patterns, qualitative for meaning: Statistical analysis reveals what varies with what; qualitative work reveals why and what it means to participants.
Qualitative for external validity: RCTs establish what happens in study sites; qualitative assessment of context helps judge whether findings transport.
Example: Cash Transfers and Poverty
Research on cash transfers combines:
RCTs measuring impacts on consumption, education, health
Ethnographic work on how recipients actually use transfers
Process studies of implementation challenges
Long-term follow-up using mixed methods
This combination reveals not just that cash transfers help but how, for whom, and under what conditions---questions no single method could answer.
Integration Note
Connections to Other Methods
| Related method | Role in multi-method design | Where covered |
| --- | --- | --- |
| Experimental methods | RCTs often serve as the internal-validity anchor | Ch. 10 |
| Meta-analysis | Formal method for combining quantitative studies | Ch. 24 |
| Sensitivity analysis | Tests robustness to assumption violations | Ch. 11, 17 |
| Process tracing | Qualitative method for mechanism investigation | Ch. 19 |
Building Evidence
Triangulation is inherently connected to the broader questions of how evidence accumulates (Chapter 24) and how research should be practiced (Chapter 25). Multi-method research is most valuable when it's part of a cumulative research program, with each study building on and extending previous work.
Summary
Key takeaways:
Triangulation strengthens evidence when methods have different assumptions and potential biases; combining methods with shared assumptions adds complexity without robustness.
Sequential designs use findings from one method to inform the next; concurrent designs compare results at the integration stage; both can yield insights beyond single-method studies.
Disagreement between methods is information, not just noise. Systematic diagnosis of disagreement often reveals heterogeneity, boundary conditions, or the limits of current knowledge.
Returning to the opening question: When different methods give different answers, the appropriate response is neither to dismiss the disagreement nor to arbitrarily pick one method as correct. Instead, investigate the sources of disagreement systematically. The disagreement pattern often reveals something important---whether that's estimand heterogeneity, assumption violations, data differences, or genuine uncertainty about the phenomenon.
Further Reading
Essential
Seawright, J. (2016). "Multi-Method Social Science: Combining Qualitative and Quantitative Tools." Cambridge University Press.
Humphreys, M. and A. Jacobs (2015). "Mixing Methods: A Bayesian Approach." American Political Science Review.
For Deeper Understanding
Creswell, J. and V. Plano Clark (2017). "Designing and Conducting Mixed Methods Research." 3rd ed. Sage.
Lieberman, E. (2005). "Nested Analysis as a Mixed-Method Strategy for Comparative Research." American Political Science Review.
Advanced/Specialized
Campbell, D.T. and D. Fiske (1959). "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix." Psychological Bulletin.
Rosenbaum, P. (2010). "Design of Observational Studies." Springer. [On design-based triangulation]
Applications
Banerjee, A., E. Duflo, R. Glennerster, and C. Kinnan (2015). "The Miracle of Microfinance? Evidence from a Randomized Evaluation." American Economic Journal: Applied Economics.
Dube, A. (2019). "Minimum Wages and the Distribution of Family Incomes." American Economic Journal: Applied Economics.
Ang, Y.Y. (2016). "How China Escaped the Poverty Trap." Cornell University Press.
Exercises
Conceptual
A researcher finds that an instrumental variables estimate and a regression discontinuity estimate for the same program yield significantly different effect sizes. List three potential explanations for this disagreement and describe how you would investigate each.
Under what conditions would you not recommend triangulation for a policy evaluation? Explain your reasoning.
Applied
Select a policy question you care about. Identify three distinct methods that could address it. For each, (a) state the key identifying assumption, (b) describe the main threat to validity, and (c) explain whether the methods' assumptions are orthogonal or overlapping.
The minimum wage literature contains dozens of studies with varying results. Design a structured comparison of three minimum wage studies using different methods. Create a table mapping each study's estimand, identifying assumption, and main findings.
Discussion
Some argue that the credibility revolution's emphasis on single-study causal identification has reduced interest in multi-method research. Do you agree? What are the tradeoffs between investing in one methodologically rigorous study versus multiple less rigorous studies?