Chapter 24: Evidence Synthesis

Opening Question

With dozens of studies on the same question often reaching different conclusions, how should we combine evidence to draw reliable inferences?


Chapter Overview

A mature research literature contains not one study but many, each with different samples, methods, and results. The minimum wage literature includes hundreds of studies. The returns to education literature spans decades and dozens of countries. The microfinance literature now includes multiple RCTs across different contexts. How should we combine this evidence?

This chapter examines formal methods for synthesizing evidence across studies. Meta-analysis provides tools for pooling estimates while accounting for heterogeneity. Systematic review offers structured approaches for comprehensively evaluating literature. And modern tools---specification curves, multiverse analysis, pre-registration---help address the replication crisis by making research more transparent and robust.

The core insight is that naive synthesis (counting studies, simple averaging) can mislead. Publication bias distorts what gets published. Study heterogeneity means not all estimates address the same question. Quality differences mean not all studies deserve equal weight. Careful synthesis methods address these challenges.

What you will learn:

  • How to conduct and interpret meta-analyses

  • How to detect and correct for publication bias

  • How to design and execute systematic reviews

  • How specification curves and multiverse analysis reveal researcher degrees of freedom

Prerequisites: Familiarity with regression methods (Chapter 3), identification strategies (Chapters 9-17)


Historical Context: The Rise of Evidence Synthesis

Meta-analysis---the statistical combination of results from multiple studies---was pioneered by Gene Glass in psychology in the 1970s. Glass coined the term in 1976, defining meta-analysis as "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings."

In medicine, the Cochrane Collaboration (founded 1993) institutionalized systematic review, developing rigorous protocols that became gold standards for evidence-based medicine. The Campbell Collaboration (founded 2000) extended these methods to social science.

Economics came to meta-analysis relatively late. Stanley and Jarrell's 1989 "Meta-Regression Analysis" introduced economists to the approach, but widespread adoption came in the 2000s. The credibility revolution initially emphasized single-study identification over accumulation, but recent years have seen renewed interest in synthesis---partly driven by concerns about replication failures and partly by the accumulation of multiple well-identified studies on important questions.

The replication crisis that emerged in psychology in the 2010s (Open Science Collaboration 2015) spurred new tools: pre-registration, registered reports, specification curves, and multiverse analysis. Economics has adapted these tools while debating their applicability to the discipline's different research context.


24.1 Meta-Analysis Basics

The Logic of Pooling

Meta-analysis combines estimates from multiple studies to produce a summary effect size. The basic logic is simple: averaging reduces sampling error. If each study provides a noisy estimate of a true effect, combining them should yield a more precise estimate.

Definition 24.1: Meta-Analytic Estimate. Given $k$ studies with effect estimates $\hat{\theta}_i$ and standard errors $\sigma_i$, a weighted average estimator is

$$\hat{\theta}_{MA} = \frac{\sum_{i=1}^k w_i \hat{\theta}_i}{\sum_{i=1}^k w_i}$$

where the weights $w_i$ are typically inverse-variance weights, $w_i = 1/\sigma_i^2$.

Intuition: More precise studies get more weight. A study with a standard error of 0.05 gets four times the weight of a study with standard error 0.10.

Fixed Effects vs. Random Effects

The choice of meta-analytic model depends on assumptions about heterogeneity across studies.

Fixed Effects Model. Assumes all studies estimate the same true effect $\theta$; differences in estimates arise only from sampling error:

$$\hat{\theta}_i = \theta + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_i^2)$$

The fixed effects pooled estimate is

$$\hat{\theta}_{FE} = \frac{\sum w_i \hat{\theta}_i}{\sum w_i}, \qquad \mathrm{Var}(\hat{\theta}_{FE}) = \frac{1}{\sum w_i}$$

Random Effects Model. Allows the true effect to vary across studies:

$$\hat{\theta}_i = \theta_i + \epsilon_i, \quad \theta_i \sim N(\mu, \tau^2)$$

where $\mu$ is the average effect and $\tau^2$ is the between-study variance. The random effects pooled estimate is

$$\hat{\theta}_{RE} = \frac{\sum w_i^* \hat{\theta}_i}{\sum w_i^*}, \qquad w_i^* = \frac{1}{\sigma_i^2 + \tau^2}$$

Worked Example: Minimum Wage Meta-Analysis

Consider three minimum wage studies:

  • Study A: Elasticity = -0.05, SE = 0.03

  • Study B: Elasticity = -0.15, SE = 0.06

  • Study C: Elasticity = -0.08, SE = 0.04

Fixed Effects Calculation:

$$w_A = 1/0.03^2 \approx 1111, \quad w_B = 1/0.06^2 \approx 278, \quad w_C = 1/0.04^2 = 625$$

$$\hat{\theta}_{FE} = \frac{1111(-0.05) + 278(-0.15) + 625(-0.08)}{1111 + 278 + 625} = \frac{-147.3}{2014} \approx -0.073$$

$$SE_{FE} = 1/\sqrt{2014} \approx 0.022$$

The pooled estimate is -0.073 with SE 0.022.

If between-study heterogeneity is substantial ($\tau^2 > 0$), random effects weights would give relatively more weight to smaller studies and yield a wider confidence interval.
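This calculation is easy to script. Below is a minimal sketch in Python (numpy only) that reproduces the fixed-effects pooling above and then, purely for illustration, recomputes the weights under an assumed between-study variance of $\tau^2 = 0.005$, a hypothetical value rather than one estimated from these data:

import numpy as np

# Effect estimates and standard errors for Studies A, B, C
theta = np.array([-0.05, -0.15, -0.08])
se = np.array([0.03, 0.06, 0.04])

# Fixed effects: inverse-variance weights
w = 1.0 / se**2
theta_fe = np.sum(w * theta) / np.sum(w)
se_fe = np.sqrt(1.0 / np.sum(w))
print(f"FE estimate: {theta_fe:.3f} (SE {se_fe:.3f})")   # -0.073 (SE 0.022)

# Random effects under the assumed (hypothetical) tau^2 = 0.005:
# smaller studies gain relative weight and the pooled SE widens
tau2 = 0.005
w_star = 1.0 / (se**2 + tau2)
theta_re = np.sum(w_star * theta) / np.sum(w_star)
se_re = np.sqrt(1.0 / np.sum(w_star))
print(f"RE estimate: {theta_re:.3f} (SE {se_re:.3f})")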

Quantifying Heterogeneity

The $I^2$ statistic measures the fraction of observed variance due to true heterogeneity rather than sampling error:

$$I^2 = \frac{Q - df}{Q} \times 100\%$$

where $df = k - 1$ and $Q$ is Cochran's Q statistic:

$$Q = \sum w_i (\hat{\theta}_i - \hat{\theta}_{FE})^2$$

Interpretation guidelines (Higgins et al. 2003):

  • $I^2 < 25\%$: Low heterogeneity

  • $I^2$ between 25% and 75%: Moderate heterogeneity

  • $I^2 > 75\%$: High heterogeneity

When heterogeneity is high, a single pooled estimate may be less meaningful than understanding what drives variation.
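These quantities are simple to compute. A minimal Python sketch for the three minimum wage studies above, using the DerSimonian-Laird estimator of $\tau^2$ given in Appendix 24A:

import numpy as np

theta = np.array([-0.05, -0.15, -0.08])
se = np.array([0.03, 0.06, 0.04])
k = len(theta)

# Fixed effects pooling
w = 1.0 / se**2
theta_fe = np.sum(w * theta) / np.sum(w)

# Cochran's Q and I^2 (I^2 truncated at zero)
Q = np.sum(w * (theta - theta_fe)**2)
df = k - 1
I2 = max(0.0, (Q - df) / Q) * 100

# DerSimonian-Laird estimate of tau^2 (truncated at zero)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

print(f"Q = {Q:.2f}, I^2 = {I2:.0f}%, tau^2 = {tau2:.5f}")
# For these three studies: Q ≈ 2.3, I^2 ≈ 12% (low heterogeneity)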

Box: Critiques of Random Effects Meta-Analysis

Random effects models are standard practice, but face serious critiques:

1. The normality assumption is arbitrary

Random effects assumes true effects follow $\theta_i \sim N(\mu, \tau^2)$. But why should nature produce normally distributed treatment effects? The assumption is convenient, not justified. With few studies, this distributional choice strongly affects results.

2. "Average effect" may not exist

If studies target different populations, use different interventions, or measure different outcomes, what does the "average" effect mean? Pooling apples and oranges produces fruit salad, not insight.

3. Weights can be perverse

Random effects gives more weight to smaller, noisier studies than fixed effects does. If small studies are systematically different (e.g., due to publication bias), this amplifies bias.

4. Few-studies problem

Estimating $\tau^2$ requires multiple studies. With fewer than ~10 studies, $\tau^2$ is poorly estimated, and random effects confidence intervals can be badly miscalibrated.

When to use anyway: Despite these critiques, random effects remains useful when (a) you believe effects genuinely vary, (b) you have enough studies to estimate heterogeneity, and (c) you're honest that "the average effect" is a modeling construct.

The Shrinkage Formula

Random effects estimation shrinks each study toward the pooled mean. The amount of shrinkage depends on study precision and total heterogeneity:

$$\hat{\theta}_i^{\text{shrunk}} = \lambda_i \hat{\theta}_i + (1 - \lambda_i) \hat{\theta}_{RE}$$

where the shrinkage factor is

$$\lambda_i = \frac{\tau^2}{\tau^2 + \sigma_i^2}$$

Interpretation:

  • When $\sigma_i^2$ is large (imprecise study): $\lambda_i$ is small → heavy shrinkage toward the pooled mean

  • When $\tau^2$ is large (high heterogeneity): $\lambda_i$ is large → less shrinkage; individual studies are trusted

  • When $\tau^2 = 0$ (no heterogeneity): $\lambda_i = 0$ → complete shrinkage to the fixed effects estimate

Example: Study A has $\hat{\theta}_A = 0.30$ with $\sigma_A^2 = 0.04$. The pooled estimate is $\hat{\theta}_{RE} = 0.15$ with $\tau^2 = 0.02$.

$$\lambda_A = \frac{0.02}{0.02 + 0.04} = 0.33$$

$$\hat{\theta}_A^{\text{shrunk}} = 0.33(0.30) + 0.67(0.15) = 0.20$$

The shrunk estimate (0.20) is pulled toward the mean, reflecting skepticism about extreme values from noisy studies.
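A small helper function makes the mechanics concrete; this sketch reproduces the Study A calculation:

def shrink(theta_hat, sigma2, theta_re, tau2):
    """Shrink a study estimate toward the random-effects pooled mean."""
    lam = tau2 / (tau2 + sigma2)            # shrinkage factor lambda_i
    return lam * theta_hat + (1 - lam) * theta_re

# Study A: estimate 0.30, variance 0.04; pooled mean 0.15, tau^2 = 0.02
print(shrink(0.30, 0.04, 0.15, 0.02))       # 0.20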

Figure 24.1: Forest Plot for Meta-Analysis. Each row shows a study's point estimate (circle) with 95% confidence interval (horizontal line). Circle size reflects inverse-variance weight. The pooled fixed-effects estimate (diamond) synthesizes across studies. The dashed vertical line marks zero (null effect).

Bayesian Hierarchical Models

A powerful alternative to frequentist random effects is Bayesian hierarchical modeling, which treats study-specific effects as draws from a common distribution:

$$\theta_i \sim N(\mu, \tau^2), \qquad \hat{\theta}_i \mid \theta_i \sim N(\theta_i, \sigma_i^2)$$

This framework offers several advantages:

  • Shrinkage: Study estimates are pulled toward the overall mean, with more shrinkage for less precise studies

  • Uncertainty quantification: Posterior distributions capture uncertainty about both individual study effects and the overall mean

  • Prediction: Can predict effects in new contexts, directly addressing external validity

Worked Example: Bayesian Meta-Analysis of Microcredit RCTs (Meager 2019)

Meager (2019) applied Bayesian hierarchical models to seven randomized microcredit evaluations---the same studies discussed in Chapter 10. Her analysis yielded several important insights:

Decomposing heterogeneity: The seven studies appeared to show substantial variation in effects. But how much was true heterogeneity versus sampling error? Meager's hierarchical model estimated that roughly 60% of observed cross-study variation was sampling error. The underlying effects were more similar than they appeared.

Pooling information: The Bayesian framework pooled information across studies while allowing for genuine heterogeneity. Studies with smaller samples were shrunk more toward the overall mean, appropriately discounting imprecise estimates.

External validity: The model generated a predictive distribution for effects in a hypothetical new site. This directly addresses the policy question: "What effect should we expect if we expand microcredit to a new country?" The answer incorporated both the estimated average effect and the estimated variability across contexts.

Joint estimation: Rather than analyzing each outcome separately, Meager jointly modeled multiple outcomes (household expenditure, business profits, consumption), capturing correlations across outcomes and studies.

The analysis concluded that microcredit expansions have modest positive effects on average, with household expenditure increasing by about 5%, but that effects are unlikely to be transformative in any setting. This nuanced conclusion---neither "microcredit works" nor "microcredit doesn't work"---exemplifies what good meta-analysis can deliver.
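Meager's full model is considerably richer (joint outcomes, weakly informative priors, full MCMC), but the core normal-normal hierarchy can be illustrated with a simple grid approximation over $\tau$, in the style of the classic "eight schools" computation. The seven effect estimates and standard errors below are hypothetical placeholders, not her data:

import numpy as np

# Hypothetical study estimates and standard errors (illustration only)
y = np.array([0.12, -0.03, 0.08, 0.02, 0.15, 0.05, 0.01])
s = np.array([0.06, 0.05, 0.09, 0.04, 0.10, 0.07, 0.03])

tau = np.linspace(1e-4, 0.5, 2000)[:, None]   # grid over tau
v = s**2 + tau**2                             # marginal variance of each estimate
w = 1.0 / v
mu_hat = (w * y).sum(axis=1) / w.sum(axis=1)  # E[mu | tau, data]
v_mu = 1.0 / w.sum(axis=1)                    # Var[mu | tau, data]

# log p(tau | data) up to a constant, with flat priors on mu and tau
logp = (0.5 * np.log(v_mu)
        - 0.5 * np.log(v).sum(axis=1)
        - 0.5 * ((y - mu_hat[:, None])**2 / v).sum(axis=1))
p = np.exp(logp - logp.max())
p /= p.sum()

print("posterior mean of mu: ", (p * mu_hat).sum())
print("posterior mean of tau:", (p * tau.ravel()).sum())

# Shrunk study effects, averaged over the posterior of tau:
# imprecise studies are pulled hardest toward the overall mean
lam = tau**2 / (tau**2 + s**2)
shrunk = (p[:, None] * (lam * y + (1 - lam) * mu_hat[:, None])).sum(axis=0)
print("shrunk study effects: ", shrunk.round(3))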

Meta-Regression

Meta-regression extends meta-analysis to model heterogeneity:

$$\hat{\theta}_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_p X_{pi} + u_i + \epsilon_i$$

where the $X_{ki}$ are study-level characteristics (sample size, method, time period, country) and $u_i$ captures residual between-study variance.

Example: Why Do Minimum Wage Studies Differ?

Doucouliagos and Stanley (2009) use meta-regression to explain heterogeneity in minimum wage studies:

Study characteristics as moderators:

  • Teen vs. total employment

  • US vs. international

  • Before vs. after Card-Krueger

  • Publication status (journal vs. working paper)

  • Methodological approach (time series vs. panel vs. quasi-experimental)

They find that after controlling for publication bias and methodology, the evidence points to small negative effects---not the large effects or zero effects found in subsets of the literature.
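Mechanically, meta-regression is weighted least squares with inverse-variance weights. The sketch below simulates a literature with one binary moderator and recovers its coefficient; the data and the moderator effect are synthetic, not Doucouliagos and Stanley's:

import numpy as np

rng = np.random.default_rng(0)

# Simulated literature: 40 studies; the true elasticity depends on
# whether a study examines teen employment (hypothetical moderator)
k = 40
teen = rng.integers(0, 2, k).astype(float)   # 1 = teen employment study
se = rng.uniform(0.02, 0.12, k)              # study standard errors
theta = -0.02 - 0.06 * teen + rng.normal(0, se)

# Inverse-variance WLS = OLS after scaling each row by 1/se
X = np.column_stack([np.ones(k), teen])
beta, *_ = np.linalg.lstsq(X / se[:, None], theta / se, rcond=None)
print(f"intercept (non-teen effect): {beta[0]:.3f}")   # close to -0.02
print(f"teen moderator coefficient:  {beta[1]:.3f}")   # close to -0.06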


24.2 Publication Bias

The Problem

Publication bias occurs when studies with statistically significant or theoretically expected results are more likely to be published than studies with null or unexpected results. This systematically distorts the published literature.

Definition 24.2: Publication Bias. The systematic tendency for selection of studies into the published literature to depend on results, producing a non-representative sample of all studies conducted.

Mechanisms:

  1. File drawer problem: Researchers don't submit null results

  2. Editorial preferences: Journals prefer significant findings

  3. Reader interest: Significant findings get more citations

  4. P-hacking: Researchers select specifications that achieve significance

Detection Methods

Funnel Plot. Plot effect size against precision ($1/SE$). Under no publication bias, the plot should be symmetric around the true effect; asymmetry suggests publication bias.

Figure 24.2: Funnel Plots for Publication Bias Detection. Left panel shows symmetric distribution expected under no publication bias—studies scatter evenly around the true effect regardless of precision. Right panel shows asymmetry indicating publication bias—smaller studies (lower precision) cluster on one side, suggesting selective publication of significant results.

Egger's Test. Regresses the standardized effect size on precision:

$$\frac{\hat{\theta}_i}{\sigma_i} = \beta_0 + \beta_1 \frac{1}{\sigma_i} + \epsilon_i$$

If $\beta_0 \neq 0$, asymmetry is present, suggesting publication bias.

FAT-PET-PEESE. Stanley and Doucouliagos's approach:

FAT (Funnel Asymmetry Test):

$$\hat{\theta}_i = \beta_0 + \beta_1 \sigma_i + \epsilon_i$$

If $\beta_1 \neq 0$, publication bias is present. The intercept $\beta_0$ provides a bias-corrected effect (the PET, or Precision-Effect Test, estimate) under the assumption that bias is proportional to the standard error.

PEESE (Precision-Effect Estimate with Standard Error):

$$\hat{\theta}_i = \beta_0 + \beta_1 \sigma_i^2 + \epsilon_i$$

Uses the squared standard error in place of the standard error; conventionally applied when the PET test rejects the null of no underlying effect, since PEESE is then the less biased corrected estimate.

Worked Example: Testing for Publication Bias

Suppose we have 20 minimum wage studies. Regressing estimates on standard errors yields:

$$\hat{\theta}_i = -0.02 - 0.8 \times \sigma_i$$

The coefficient $-0.8$ on the standard error is significant (p = 0.02), indicating publication bias: imprecise studies report systematically more negative estimates. The intercept $-0.02$ represents the bias-corrected effect, small and negative but close to zero, rather than the $-0.10$ average across published studies.
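The pattern in this example can be generated and tested in a few lines. The simulation below is a sketch under a stylized selection rule (only negative, significant estimates get "published"); it applies the FAT-PET regression, which is Egger's regression re-expressed in weighted form:

import numpy as np

rng = np.random.default_rng(1)

# True effect -0.02; publish only estimates with t < -1.96
true_theta = -0.02
est_list, se_list = [], []
while len(est_list) < 20:
    se_i = rng.uniform(0.02, 0.15)
    est_i = true_theta + rng.normal(0, se_i)
    if est_i / se_i < -1.96:                 # stylized selection rule
        est_list.append(est_i)
        se_list.append(se_i)
theta, se = np.array(est_list), np.array(se_list)

print(f"naive mean of published estimates: {theta.mean():.3f}")

# FAT-PET: WLS regression of estimates on standard errors
X = np.column_stack([np.ones_like(se), se])
beta, *_ = np.linalg.lstsq(X / se[:, None], theta / se, rcond=None)
print(f"PET bias-corrected effect (intercept): {beta[0]:.3f}")
print(f"FAT slope on SE (bias indicator):      {beta[1]:.3f}")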

Correction Approaches

Trim-and-Fill. Imputes "missing" studies to make the funnel plot symmetric, then re-estimates the pooled effect.

Selection Models. Explicitly model the selection process. Hedges and Vevea (1996) and Andrews and Kasy (2019) develop selection models for meta-analysis.

p-Curve Analysis. Examines the distribution of significant p-values. If effects are real, significant p-values should be right-skewed (clustered near zero). If results are due to p-hacking, they cluster just below 0.05.
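The logic of p-curve is easy to check by simulation. The sketch below (assuming simple two-sample t-tests with 30 observations per arm) compares significant p-values under a genuine effect with those under the null; it does not model the more elaborate "stop as soon as p dips below .05" form of p-hacking:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def significant_pvals(effect, n_sims=5000, n=30):
    """Collect p < .05 results from simulated two-sample t-tests."""
    pvals = []
    for _ in range(n_sims):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(effect, 1.0, n)
        p = stats.ttest_ind(x, y).pvalue
        if p < 0.05:
            pvals.append(p)
    return np.array(pvals)

for effect, label in [(0.5, "true effect (d = 0.5)"), (0.0, "null effect")]:
    p = significant_pvals(effect)
    print(f"{label}: share of significant p-values below .01 = "
          f"{(p < 0.01).mean():.2f}")
# Under a true effect, significant p-values pile up near zero (right
# skew); under the null they are uniform on (0, .05).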

Limitations

All publication bias methods rely on untestable assumptions:

  • Funnel plots assume bias operates through imprecision

  • Selection models assume specific functional forms for selection

  • Meta-regression assumes publication bias is the only source of asymmetry

Using these methods responsibly requires honesty about this uncertainty.


24.3 Systematic Review

What Makes a Review "Systematic"?

A systematic review differs from a narrative literature review in being:

  • Comprehensive: Attempts to find all relevant studies

  • Transparent: Documents search and selection criteria

  • Reproducible: Another researcher could replicate the search

  • Structured: Uses predetermined protocols

The PRISMA Framework

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) provides reporting standards:

Key elements:

  1. Protocol: Pre-specified search strategy, eligibility criteria, analysis plan

  2. Search: Multiple databases, grey literature, reference lists

  3. Selection: Documented screening process with inclusion/exclusion counts

  4. Assessment: Quality/risk of bias evaluation for included studies

  5. Synthesis: Quantitative (meta-analysis) or qualitative summary

Figure 24.3: PRISMA Flow Diagram. Systematic reviews document the screening process from initial database search through final inclusion. This transparency allows readers to assess comprehensiveness and replicability. The four stages (Identification, Screening, Eligibility, Inclusion) show progressive filtering with documented exclusion reasons.

Quality Assessment

Not all studies deserve equal weight. Systematic reviews assess study quality/risk of bias:

For RCTs (Cochrane Risk of Bias tool):

  • Random sequence generation

  • Allocation concealment

  • Blinding

  • Incomplete outcome data

  • Selective reporting

For observational studies (Newcastle-Ottawa Scale):

  • Selection of study groups

  • Comparability

  • Outcome ascertainment

For quasi-experiments in economics:

  • Identification strategy credibility

  • Data quality

  • Pre-trends/placebo tests

  • Robustness across specifications

Example: Systematic Review of Microfinance Impacts

Duvendack et al. (2011) conducted an influential systematic review of microfinance impacts:

Search strategy:

  • 15 databases searched

  • Grey literature from development organizations

  • Reference lists of included studies

  • Authors contacted for unpublished work

Eligibility criteria:

  • Quantitative impact studies

  • Microfinance (credit) as intervention

  • Outcome: poverty, income, consumption, or welfare

  • Control or comparison group

Results:

  • 35,000 records screened

  • 170 full texts assessed

  • 58 studies included

  • Quality generally poor (few controlled studies, selection bias common)

Conclusion: Evidence on microfinance impact was surprisingly weak before the recent RCTs, despite enthusiastic claims in policy circles.


24.4 Replication and Robustness

The Replication Crisis

The Open Science Collaboration (2015) attempted to replicate 100 psychology studies. Only 36% replicated successfully. Similar concerns arose in economics:

  • Camerer et al. (2016): 11 of 18 experimental economics studies replicated

  • Chang and Li (2022): 29% of economics papers fully reproducible

  • Brodeur et al. (2016): Bunching of test statistics just above significance thresholds

This motivated new approaches to robustness and transparency.

Robustness Tools for Primary Research

Several tools help individual researchers make their analysis more robust:

  • Specification curves display how results vary across all defensible analytical choices

  • Multiverse analysis extends this to data processing choices

  • Pre-registration commits to an analysis plan before seeing results

These tools improve primary research quality, which in turn improves evidence synthesis. See Chapter 25 (Section 25.2) for detailed treatment of specification curves, multiverse analysis, and pre-registration in the context of research practice.
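As a preview of Chapter 25's treatment, the sketch below builds a toy specification curve on synthetic data: three binary choices (trim outliers, control for a confounder, log the outcome) yield eight treatment-effect estimates. The data-generating process and all variable names are invented for illustration:

import itertools
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: treatment d is confounded by x; a few outliers in y
n = 500
x = rng.normal(size=n)
d = (x + rng.normal(size=n) > 0).astype(float)
y = 1.0 * d + 0.5 * x + rng.normal(size=n)
y[::50] += 8.0

estimates = []
for trim, control, logy in itertools.product([0, 1], repeat=3):
    yy, dd, xx = y.copy(), d.copy(), x.copy()
    if trim:                                  # drop the top 2% of outcomes
        keep = yy < np.quantile(yy, 0.98)
        yy, dd, xx = yy[keep], dd[keep], xx[keep]
    if logy:                                  # log of shifted-positive outcome
        yy = np.log(yy - yy.min() + 1.0)
    cols = [np.ones_like(dd), dd] + ([xx] if control else [])
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), yy, rcond=None)
    estimates.append((beta[1], (trim, control, logy)))

# The "curve": treatment coefficients sorted across all 8 specifications
for est, spec in sorted(estimates):
    print(f"estimate = {est: .3f}   (trim, control, log) = {spec}")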

Pre-Registration and Publication Bias

Pre-registration commits researchers to an analysis plan before seeing results:

Standard pre-registration:

  • Research question

  • Hypotheses

  • Sample and data

  • Variables and measures

  • Analysis plan (primary specifications)

Registered reports:

  • Peer review before data collection

  • Publication commitment regardless of results

  • Eliminates publication bias at source

Pre-Registration in Economics

AEA RCT Registry (2013-present) provides registration for experiments. OSF and EGAP offer broader registration services.

Benefits:

  • Distinguishes confirmatory from exploratory analysis

  • Reduces p-hacking and HARKing (Hypothesizing After Results are Known)

  • Improves reproducibility

Limitations and debates:

  • Economics research is often observational: the data already exist, so researchers cannot credibly commit to an analysis plan formed before seeing them

  • Exploration is valuable and shouldn't be discouraged

  • Pre-registration doesn't eliminate all researcher degrees of freedom

Balanced approach:

  • Pre-register primary specifications where feasible

  • Clearly distinguish confirmatory from exploratory

  • Report specification curves for robustness

  • Focus on effect sizes and confidence intervals, not just significance


Practical Guidance

When to Conduct Meta-Analysis

Situation                                      Appropriate?   Notes
Multiple studies on the same question          Yes            Core application
Studies use same/comparable outcome measures   Yes            Effect sizes comparable
High heterogeneity across studies              Maybe          Explaining heterogeneity may be more useful than pooling
Studies have fundamental design differences    Caution        May be combining apples and oranges
Publication bias severe                        Caution        Pooling biased estimates yields a biased pool

Common Pitfalls

Pitfall 1: Garbage In, Garbage Out. Meta-analysis cannot correct for problems in underlying studies. Pooling biased studies yields a more precise but still biased estimate.

How to avoid: Assess study quality. Consider sensitivity analyses excluding low-quality studies.

Pitfall 2: Comparing Incomparable Estimates. Studies may estimate different parameters (LATE vs. ATE, short-run vs. long-run). Pooling them conflates different quantities.

How to avoid: Carefully define what each study estimates. Use meta-regression to model differences.

Pitfall 3: Ignoring Heterogeneity. When $I^2$ is high, a single pooled estimate may be misleading.

How to avoid: Report and explain heterogeneity. Consider whether pooled estimate is meaningful.

Pitfall 4: Over-Correcting for Publication Bias. Publication bias corrections rely on strong assumptions; aggressive correction can introduce new biases.

How to avoid: Report multiple methods. Treat corrections as sensitivity analyses.

Implementation Checklist

For meta-analysis:

  • Define the question and study eligibility criteria before searching

  • Search comprehensively (multiple databases, grey literature)

  • Extract comparable effect sizes and standard errors from each study

  • Quantify heterogeneity ($Q$, $I^2$) before choosing fixed vs. random effects

  • Assess publication bias (funnel plot, Egger's test, FAT-PET-PEESE)

  • Run sensitivity analyses (excluding low-quality studies, alternative corrections)

For robustness analysis:

  • Enumerate defensible specifications before estimating

  • Report the full specification curve, not selected results

  • Distinguish confirmatory (pre-registered) from exploratory analyses

  • Report effect sizes and confidence intervals, not just significance


Qualitative Bridge

Qualitative Synthesis Methods

While this chapter focuses on quantitative synthesis, qualitative systematic reviews also exist:

Qualitative evidence synthesis:

  • Thematic synthesis

  • Meta-ethnography

  • Framework synthesis

These methods systematically combine findings from qualitative studies, looking for common themes, contradictions, and explanatory patterns.

Mixed-Methods Synthesis

Increasingly, systematic reviews combine quantitative and qualitative evidence:

EPPI-Centre approach:

  1. Quantitative studies → meta-analysis of effects

  2. Qualitative studies → synthesis of implementation, mechanisms

  3. Integration → what works, for whom, and why

This mirrors the triangulation discussed in Chapter 23, applied to synthesis rather than primary research.


Integration Note

Connections to Other Methods

Method                   Relationship                                    See Chapter
Triangulation            Meta-analysis as formal triangulation method    Ch. 23
Heterogeneity analysis   Meta-regression parallels HTE analysis          Ch. 20
Bayesian methods         Bayesian hierarchical models for pooling        Ch. 3
Machine learning         ML for study selection, coding                  Ch. 21

From Single Studies to Cumulative Knowledge

Evidence synthesis represents the culmination of empirical research. Individual studies provide pieces; synthesis assembles the puzzle. But synthesis is only as good as the underlying studies, emphasizing why careful research practice (Chapter 25) matters throughout the research process.


Summary

Key takeaways:

  1. Meta-analysis provides formal methods for combining estimates across studies, with random effects models accounting for heterogeneity and meta-regression explaining variation.

  2. Publication bias systematically distorts the literature. Detection methods (funnel plots, Egger's test, FAT-PET-PEESE) and corrections exist but rely on strong assumptions.

  3. Specification curves and multiverse analysis reveal researcher degrees of freedom and test robustness across analytical choices. Pre-registration commits researchers to analysis plans before seeing results.

Returning to the opening question: Combining evidence across studies requires more than simple averaging. Meta-analysis provides precision-weighted pooling. Publication bias assessment reveals what may be missing. Systematic review ensures comprehensive coverage. And robustness tools show how conclusions depend on analytical choices. Used together, these methods allow us to draw more reliable inferences than any single study---or any casual literature review---can provide.


Further Reading

Essential

  • Borenstein, M., L. Hedges, J. Higgins, and H. Rothstein (2009). "Introduction to Meta-Analysis." Wiley.

  • Stanley, T.D. and H. Doucouliagos (2012). "Meta-Regression Analysis in Economics and Business." Routledge.

For Deeper Understanding

  • Higgins, J. and S. Green, eds. (2011). "Cochrane Handbook for Systematic Reviews of Interventions." Cochrane Collaboration.

  • Andrews, I. and M. Kasy (2019). "Identification of and Correction for Publication Bias." American Economic Review.

Advanced/Specialized

  • Simonsohn, U., J. Simmons, and L. Nelson (2020). "Specification Curve Analysis." Nature Human Behaviour.

Applications

  • Doucouliagos, H. and T.D. Stanley (2009). "Publication Selection Bias in Minimum-Wage Research?" British Journal of Industrial Relations.

  • Meager, R. (2019). "Understanding the Average Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of Seven Randomized Experiments." AEJ: Applied. Exemplary Bayesian hierarchical meta-analysis demonstrating joint estimation, heterogeneity decomposition, and external validity prediction.

  • Brodeur, A., M. Le, M. Sangnier, and Y. Zylberberg (2016). "Star Wars: The Empirics Strike Back." AEJ: Applied.


Exercises

Conceptual

  1. Explain why random effects meta-analysis gives relatively more weight to smaller studies compared to fixed effects. When would this be desirable, and when might it be problematic?

  2. A meta-analysis finds a pooled effect of 0.3 with a narrow confidence interval, but $I^2 = 85\%$. How should you interpret this result?

Applied

  1. You are conducting a meta-analysis of studies on the effect of class size on student achievement. List five study-level characteristics you would include in a meta-regression to explain heterogeneity. For each, explain what pattern you would expect and why.

  2. Create a specification curve for a simple analysis. Using any publicly available dataset, identify at least 4 binary analytical choices (e.g., include/exclude outliers, log vs. level, with/without controls). Run all 16 combinations and plot the results. What do you conclude about robustness?

Discussion

  1. Pre-registration has been controversial in economics. What are the strongest arguments for and against requiring pre-registration for observational studies using administrative data?


Appendix 24A: Meta-Analysis Formulas

Fixed Effects Variance

$$\mathrm{Var}(\hat{\theta}_{FE}) = \frac{1}{\sum_{i=1}^k w_i} = \frac{1}{\sum_{i=1}^k 1/\sigma_i^2}$$

Estimating Between-Study Variance ($\tau^2$)

DerSimonian-Laird estimator (truncated at zero when $Q < k - 1$):

$$\hat{\tau}^2 = \max\left\{0, \; \frac{Q - (k-1)}{c}\right\}$$

where

$$c = \sum w_i - \frac{\sum w_i^2}{\sum w_i}$$

Random Effects Variance

$$\mathrm{Var}(\hat{\theta}_{RE}) = \frac{1}{\sum_{i=1}^k w_i^*} = \frac{1}{\sum_{i=1}^k 1/(\sigma_i^2 + \tau^2)}$$

Egger's Test Statistic

Under the null of no asymmetry,

$$t = \frac{\hat{\beta}_0}{SE(\hat{\beta}_0)}$$

follows a t-distribution with $k - 2$ degrees of freedom.
