Chapter 25: Research Practice

Opening Question

Beyond knowing the right methods, what makes the difference between research that contributes to knowledge and research that misleads or is forgotten?


Chapter Overview

The preceding chapters have covered methods: how to identify causal effects, describe patterns, combine evidence, and handle uncertainty. But methods are tools. Using them well requires attention to research practice---the organization, execution, communication, and ethics of empirical work.

This chapter addresses what separates good research from technically correct but ultimately unhelpful work. Good research practice is reproducible, transparent, and honest about uncertainty. It communicates clearly, both to specialists and broader audiences. And it considers the ethical dimensions of working with data about human beings.

These issues have taken on new urgency as replication crises in multiple fields have revealed how easily smart researchers using valid methods can produce misleading results. The problem is not usually fraud. It's the accumulation of small choices---which specifications to report, how to frame results, what to emphasize---that systematically biases what we learn.

What you will learn:

  • How to organize research projects for reproducibility and collaboration

  • When and how to pre-register analyses

  • How to write and present empirical research effectively

  • How to present uncertainty honestly, avoiding the pitfalls of null hypothesis significance testing

Prerequisites: General familiarity with empirical methods from earlier chapters


Historical Context: From Heroic Research to Reproducible Science

For much of the 20th century, empirical social science followed what we might call the "heroic" model. A researcher would develop a question, collect or access data, analyze it using their judgment about appropriate methods, and publish findings. Replication was rare. Data sharing was optional. The researcher's expertise and reputation served as the primary guarantee of quality.

This model came under sustained challenge beginning in the late 1980s. Dewald, Thursby, and Anderson (1986) found that only 7% of economics papers could be replicated with the authors' data. The "credibility revolution" in economics (Angrist and Pischke 2010) emphasized research design over researcher authority. And the broader "replication crisis"---sparked by spectacular failures to replicate in psychology (Open Science Collaboration 2015), medicine (Ioannidis 2005), and eventually economics---forced a reckoning with how research is actually conducted.

The response has included new infrastructure (data repositories, pre-registration platforms), new incentives (journals requiring data availability, badges for open practices), and new norms (distinguishing exploratory from confirmatory analysis). Economics has been slower to adopt some of these changes than psychology, but movement is visible. The AEA's data and code availability policy (2019), the growth of pre-analysis plans for experiments, and increased attention to robustness and specification curves all reflect evolving standards.

This chapter distills emerging best practices while acknowledging ongoing debates about what's appropriate for different types of research.

Box: What Do We Know About Peer Review?

Peer review is the gatekeeping mechanism for scientific knowledge, yet we have surprisingly little evidence about how well it works. Li (2017) provides one of the most rigorous studies, examining NIH grant review using a clever identification strategy.

Her key findings: Reviewers are both more informed and more biased when evaluating research related to their own work. They can better assess quality in their area of expertise, but they also favor proposals similar to their own research. The net effect? Expertise dominates bias---reviewers' specialized knowledge improves selection more than their conflicts of interest harm it.

The identification exploits variation in whether experts are permanent committee members (who evaluate many related proposals) versus temporary members (who serve occasionally). This variation in reviewer composition is plausibly unrelated to proposal quality, allowing causal identification.

For research practice, this suggests: Seek informed reviewers despite potential bias. And for your own work: Understand that review is imperfect but not arbitrary. Rejection doesn't necessarily mean your work is bad, and acceptance doesn't guarantee it's good.


25.1 Project Workflow

Organizing for Reproducibility

A reproducible research project is one where another researcher (or your future self) can understand what was done and verify the results. This requires organization from the start, not cleanup at the end.

Principle 25.1: Reproducibility by Design Build reproducibility into project workflow from day one. It is far easier to maintain clean organization than to impose it on a messy project.

Project Structure

A well-organized empirical project follows a consistent structure that separates raw data, code, and outputs. The key principles:

  1. Separation of raw and processed data: Raw data is never modified. All cleaning is done in code that can be re-run.

  2. Numbered scripts: Code runs in order, making dependencies clear.

  3. Clear input/output: Each script has defined inputs and outputs.

  4. Self-documenting: README files explain what's what.

See Chapter 26 (Programming Companion: Project Management) for detailed folder structure templates, dependency management with Make/targets, and containerization for reproducibility.
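As a concrete illustration of the "numbered scripts" principle, the sketch below is a minimal pipeline driver that runs scripts in order and stops at the first failure. The folder name code/ and the script names (01_clean.py, 02_estimate.py, and so on) are placeholders rather than a prescribed convention.

# run_all.py: minimal sketch of a driver for numbered analysis scripts.
# Assumes scripts live in code/ and are named 01_clean.py, 02_estimate.py, ...
import subprocess
import sys
from pathlib import Path

def main():
    scripts = sorted(Path("code").glob("[0-9][0-9]_*.py"))
    for script in scripts:
        print(f"Running {script} ...")
        result = subprocess.run([sys.executable, str(script)])
        if result.returncode != 0:
            sys.exit(f"{script} failed; stopping the pipeline.")

if __name__ == "__main__":
    main()

Running the whole project from a single entry point like this makes the dependency order explicit and gives replicators one command to execute.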

Version Control

Version control (typically Git) tracks changes over time:

Benefits:

  • Complete history of what changed when

  • Easy to revert mistakes

  • Collaboration without overwriting

  • Branching for experiments

Basic workflow:

  • Edit files, then stage the changes (git add)

  • Commit with a descriptive message (git commit -m "describe the change")

  • Push to the shared remote (git push) and pull collaborators' changes (git pull)

  • Branch for experiments (git checkout -b new-idea) and merge back when stable

Even for solo projects, version control provides invaluable safety and history.

Documentation

Documentation serves multiple audiences:

For yourself:

  • Analysis notes explaining decisions

  • Code comments for complex logic

  • README files for project navigation

For collaborators:

  • Data dictionaries/codebooks

  • Dependency documentation

  • Installation instructions

For replicators:

  • Complete instructions to reproduce results

  • Software versions and environment specification

  • Known issues and limitations

Data Management

Principle 25.2: Raw Data Immutability Never modify raw data files. All transformations should be done in code, creating new files for processed data.
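A minimal sketch of Principle 25.2 in practice: read the raw file, do all cleaning in code, and write the result to a separate processed file. The file paths and the wage variable are hypothetical.

# clean_data.py: raw data stays untouched; all cleaning produces a new file.
import numpy as np
import pandas as pd
from pathlib import Path

RAW = Path("data/raw/survey.csv")                  # raw file: never modified
PROCESSED = Path("data/processed/survey_clean.csv")

df = pd.read_csv(RAW)

# All transformations happen here, in code, and produce a new file.
df = df.dropna(subset=["wage"])                    # drop records missing the outcome
df = df[df["wage"] > 0]                            # keep positive wages only
df["log_wage"] = np.log(df["wage"])

PROCESSED.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(PROCESSED, index=False)                  # raw file is left untouched

Because the script can be re-run at any time, the processed file is disposable; the raw file plus the code are the objects worth protecting.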

Data management checklist:

  • Store raw data read-only and never edit it by hand

  • Document data sources, access dates, and any licensing or confidentiality restrictions

  • Maintain a codebook describing each variable and its construction

  • Keep processed data fully reproducible from raw data plus code

  • Back up data securely, with extra care for sensitive files

Collaboration

Research is increasingly collaborative. Effective collaboration requires:

Communication:

  • Regular check-ins

  • Shared documentation

  • Clear division of labor

Technical infrastructure:

  • Shared repositories (GitHub, GitLab)

  • Cloud storage for large data

  • Common computing environment

Attribution and credit:

  • Clear authorship agreements early

  • Documented contributions

  • Acknowledgment of all contributors


25.2 Transparency and Pre-Registration

The Case for Transparency

Transparency means others can see what you did and verify your claims. It's fundamental to science but historically neglected in social science.

Dimensions of transparency:

  1. Data availability: Can others access the data?

  2. Code availability: Is the analysis code public?

  3. Material availability: Are instruments, protocols, supplementary materials accessible?

  4. Analysis transparency: Is it clear what choices were made?

Data and Code Sharing

Most major economics journals now require data and code availability:

AEA policy (2019):

  • Data and code must be deposited

  • Must be sufficient to reproduce results

  • Exceptions for proprietary/confidential data (but code still required)

Practical considerations:

  • Use persistent identifiers (DOI) for data deposits

  • Include dependencies and environment specification

  • Test that replication package actually works

  • For confidential data, provide as much as possible (summary statistics, simulated data)

Pre-Registration

Pre-registration commits researchers to an analysis plan before seeing results:

Definition 25.1: Pre-Registration A time-stamped, publicly available research plan documenting hypotheses, data, and analysis methods before results are known.

What to pre-register:

  • Primary research questions and hypotheses

  • Data source and sample definition

  • Variable construction and measurement

  • Primary specifications (estimation method, controls)

  • Treatment of outliers and missing data

  • Multiple testing adjustments

What need not be pre-registered:

  • Exploratory analyses (but label them as such)

  • Robustness checks

  • Secondary specifications

When Pre-Registration Makes Sense

Pre-registration is most valuable when:

Situation                                        Value of Pre-Registration
Prospective RCT                                  High - can plan before data collection
Survey with primary outcome                      High - commit before seeing responses
Secondary analysis of existing data              Moderate - can pre-register before accessing
Observational study with new data collection     Moderate - plan before analysis
Reanalysis of publicly available data            Lower - others can check your work directly
Exploratory analysis                             Lower - exploration is the point

Registered Reports

Registered reports take pre-registration further:

  1. Stage 1 review: Reviewers evaluate research design before data collection

  2. In-principle acceptance: If design is sound, paper will be published regardless of results

  3. Stage 2 review: After data collection, verify pre-registered plan was followed

This eliminates publication bias at the source: results are irrelevant to the publication decision.

Criticisms and Limitations

Pre-registration is debated in economics:

Arguments against:

  • Much economics is observational---can't pre-register before data exist

  • Over-emphasis on confirmatory analysis discourages valuable exploration

  • Reviewers can still reject at Stage 2

  • Administrative burden may not be worthwhile

Responses:

  • Pre-registration is for confirmatory claims, not all analysis

  • Exploratory work remains valuable but should be labeled

  • Burden decreases with practice

  • Specific to research context---not one-size-fits-all

Balanced Approach

A pragmatic approach:

  1. For experiments: Pre-register primary hypotheses and analysis

  2. For observational work: Pre-register when possible (before accessing data)

  3. Always: Distinguish confirmatory from exploratory analysis

  4. Always: Report what you did (specification curves help)

  5. Accept: Some research is inherently exploratory, and that's fine

Specification Curves and Multiverse Analysis

Pre-registration constrains ex ante choices. Specification curves reveal ex post how conclusions depend on analytical choices.

Definition 25.2: Specification Curve A visualization showing how estimated effects vary across all defensible combinations of analytical choices (variable definitions, sample restrictions, controls, estimation methods).

Construction:

  1. Identify all reasonable analytical choices (e.g., which controls, which sample, which functional form)

  2. Run all possible combinations

  3. Plot estimates sorted by effect size

  4. Show which choices produce which estimates

Worked Example: Specification Curve for Returns to Education

Analytical choices:

  • Sample: Men only, women only, both

  • Measure: Years of schooling, highest degree

  • Controls: None, demographics, family background, ability proxy

  • Method: OLS, IV (compulsory schooling), IV (college proximity)

This yields 3 × 2 × 4 × 3 = 72 specifications.
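A minimal sketch of enumerating this grid, in which a placeholder estimate() function stands in for whatever routine fits one specification and returns the estimated return to education (here it simply simulates a number):

# Enumerate the specification grid and sort estimates for the curve.
import itertools
import random

random.seed(42)

samples  = ["men", "women", "both"]
measures = ["years_of_schooling", "highest_degree"]
controls = ["none", "demographics", "family_background", "ability_proxy"]
methods  = ["ols", "iv_compulsory_schooling", "iv_college_proximity"]

def estimate(sample, measure, control, method):
    # Placeholder: in practice, fit the model implied by this combination
    # of choices and return the estimated return to education.
    return random.uniform(0.05, 0.14)

specs = list(itertools.product(samples, measures, controls, methods))
results = [(estimate(*s), s) for s in specs]
results.sort(key=lambda r: r[0])          # sort by effect size for the curve

print(f"{len(specs)} specifications")     # 3 x 2 x 4 x 3 = 72
print("smallest estimate:", results[0])
print("largest estimate:", results[-1])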

The specification curve plots all 72 estimates, showing:

  • Range of estimates (e.g., 5% to 14% returns)

  • Which choices drive variation (IV estimates tend higher; controlling for ability lowers OLS)

  • Robustness of main conclusions (returns are positive across all specifications)

Interpretation:

  • If estimates cluster tightly, findings are robust to analytical choices

  • If estimates vary wildly, honest reporting requires acknowledging this fragility

  • The pattern of variation can be informative: which choices matter most?

Multiverse analysis extends this logic to the full "multiverse" of possible analyses, including data processing choices:

  • Missing data handling (listwise deletion, imputation, bounds)

  • Outlier treatment (winsorize, trim, include)

  • Variable construction (alternative measures, different aggregations)

  • Sample restrictions (age range, time period, geography)

The full multiverse can contain thousands of specifications. The goal is not to run all of them but to understand how conclusions depend on choices.

When to use specification curves:

  • When analytical choices are genuinely debatable

  • When you want to demonstrate robustness (or honestly reveal fragility)

  • When different specifications have been used in the literature

  • When reviewers might question your main specification

Limitations:

  • Computationally intensive for many choices

  • Not all specifications are equally credible---some choices may be clearly wrong

  • May give false confidence if all specifications share the same flaw

  • Cannot substitute for getting the identification right


25.3 Writing and Communication

Structure of Empirical Papers

The standard economics paper follows a predictable structure:

  1. Introduction (~2-4 pages)

    • Research question

    • Why it matters

    • What you do

    • What you find

    • Contribution

  2. Background/Literature (~2-3 pages)

    • Relevant prior work

    • Where you fit

  3. Data (~2-4 pages)

    • Sources

    • Construction

    • Summary statistics

  4. Empirical Strategy (~3-5 pages)

    • Identification strategy

    • Estimating equations

    • Key assumptions

  5. Results (~5-10 pages)

    • Main findings

    • Robustness

    • Heterogeneity

  6. Discussion/Conclusion (~2-3 pages)

    • Interpretation

    • Limitations

    • Implications

Writing for Clarity

Principle 25.3: The Reader's Time is Valuable Write for a busy reader who will skim before deciding whether to read carefully. Make your contribution clear quickly.

Practical guidance:

Front-load key information:

  • First sentence should hint at the question

  • First paragraph should convey the main finding

  • Readers should understand your contribution without reading the whole paper

Use structure:

  • Clear section headings

  • Topic sentences for paragraphs

  • Transitions between sections

Be precise:

  • Define terms

  • Specify what you mean (which population, which parameter)

  • Avoid vague qualifiers ("significant relationship")

Be concise:

  • Cut unnecessary words

  • Every paragraph should serve a purpose

  • One point per paragraph

Tables and Figures

Tables and figures often convey results more effectively than prose.

Table principles:

  • Informative titles that describe content

  • Clear column headers

  • Standard errors in parentheses (or brackets for confidence intervals)

  • Note significance levels and sample sizes

  • Don't include too many columns---split complex tables

  • Round appropriately (3-4 significant digits usually sufficient)

Example: Well-Formatted Regression Table

Table 3: Returns to Education

                          (1)          (2)          (3)
                          OLS          IV           IV
Years of schooling        0.103        0.089        0.112
                          (0.008)      (0.024)      (0.019)
Controls                  No           No           Yes
First-stage F             -            12.4         18.7
N                         24,531       24,531       24,531

Notes: Standard errors in parentheses, clustered by state. * p<0.10, ** p<0.05, *** p<0.01. Controls include age, age squared, race, and region fixed effects.

Figure principles:

  • Clear, informative titles

  • Labeled axes with units

  • Legends when needed

  • Source notes

  • Not too cluttered

  • Consider colorblind-friendly palettes

[Figure 25.1 appears here: panel (a) Poor Design, panel (b) Good Design]

Figure 25.1: Good vs. Poor Figure Design. Both panels show identical data on GDP per capita across regions. The poor design (top) uses garish colors, a cluttered bar chart, and removes helpful gridlines. The good design (bottom) uses a line chart appropriate for time series, accessible colors, clean labels, and minimal visual clutter. Good visualization makes patterns immediately apparent.
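The "good design" choices described in the caption take only a few lines to implement. The sketch below uses made-up GDP figures purely for illustration:

# Minimal line-chart sketch: labeled axes, legend, no chart junk.
import matplotlib.pyplot as plt

years = list(range(2000, 2021))
regions = {
    "Region A": [30 + 0.8 * t for t in range(len(years))],
    "Region B": [20 + 0.6 * t for t in range(len(years))],
    "Region C": [12 + 0.5 * t for t in range(len(years))],
}

fig, ax = plt.subplots(figsize=(7, 4))
for name, series in regions.items():
    ax.plot(years, series, label=name, linewidth=2)

ax.set_xlabel("Year")
ax.set_ylabel("GDP per capita (thousands of dollars)")
ax.set_title("GDP per Capita by Region, 2000-2020")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)       # remove visual clutter
ax.spines["right"].set_visible(False)
fig.tight_layout()
fig.savefig("figure_25_1_good_design.png", dpi=300)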

Writing for Different Audiences

Research often needs to reach multiple audiences:

Academic specialists:

  • Full technical detail

  • Extensive robustness

  • Positioning in literature

Policy audiences:

  • Lead with implications

  • Minimize jargon

  • Focus on magnitudes and uncertainty

  • Explicit about what results do/don't show

General audiences:

  • Plain language

  • Concrete examples

  • Clear visualizations

  • Acknowledge limitations without burying the finding


25.4 Presenting Uncertainty Honestly

The Problem with P-Values

The null hypothesis significance testing (NHST) framework has dominated empirical research but produces systematic problems:

Issues with p-values:

  1. Dichotomization: Treats p = 0.049 differently from p = 0.051

  2. Misinterpretation: P-values don't measure probability the null is true

  3. Publication bias: Incentivizes p-hacking to cross thresholds

  4. Effect size neglect: Statistical significance ≠ practical importance

Definition 25.3: What a P-Value Actually Is The probability of observing data as extreme as or more extreme than what was observed, if the null hypothesis were true and the study were repeated many times. It is not the probability that the null is true.
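A small simulation makes the definition concrete: generate many datasets under the null, and the p-value is the share whose test statistic is at least as extreme as the observed one. All numbers below are illustrative.

# Simulated two-sided p-value for H0: mean = 0.
import numpy as np

rng = np.random.default_rng(0)

# "Observed" sample and its t-statistic
x = rng.normal(loc=0.3, scale=1.0, size=50)
t_obs = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

# Repeat the study many times with the null actually true (mean = 0)
t_null = []
for _ in range(100_000):
    z = rng.normal(loc=0.0, scale=1.0, size=50)
    t_null.append(z.mean() / (z.std(ddof=1) / np.sqrt(len(z))))

p_value = np.mean(np.abs(t_null) >= abs(t_obs))
print(f"simulated two-sided p-value: {p_value:.3f}")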

Beyond Significance Stars

Moving beyond binary significance requires:

1. Report effect sizes with uncertainty

  • Report estimates with confidence intervals, not just significance

  • Interpret magnitude, not just sign

  • Consider practical significance, not just statistical

2. Use confidence intervals

  • 95% CI conveys more than stars

  • Readers can assess whether effects of various sizes are compatible with data

  • Multiple significance levels implicit in CI

3. Consider Bayesian approaches

  • Posterior probabilities directly address "probability effect is real"

  • Prior specification makes assumptions explicit

  • Credible intervals have intuitive interpretation

Practical Guidance on Presenting Uncertainty

Principle 25.4: Honest Uncertainty Report what you learned, including what you didn't learn. Overstating precision harms credibility and misleads users of research.

Do:

  • Report confidence intervals for key estimates

  • Discuss sensitivity of results to specification choices

  • Note sample size and power considerations

  • Distinguish statistically significant from economically meaningful

  • Acknowledge what assumptions are required

Don't:

  • Make binary claims based on p-value thresholds

  • Hide imprecision behind stars

  • Dismiss insignificant results as "no effect"

  • Over-interpret point estimates when intervals are wide

  • Claim certainty you don't have

Communicating to Non-Specialists

Policy audiences and the public need accessible communication of uncertainty:

Strategies:

  • Use natural frequencies ("1 in 20") rather than percentages or decimals

  • Visualize uncertainty (error bars, ranges, distributions)

  • Use plain language ("we can't rule out effects anywhere from -5% to +10%")

  • Provide context for magnitudes ("similar to the effect of X")

  • Be explicit about confidence level ("we're fairly confident that...")

Worked Example: Communicating Minimum Wage Results

Technical: "We estimate an employment elasticity of -0.073 (SE = 0.022, p < 0.01)."

Policy audience: "Our results suggest that a 10% minimum wage increase would reduce employment by about 0.7%, plus or minus about 0.4 percentage points."

General audience: "We find small negative effects on employment. A typical minimum wage increase might reduce jobs by less than 1%---meaningful but modest. Some studies find even smaller effects or no effect at all, so there's genuine uncertainty about the exact impact."
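The translation from the technical statement to the policy statement is simple arithmetic; a minimal sketch using the numbers from this example:

# Back-of-the-envelope translation of the technical estimate above.
elasticity, se = -0.073, 0.022
wage_increase = 0.10                         # a 10% minimum wage increase

effect = elasticity * wage_increase * 100    # employment change, in percentage points
margin = 1.96 * se * wage_increase * 100     # 95% margin of error

print(f"employment change: {effect:.1f} pp, plus or minus {margin:.1f} pp")
# roughly -0.7 pp, plus or minus about 0.4 pp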

The Role of Prior Knowledge

Pure frequentist inference ignores prior information, but researchers and readers always have priors. Bayesian approaches make this explicit:

P(θ | data) ∝ P(data | θ) × P(θ)

Practical implications:

  • Extraordinary claims require extraordinary evidence

  • Accumulated prior evidence matters for interpretation

  • A single study rarely should change beliefs dramatically

  • Meta-analytic thinking applies informally even without formal meta-analysis
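A minimal normal-normal sketch of this updating rule shows why a single noisy study should move beliefs only modestly. The prior and study numbers are illustrative, not drawn from any real literature.

# Conjugate normal-normal update: posterior precision is the sum of precisions.
prior_mean, prior_sd = 0.0, 0.02        # skeptical prior: effect near zero
estimate, se = 0.08, 0.05               # one study: large point estimate, imprecise

prior_prec = 1 / prior_sd**2
data_prec = 1 / se**2

post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
post_sd = post_prec ** -0.5

print(f"posterior mean {post_mean:.3f}, posterior sd {post_sd:.3f}")
# A single imprecise study moves the skeptical prior only part of the way
# toward its point estimate.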


25.5 Ethics in Empirical Research

Dimensions of Research Ethics

Research ethics extends beyond IRB compliance:

1. Human subjects protection

  • Informed consent

  • Privacy and confidentiality

  • Minimizing harm

  • Special protections for vulnerable populations

2. Data ethics

  • Responsible use of administrative data

  • Privacy in the age of big data

  • Algorithmic fairness when research informs policy

3. Professional integrity

  • Honest reporting

  • Appropriate attribution

  • Conflicts of interest

4. Social responsibility

  • Considering who benefits from research

  • Engaging affected communities

  • Thinking about misuse of findings

Common Ethical Challenges

Re-identification risk: Even "anonymized" data can sometimes be re-identified. Consider:

  • What harm could come from re-identification?

  • What safeguards are appropriate?

  • When is data too sensitive to share?

Research on vulnerable populations: Development economics often studies the global poor. Consider:

  • Power dynamics between researchers and subjects

  • Benefit sharing with studied communities

  • Avoiding "extractive" research

Dual use: Research findings can be used for purposes researchers didn't intend. Consider:

  • Who might use your findings?

  • Could findings be misused?

  • Do you have responsibility for downstream use?

Box: Concrete Ethics Cases in Empirical Research

Case 1: Algorithmic Fairness in Criminal Justice

Predictive algorithms used in bail and sentencing decisions (like COMPAS) have been shown to exhibit racial disparities. Researchers face tensions:

  • Algorithms may reduce overall detention rates (benefit)

  • But may systematically disadvantage Black defendants (harm)

  • "Fairness" has multiple incompatible definitions (equal false positive rates? equal accuracy? calibration?)

Researcher responsibility: If your work informs algorithmic tools, assess disparate impact across protected groups. Report fairness metrics alongside accuracy.

Case 2: Targeting in Development Programs

ML-based targeting can identify who benefits most from interventions. But:

  • Optimization for efficiency may conflict with equity

  • Targeting on predicted outcomes can exclude those most in need if they have lower predicted gains

  • Communities may perceive targeting as unfair even if statistically justified

Researcher responsibility: Be explicit about the welfare function being optimized. Consider who is left out and why.

Case 3: Re-identification of "Anonymous" Data

Researchers have demonstrated that individuals can be re-identified from "anonymous" datasets:

  • Sweeney showed 87% of Americans are uniquely identified by zip + birthdate + gender

  • Genetic data linked to public genealogy databases identified thousands

  • Location data from phones can identify individuals from patterns

Researcher responsibility: Assume determined adversaries. Use formal privacy protections (differential privacy) for sensitive data. Don't release data that could enable harm even if IRB approved.

Case 4: Research in Authoritarian Contexts

Field experiments and surveys in non-democracies raise special issues:

  • Enumerators may face retaliation for certain questions

  • Governments may demand data access

  • Findings could be used to target dissidents

Researcher responsibility: Consider whether the research can be done safely and ethically in context. Have data destruction protocols. Limit what data is collected.

The common thread: Ethical research requires imagination about downstream consequences, not just compliance with formal rules.

Ethical Research Practice

Principle 25.5: Ethics Throughout the Research Process Ethical considerations should inform all stages of research, not just IRB review at the beginning.

Planning stage:

  • Is the question worth asking?

  • Who benefits from the research?

  • What are the risks to participants?

Data collection:

  • Informed consent

  • Privacy protections

  • Fair compensation

Analysis:

  • Honest reporting

  • No selective presentation

  • Appropriate caveats

Communication:

  • Accurate representation

  • Accessible to affected communities

  • Consideration of how findings might be used


25.6 AI-Assisted Research Workflow

Large language models (LLMs) and AI coding assistants have become integral to empirical research workflows. Using them effectively requires understanding their capabilities, limitations, and ethical implications.

AI Tools in the Research Process

Code generation and debugging:

  • LLMs can write boilerplate code, data cleaning scripts, and visualization functions

  • Particularly useful for syntax you don't use daily (e.g., complex regex, SQL queries, LaTeX tables)

  • Effective for debugging—paste error messages and code for suggested fixes

Literature review and synthesis:

  • AI can summarize papers, identify themes across literature, and suggest relevant citations

  • Useful for rapid orientation in unfamiliar fields

  • Cannot replace careful reading of key papers

Writing assistance:

  • Drafting, editing, and restructuring prose

  • Generating first drafts of methods sections from analysis code

  • Translation between technical and non-technical registers

Critical Limitations

Warning: AI Does Not Understand Your Research

LLMs are pattern-matching systems trained on text. They do not:

  • Understand causal identification strategies

  • Know whether your instrument is valid

  • Verify that code produces correct results

  • Check whether claims are supported by evidence

AI assistance complements but never replaces domain expertise and careful verification.

Common failure modes:

Failure                      Example                                        Mitigation
Plausible but wrong code     Correct syntax, wrong algorithm                Always test on known cases
Hallucinated citations       Cites papers that don't exist                  Verify every reference
Confident nonsense           Authoritative-sounding but factually wrong     Cross-check key claims
Subtle statistical errors    Misapplies methods in edge cases               Review statistical logic carefully
Training data cutoff         Doesn't know recent methods/papers             Supplement with current sources
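The first mitigation in the table, testing on known cases, is worth making routine. A minimal sketch, in which weighted_mean() stands in for whatever function an assistant drafted:

# Before trusting an AI-drafted function, check it against hand-computed cases.
def weighted_mean(values, weights):
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hand-checked cases: equal weights recover the simple mean; a zero weight
# drops the corresponding observation.
assert weighted_mean([1, 2, 3], [1, 1, 1]) == 2
assert weighted_mean([10, 99], [1, 0]) == 10
print("known-case checks passed")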

Ethical Considerations

Transparency and disclosure:

  • Journals increasingly require disclosure of AI use

  • Document which parts of your workflow used AI assistance

  • Maintain human accountability for all claims and code

Authorship and credit:

  • AI is a tool, not an author—humans bear responsibility

  • Acknowledge AI assistance in methods or acknowledgments

  • Do not misrepresent AI-generated text as entirely your own work

Data privacy:

  • Do not paste sensitive or confidential data into cloud-based AI tools

  • IRB protocols may restrict AI use with human subjects data

  • Consider local/on-premise AI tools for sensitive projects

Best Practices

Verification is mandatory:

  • Test AI-generated code on cases with known answers before using it

  • Check every citation, quotation, and factual claim the model produces

  • Re-derive or cross-check any statistical reasoning it supplies

Maintain intellectual ownership:

  • Use AI to accelerate, not replace, your thinking

  • Understand every line of code you commit

  • Be able to explain and defend every methodological choice

Version control for AI interactions:

  • Save prompts and responses for reproducibility

  • Document which AI model and version was used

  • Note when AI suggestions were modified

Integration with Traditional Workflow

Task                  Traditional Approach                    AI-Augmented Approach
Literature search     Database queries, citation chains       AI synthesis + targeted deep reading
Code writing          Write from scratch, adapt examples      AI draft + verification + refinement
Debugging             Stack Overflow, documentation           AI diagnosis + manual verification
Writing               Outline → draft → revise                AI draft → heavy revision → human voice
Peer review           Read and comment                        AI summary + focused human critique

The key principle: AI handles the mechanical while humans provide judgment, creativity, and accountability. The division of labor should leave all substantive decisions—identification strategy, interpretation, claims—in human hands.


Practical Guidance

When to Do What

Stage                    Key Practices
Project start            Set up reproducible structure, version control
Before data access       Pre-register (where appropriate)
During analysis          Document choices, maintain code quality
Writing                  Clear structure, honest uncertainty
Submission               Complete replication package
Publication              Archive data and code

Common Pitfalls

Pitfall 1: Cleanup Later Planning to clean up messy code/organization after the project is done. It never happens.

How to avoid: Build reproducibility in from the start. It's easier to maintain good practices than to impose them retroactively.

Pitfall 2: Over-Selling Overstating certainty or importance of findings to increase publication chances or media attention.

How to avoid: Report results honestly, including uncertainty. Your reputation is built over a career, not a single paper.

Pitfall 3: Under-Documenting Assuming you'll remember why you made analysis decisions.

How to avoid: Document decisions in real time. Use meaningful variable names and code comments. Write README files.

Pitfall 4: Significance Chasing Running specifications until one is significant, then reporting only that.

How to avoid: Pre-register primary specifications. Report specification curves. Distinguish confirmatory from exploratory.

Implementation Checklist

Project setup:

  • Create the directory structure, initialize version control, and write an initial README

  • Separate raw from processed data before any analysis begins

Analysis:

  • Keep all transformations in code; never edit raw data by hand

  • Document analytical decisions as they are made, and label exploratory analyses as such

Writing:

  • Front-load the question, the finding, and the contribution

  • Report effect sizes with confidence intervals, not just significance stars

Archiving:

  • Assemble and test a complete replication package

  • Deposit data and code with a persistent identifier


Summary

Key takeaways:

  1. Reproducibility requires organization from project start---clear directory structure, version control, documentation, and immutable raw data.

  2. Pre-registration and transparency help distinguish confirmatory from exploratory findings, though their value depends on research context.

  3. Good communication front-loads key findings, uses tables and figures effectively, and honestly presents uncertainty without hiding behind p-values.

Returning to the opening question: The difference between research that contributes to knowledge and research that misleads lies not primarily in technical sophistication but in research practice. Organized, transparent, reproducible work that honestly communicates uncertainty is more valuable than brilliant analysis buried in messy projects with selective reporting. These practices benefit not just science but researchers themselves---you will thank your past self for that well-organized project and that carefully documented decision.


Further Reading

Essential

  • Christensen, G., J. Freese, and E. Miguel (2019). "Transparent and Reproducible Social Science Research." University of California Press.

  • Angrist, J.D. and J.-S. Pischke (2010). "The Credibility Revolution in Empirical Economics." Journal of Economic Perspectives.

For Deeper Understanding

  • Wilson, G., J. Bryan, K. Cranston, et al. (2017). "Good Enough Practices in Scientific Computing." PLOS Computational Biology.

  • Gentzkow, M. and J. Shapiro (2014). "Code and Data for the Social Sciences: A Practitioner's Guide." [Online resource]

Advanced/Specialized

  • McCloskey, D. and S. Ziliak (1996). "The Standard Error of Regressions." Journal of Economic Literature. [Critique of significance testing]

  • Gelman, A. and J. Carlin (2014). "Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors." Perspectives on Psychological Science.

Applications

  • Nosek, B., et al. (2015). "Promoting an Open Research Culture." Science. [Open science guidelines]

  • AEA (2019). "Data and Code Availability Policy." American Economic Association.

  • Li, D. (2017). "Expertise versus Bias in Evaluation: Evidence from the NIH." American Economic Journal: Applied Economics. [Rigorous analysis of peer review showing expertise dominates bias]


Exercises

Conceptual

  1. A colleague argues that pre-registration is unnecessary for observational research because the data already exist. How would you respond? Under what circumstances might pre-registration still be valuable?

  2. Explain why a 95% confidence interval conveys more information than a significance star. Give an example where knowing the interval would change interpretation.

Applied

  1. Find a published empirical economics paper. Evaluate its replicability based on available information: Is data available? Is code available? Could you reproduce the results? What's missing?

  2. Take one of your own past analyses (or a homework assignment). Reorganize it following the project structure principles in this chapter. Document what you had to add or clarify.

Discussion

  1. Some argue that requiring data and code sharing imposes unfair burdens on researchers who collected expensive original data, giving free-riders access to years of work. Others argue openness is essential to science. Where do you come down, and how would you design policies to balance these concerns?


Appendix 25A: Resources

Pre-Registration Platforms

  • AEA RCT Registry (experiments in economics)

  • OSF Registries (Open Science Framework)

  • EGAP Registry (governance and politics)

  • AsPredicted (simple, rapid registration)

Data Repositories

  • ICPSR (social science data)

  • Harvard Dataverse

  • openICPSR

  • Zenodo (general purpose)

  • Journal-specific repositories

Software and Tools

  • Git/GitHub (version control)

  • Docker (computational reproducibility)

  • R Markdown/Jupyter (literate programming)

  • Make/Snakemake (pipeline management)

Style Guides

  • Gentzkow and Shapiro code guide

  • Stata coding guidelines (SSC)

  • Google R style guide

  • Journal-specific guidelines
