Chapter 2: Data—The Raw Material
Opening Question
How do we obtain reliable information about the social world, and what can go wrong along the way?
Chapter Overview
Empirical research begins with data. But data don't arrive pristine. They are collected by institutions with their own purposes, recorded in systems designed for administration rather than research, measured with instruments that may distort what we seek to understand, and reported by people who may misremember, deceive, or decline to respond. Understanding where data come from, how they're measured, and what can go wrong is essential for drawing valid conclusions.
This chapter surveys social science data sources: the major types, the challenges each presents, and the strategies for assessing and improving data quality. The goal is not to make you a data collection expert, but a sophisticated consumer of data, able to recognize quality issues and understand their implications for analysis.
What you will learn:
The major types of data sources and their relative strengths
How to assess validity and reliability of measures
Common problems: missing data, measurement error, selection
Strategies for linking datasets and constructing analysis samples
How qualitative data complements quantitative sources
Prerequisites: None—this is a foundational chapter
2.1 Types of Data Sources
Administrative Data
Administrative data are collected by governments and organizations for operational purposes, then repurposed for research.
Examples:
Tax records (income, employment)
Social Security earnings histories
Medicare/Medicaid claims (healthcare utilization)
School enrollment and test scores
Criminal justice records
Business registries
Strengths:
Coverage: Often universal or near-universal within the relevant population
No recall bias: Records contemporaneous, not dependent on memory
Large samples: Often the entire population of interest
Long time spans: Administrative systems persist for decades
Low cost: Data already exist; marginal cost of research access is low
Weaknesses:
Limited variables: Only what the system collects; often lacks covariates researchers want
Measurement tied to administrative categories: "Income" means what the tax code says, not what economists mean
Gaming and manipulation: Actors respond strategically (tax avoidance, gaming education metrics)
Access restrictions: Privacy concerns limit access; approval processes are lengthy
Changes over time: Administrative definitions change, creating discontinuities
Example: U.S. Social Security earnings data
Strengths: Universal coverage of formal employment, linked over lifetimes, exact earnings (up to taxable maximum)
Weaknesses: Self-employment earnings may be underreported; earnings above the taxable maximum are top-coded (censored at the cap); no information on hours, occupation, or job characteristics
Survey Data
Surveys ask people directly about their characteristics, behaviors, and attitudes.
Types of surveys:
Cross-sectional: One-time snapshot (e.g., General Social Survey)
Repeated cross-section: Same questions, different samples over time (e.g., CPS monthly)
Panel/longitudinal: Same individuals tracked over time (e.g., PSID, NLSY)
Strengths:
Researcher control: Can ask exactly what you want to know
Subjective measures: Attitudes, beliefs, well-being—things administrative data can't capture
Standardization: Designed for research, with documentation
Contextual information: Rich covariates, household structure, history
Weaknesses:
Response bias: People may not report truthfully (social desirability, sensitive questions)
Recall error: Memory is imperfect, especially for dates and amounts
Nonresponse: Not everyone agrees to participate; nonresponse may be selective
Cost: Surveys are expensive; sample sizes limited by budget
Attrition: In panels, people drop out over time
Experimental Data
Experiments generate data through controlled intervention (Chapter 10 covers experimental design in depth).
Strengths:
Internal validity: Randomization ensures treatment-control comparability
Designed for causal inference: Variables and timing chosen for research purpose
Controlled measurement: Can standardize data collection across groups
Weaknesses:
External validity: Experimental populations and settings may not generalize
Hawthorne effects: Being studied may change behavior
Ethical constraints: Some interventions can't be randomized
Cost and logistics: Experiments are expensive and complex
Observational/Found Data
Observational data are collected for purposes other than research or generated naturally by human activity.
Examples:
Historical records (censuses, trade statistics, price lists)
Newspaper archives
Corporate financial data
Geographic information
Strengths:
Covers questions surveys can't: Historical, rare, or sensitive topics
May be the only option: For historical analysis or when surveys are infeasible
Rich context: Documents provide qualitative alongside quantitative information
Weaknesses:
Selection: What survived or was recorded may not be representative
Measurement inconsistency: Categories and definitions change over time
Requires expertise: Understanding context is essential for valid interpretation
Digital Trace Data
Digital systems generate massive amounts of data as byproducts of online activity.
Examples:
Social media posts and interactions
Web browsing and search histories
Mobile phone location data
E-commerce transactions
Sensor and IoT data
Strengths:
Scale: Billions of observations
Granularity: Fine-grained temporal and behavioral detail
Real behavior: What people do, not just what they say
Real-time: Captures dynamics as they unfold
Weaknesses:
Selection: Not everyone uses digital platforms equally
Construct validity: What does a "like" or "share" actually measure?
Platform changes: Data collection depends on platform policies that change
Privacy and ethics: Consent is murky; potential for harm
Noise: Much data is uninformative; signal extraction is hard
Unstructured Data
A growing share of empirical work uses data that don't fit neatly into rows and columns: text, images, audio, and video. These require specialized methods to convert into analyzable form.
Text data:
Company earnings calls, congressional speeches, news articles
Social media posts, product reviews, open-ended survey responses
Historical documents, legal filings, patent applications
Methods: Sentiment analysis, topic modeling, word embeddings, LLMs (see Chapter 8)
Image data:
Satellite imagery (nighttime lights as economic activity, deforestation)
Street View images (neighborhood characteristics, property values)
Historical photographs (infrastructure, urban change)
Medical imaging, product images
Methods: Computer vision, convolutional neural networks
Audio and video:
Recorded interviews, speeches, debates
Earnings call audio (tone, emotion beyond transcript)
Surveillance and body camera footage
Methods: Speech recognition, acoustic analysis, video understanding
Strengths:
Rich information not captured in structured data
Often available at scale (millions of documents, global satellite coverage)
Captures nuance and context
Weaknesses:
Requires ML/NLP expertise or specialized tools
Measurement validity is harder to assess (what does a "sentiment score" really mean?)
Computationally intensive
Training data may embed biases
Box: From Unstructured to Structured
The practical workflow involves converting unstructured data to analyzable variables:
Earnings call transcript → Sentiment score, topic proportions → Predict stock returns
Satellite nighttime lights → Pixel intensity by region-year → Proxy for GDP in data-poor countries
News articles → Named entity counts, event indicators → Measure policy uncertainty
Product reviews → Star rating, aspect-level sentiment → Study consumer preferences
The feature extraction step is where most methodological challenges arise. See Chapter 8 for implementation.
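To make this pipeline concrete, here is a minimal sketch of the first arrow for text: converting raw transcripts into a crude sentiment score. The word lists and example sentences are illustrative placeholders, not a validated lexicon; Chapter 8 covers serious text-as-data methods.

```python
# Minimal sketch: turn raw text into a structured feature (a crude sentiment score).
# The word lists below are illustrative placeholders, not a validated lexicon.
POSITIVE = {"growth", "strong", "improved", "record", "exceeded"}
NEGATIVE = {"decline", "weak", "loss", "missed", "uncertainty"}

def sentiment_score(text: str) -> float:
    """Share of positive minus negative words, normalized by document length."""
    tokens = [t.strip(".,") for t in text.lower().split()]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

transcripts = {
    "firm_a": "Revenue growth was strong and margins improved this quarter.",
    "firm_b": "We missed guidance amid weak demand and continued uncertainty.",
}
# The structured output: one number per document, ready to merge with other variables.
features = {name: sentiment_score(text) for name, text in transcripts.items()}
print(features)
```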
Major Data Repositories and Resources
Empirical researchers benefit from knowing the major repositories of publicly available data. Here are essential resources organized by type:
General-purpose archives:
ICPSR (Inter-university Consortium for Political and Social Research): The largest social science data archive, hosting thousands of datasets with documentation
Harvard Dataverse: Open repository for research data across disciplines
UK Data Service: British equivalent to ICPSR, excellent for UK and comparative data
Harmonized microdata:
IPUMS (Integrated Public Use Microdata Series): Harmonized census and survey microdata from the U.S. and internationally—essential for historical and cross-national research
Luxembourg Income Study (LIS): Harmonized income and wealth microdata from 50+ countries
Comparative Political Data Set: Cross-national data on political and economic indicators
Economic and financial data:
FRED (Federal Reserve Economic Data): Macroeconomic time series, easily accessible via API (a short access sketch follows this list)
World Bank Open Data: Development indicators for all countries
Penn World Table: Internationally comparable GDP, capital, and productivity measures
WRDS (Wharton Research Data Services): Financial and accounting data (institutional subscription required)
Health and demographic data:
NHANES (National Health and Nutrition Examination Survey): Physical exams and health measures
HRS (Health and Retirement Study): Longitudinal data on aging Americans
DHS (Demographic and Health Surveys): Standardized surveys in developing countries
Linked and administrative data:
Census Longitudinal Infrastructure: Links Census surveys over time
LEHD (Longitudinal Employer-Household Dynamics): Employer-employee matched data
Many countries now offer researcher access to linked administrative records through statistical agencies or secure data centers
Getting started: For most topics, begin with ICPSR or IPUMS. Search for existing datasets before collecting new data—someone may have already collected what you need.
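As an illustration of programmatic access, the sketch below pulls a series from FRED (mentioned above). It assumes a free API key from fred.stlouisfed.org and follows FRED's documented JSON endpoint; treat the exact field names as assumptions to verify against the current API documentation.

```python
# Minimal sketch of pulling a macro series from FRED's JSON API.
# Assumes a free API key from https://fred.stlouisfed.org (replace YOUR_KEY).
import requests

params = {
    "series_id": "UNRATE",   # U.S. unemployment rate
    "api_key": "YOUR_KEY",   # placeholder; register for your own key
    "file_type": "json",
}
resp = requests.get("https://api.stlouisfed.org/fred/series/observations",
                    params=params, timeout=30)
resp.raise_for_status()
observations = resp.json()["observations"]
# Each observation has a "date" and a "value" (as a string; "." marks missing).
series = {obs["date"]: float(obs["value"])
          for obs in observations if obs["value"] != "."}
print(len(series), "monthly observations retrieved")
```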
2.2 Measurement: Validity and Reliability
The Concept of Measurement
Measurement connects abstract concepts to observable indicators. We want to measure "income," "education," "health," or "political ideology"—but we observe only proxies: tax filings, years of schooling, survey responses about symptoms, or voting behavior.
Definition 2.1 (Measurement): Measurement is the assignment of numbers (or categories) to units according to rules, intended to represent the magnitude of a property.
Validity
Definition 2.2 (Validity): A measure is valid if it captures the concept it's intended to measure.
Types of validity:
Construct validity: Does the measure capture the underlying construct?
Does a standardized test measure "intelligence" or test-taking skill?
Does self-reported happiness measure well-being?
Box: Goodhart's Law and Campbell's Law—When Measures Become Targets
Two related principles warn that measurement validity can deteriorate when measures are used for high-stakes decisions:
Goodhart's Law (1975): "When a measure becomes a target, it ceases to be a good measure."
Campbell's Law (1979): "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Examples:
Schools: When test scores determine funding, teachers "teach to the test"—scores rise but learning may not
Hospitals: When mortality rates affect rankings, hospitals avoid high-risk patients or reclassify deaths
Police: When arrest quotas are targets, officers make more arrests, but the additional arrests do not necessarily reduce crime
Academia: When citation counts determine careers, gaming citations becomes rational
The mechanism: A measure initially correlates with the construct because behavior optimizes the construct, not the measure. Once the measure becomes a target, behavior optimizes the measure directly, breaking the correlation.
Implications for research:
Be cautious using high-stakes administrative measures as outcomes
Prefer measures that are difficult to manipulate
Use multiple measures to triangulate
Consider whether your measurement itself might change behavior
Content validity: Does the measure cover all relevant aspects of the construct?
Does a measure of "socioeconomic status" capture wealth, income, education, and occupation?
Criterion validity: Does the measure correlate with other measures it should correlate with?
Predictive: Does it predict future outcomes? (SAT predicting college GPA)
Concurrent: Does it correlate with other measures of the same thing?
Face validity: Does the measure obviously relate to the concept?
Least rigorous; necessary but not sufficient
Reliability
Definition 2.3 (Reliability): A measure is reliable if it produces consistent results under consistent conditions.
Types of reliability:
Test-retest reliability: Does the measure give the same result when repeated?
Ask the same question twice; do answers agree?
Inter-rater reliability: Do different measurers agree?
Two coders categorizing text; do they assign the same codes?
Internal consistency: Do multiple items measuring the same construct correlate?
Cronbach's alpha for survey scales
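For internal consistency, Cronbach's alpha is straightforward to compute. The sketch below implements the standard formula on a hypothetical respondents-by-items matrix.

```python
# Minimal sketch: Cronbach's alpha for a survey scale.
# Rows are respondents, columns are items; the responses below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) array of item responses."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

responses = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```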
The Validity-Reliability Relationship
A measure can be:
Reliable but not valid: Always gives the same wrong answer
Valid but not reliable: On average correct but noisy
Neither: Random and wrong
Both: The goal

Figure 2.1 uses target diagrams to illustrate these concepts. Think of each measurement as throwing a dart at a target, where the bullseye represents the true value. Reliability means your throws cluster tightly together. Validity means they cluster around the bullseye. You can be consistently wrong (reliable but invalid—panel 2), on-average right but noisy (valid but unreliable—panel 3), or both wrong and inconsistent (panel 4). Only panel 1 achieves the goal of both.
Reliability is necessary but not sufficient for validity.
Implications for Analysis
Measurement error in outcome variables (Y measured with error):
Increases noise, reduces precision
Doesn't bias coefficients if error is classical (uncorrelated with X)
Measurement error in treatment or explanatory variables (X or D measured with error):
In bivariate regression, classical measurement error biases the coefficient toward zero (attenuation bias)
In multivariate regression, the situation is more complex:
The mismeasured variable's coefficient is still attenuated
But coefficients on other covariates can be biased in any direction (up or down), depending on correlations between regressors
This makes the direction of overall bias often indeterminate
Non-classical error (correlated with true value) can bias in any direction regardless of model complexity
Warning: The simple "attenuation bias" intuition from introductory econometrics applies only to bivariate regression. In applied work with multiple controls, measurement error effects are generally unpredictable without strong assumptions about the covariance structure.
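A quick simulation makes the attenuation result concrete. The sketch below uses simulated data with a true slope of 2 and a reliability ratio of 0.5, so the bivariate OLS slope on the noisy regressor should fall to roughly 1.

```python
# Simulation sketch: classical measurement error in X attenuates the bivariate OLS slope.
# All data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0
x_true = rng.normal(0.0, 1.0, n)             # true regressor, variance 1
y = beta * x_true + rng.normal(0.0, 1.0, n)  # outcome
x_obs = x_true + rng.normal(0.0, 1.0, n)     # observed with classical error, variance 1

slope_true = np.polyfit(x_true, y, 1)[0]
slope_noisy = np.polyfit(x_obs, y, 1)[0]
reliability = 1.0 / (1.0 + 1.0)              # var(x_true) / (var(x_true) + var(error))

print(f"slope using true X:  {slope_true:.2f}")   # ~2.0
print(f"slope using noisy X: {slope_noisy:.2f}")  # ~1.0
print(f"theory predicts:     {beta * reliability:.2f}")
```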
Proxy variables: Using an observable proxy for an unobserved concept:
May introduce bias if proxy imperfectly captures concept
Bias direction depends on the relationship between proxy and true variable
2.3 Missing Data
Types of Missing Data
Definition 2.4 (Missing Data Mechanisms):
MCAR (Missing Completely at Random): Probability of missing is independent of observed and unobserved values
MAR (Missing at Random): Probability of missing depends on observed but not unobserved values
MNAR (Missing Not at Random): Probability of missing depends on the unobserved value itself
MCAR example: Survey responses lost due to random computer failure.
MAR example: Higher-income respondents are less likely to report income, but among people with the same observed education and occupation, missingness is random.
MNAR example: People with the highest incomes refuse to report income because it's high.
Implications
MCAR: Complete-case analysis is unbiased (though inefficient). Simple imputation is valid.
MAR: Complete-case analysis may be biased. Imputation conditional on observed variables is valid.
MNAR: No general solution. Requires modeling the selection process or sensitivity analysis.
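A small simulation illustrates why the mechanism matters: under MCAR the complete-case mean is close to the truth, while under MNAR (here, high earners refusing to report) it is biased. All data are simulated.

```python
# Simulation sketch: complete-case means under MCAR vs. MNAR (all data simulated).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
income = rng.lognormal(mean=11.0, sigma=0.6, size=n)  # simulated annual incomes

# MCAR: 20% of responses lost completely at random.
mcar_kept = income[rng.random(n) > 0.20]

# MNAR: the top fifth of earners refuse to answer with high probability.
top_fifth = income > np.quantile(income, 0.80)
refuse_prob = np.where(top_fifth, 0.75, 0.05)
mnar_kept = income[rng.random(n) > refuse_prob]

print(f"true mean income:          {income.mean():,.0f}")
print(f"complete-case mean (MCAR): {mcar_kept.mean():,.0f}")  # close to the truth
print(f"complete-case mean (MNAR): {mnar_kept.mean():,.0f}")  # biased downward
```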
Box: MNAR and Partial Identification (Manski Bounds)
When data are MNAR, point identification of population parameters is generally impossible without untestable assumptions about the selection process. However, we can often obtain bounds—a range of values consistent with the data and minimal assumptions.
The key insight (Manski, 1989, 2003): Without assumptions about missing values, we can still bound parameters using worst-case reasoning. If we want to estimate E[Y] and some Y values are missing:
E[Y]_lower = E[Y | observed] · P(observed) + Y_min · P(missing)
E[Y]_upper = E[Y | observed] · P(observed) + Y_max · P(missing)
Example: A survey asks about income but 20% refuse to answer. Observed mean income is $60,000. If income is bounded between $0 and $500,000:
Lower bound: 0.8 × $60,000 + 0.2 × $0 = $48,000
Upper bound: 0.8 × $60,000 + 0.2 × $500,000 = $148,000
These worst-case bounds are often wide but are honest—they reflect our genuine uncertainty under MNAR.
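The worst-case bounds from the income example take only a few lines to compute; the numbers below simply reproduce the figures in the text.

```python
# Worst-case (Manski) bounds on mean income, reproducing the example in the text.
p_obs, p_miss = 0.8, 0.2
mean_obs = 60_000          # mean among respondents who answered
y_min, y_max = 0, 500_000  # assumed logical bounds on income

lower = mean_obs * p_obs + y_min * p_miss  # 48,000
upper = mean_obs * p_obs + y_max * p_miss  # 148,000
print(f"E[Y] is bounded between {lower:,.0f} and {upper:,.0f}")
```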
Tightening bounds: Additional assumptions narrow the bounds:
Monotone selection: If we believe high-income people are more likely to refuse, we can rule out lower values for missing observations
Instrumental variables: Variables affecting selection but not outcomes can help (Chapter 12)
Exclusion restrictions: Prior knowledge about the selection process
Connection to sensitivity analysis: Rather than choosing one imputation model, bounds show the range of conclusions consistent with different assumptions. This is more honest than pretending we know the selection process.
See Chapter 17 for full treatment of partial identification and bounds.
Strategies
Complete-case analysis: Analyze only observations with no missing values.
Simple but wasteful; biased if not MCAR
Mean/mode imputation: Replace missing values with sample mean.
Preserves means but distorts variances and correlations
Regression imputation: Predict missing values from observed variables.
Better than mean imputation; still understates uncertainty
Multiple imputation: Generate multiple completed datasets, analyze each, combine results (see the sketch after this list).
Accounts for imputation uncertainty
Requires MAR or explicit selection model
Maximum likelihood: Estimate parameters using all available information.
Efficient under MAR
Requires correctly specified model
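To make the multiple-imputation strategy concrete, here is a minimal sketch using scikit-learn's IterativeImputer to generate several completed datasets and Rubin's rules to pool the estimates. It assumes scikit-learn is installed and uses simulated data; dedicated MI software offers more complete workflows.

```python
# Sketch of multiple imputation with Rubin's rules, using scikit-learn's
# IterativeImputer to generate M completed datasets (data are simulated).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n, M = 2_000, 5
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
x_miss = x.copy()
x_miss[rng.random(n) < 0.3 * (y > y.mean())] = np.nan  # missingness depends on observed y

data = np.column_stack([y, x_miss])
estimates, variances = [], []
for m in range(M):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed = imputer.fit_transform(data)
    yc, xc = completed[:, 0], completed[:, 1]
    X = np.column_stack([np.ones(n), xc])
    beta = np.linalg.lstsq(X, yc, rcond=None)[0]
    resid = yc - X @ beta
    sigma2 = resid @ resid / (n - 2)
    estimates.append(beta[1])
    variances.append(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

# Rubin's rules: pooled point estimate, within- and between-imputation variance.
q_bar = np.mean(estimates)
W = np.mean(variances)
B = np.var(estimates, ddof=1)
T = W + (1 + 1 / M) * B
print(f"pooled slope = {q_bar:.3f}, pooled SE = {np.sqrt(T):.3f}")
```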
Sample Selection
Missing data due to sample selection is particularly problematic:
Attrition in panels: People drop out of longitudinal studies
Survey nonresponse: Some populations hard to reach or refuse
Administrative truncation: Only see people who interact with the system
Selection models (Heckman correction) attempt to address this but require strong assumptions (Chapter 11 discusses selection on observables; Chapter 17 discusses bounds).
2.4 Data Quality Assessment
Red Flags
Impossibilities: Values outside logical range (negative ages, future dates, percentages over 100)
Implausibilities: Values that are logically possible but extremely unlikely (claiming 168 hours worked per week)
Heaping: Excessive clustering at round numbers (ages reported as 30, 40, 50)
Inconsistencies: Internal contradictions (unmarried but married last year; child older than parent)
Outliers: Extreme values that may be errors or genuine but influential observations
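These red flags are easy to screen for automatically. The sketch below checks a toy dataset for impossibilities, implausibilities, heaping, and internal inconsistencies; the column names and thresholds are illustrative.

```python
# Sketch of automated red-flag checks on a toy dataset (column names are illustrative).
import pandas as pd

df = pd.DataFrame({
    "age":              [34, 52, -3, 40, 40, 40, 40, 29],
    "hours_worked":     [40, 60, 35, 168, 45, 50, 38, 20],
    "never_married":    [0, 1, 0, 0, 1, 1, 0, 1],
    "married_lastyear": [1, 0, 1, 1, 0, 1, 1, 0],
})

# Impossibilities: values outside the logical range.
print("negative ages:", (df["age"] < 0).sum())

# Implausibilities: logically possible but extreme values.
print("168+ hours worked per week:", (df["hours_worked"] >= 168).sum())

# Heaping: excess clustering at round numbers.
print(f"share of ages at multiples of 5: {(df['age'] % 5 == 0).mean():.0%}")

# Inconsistencies: internal contradictions across variables.
contradiction = (df["never_married"] == 1) & (df["married_lastyear"] == 1)
print("never-married but married last year:", contradiction.sum())
```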
Validation Strategies
Cross-source validation: Compare measure to alternative source
Survey-reported income vs. administrative records
Self-reported health vs. medical claims
Predictive validity: Does the measure predict outcomes it should predict?
Educational attainment should predict earnings
Known-group validity: Does the measure differentiate groups it should differentiate?
A depression scale should show higher scores among diagnosed patients
Text-matching and creative proxies: When direct measurement is impossible, creative approaches can reveal information about unobserved quantities.
Example: Measuring Quality of Unfunded Research (Li 2017)
How do you measure the quality of research that was never funded and therefore never conducted? Li (2017) faced this problem when studying NIH peer review. She wanted to know whether grant reviewers could identify high-quality proposals, but couldn't observe what unfunded proposals would have produced.
Her solution: text-matching. She measured the textual similarity between unfunded proposals and subsequently published research. If an unfunded proposal closely matched later publications, that suggested the rejected idea was actually good: someone else pursued it. This proxy allowed her to assess whether "near-miss" applications (just below the funding threshold) contained valuable ideas that reviewers failed to recognize.
This exemplifies a broader principle: when the variable you want is unobservable, creative proxy construction can make the invisible visible. The key is validating that your proxy actually captures what you claim.
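The text-matching logic can be sketched with standard tools: represent documents as TF-IDF vectors and compute cosine similarity between a proposal and later publications. This is a generic illustration with invented texts, not Li's actual procedure or data.

```python
# Sketch of text-matching as a proxy: cosine similarity between a rejected proposal
# and later publication abstracts (toy texts; not Li's actual procedure or data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

proposal = "Gene editing to correct the mutation causing sickle cell disease"
later_publications = [
    "Gene editing corrects the sickle cell mutation in patient stem cells",
    "A survey of consumer preferences for electric vehicles in urban markets",
]

vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform([proposal] + later_publications)

# Similarity of the proposal (row 0) to each later publication (rows 1..n).
similarities = cosine_similarity(vectors[0], vectors[1:]).ravel()
for text, sim in zip(later_publications, similarities):
    print(f"{sim:.2f}  {text[:60]}")
```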
Sensitivity checks: How much do results change with different treatments of potential errors?
Documentation
Good data come with good documentation:
Codebook: Variable definitions, coding schemes, valid values
Technical documentation: Sampling procedures, weighting, questionnaire
Data dictionary: File structure, variable names, formats
User guide: How to use the data properly
Without documentation, data are nearly useless—or worse, actively misleading.
2.5 Linking and Constructing Data
Record Linkage
Linking records across datasets multiplies analytic possibilities but introduces new challenges.
Exact matching: Use unique identifiers (SSN, ID numbers)
Best case; rare outside administrative data
Probabilistic matching: Use multiple fields (name, birthdate, address) to find likely matches
Generates false matches and misses true matches
Requires tuning and validation
Linkage error: Both false positives (wrong matches) and false negatives (missed matches) can bias analysis
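A minimal probabilistic-matching sketch using only the standard library: score candidate pairs on fuzzy name similarity and exact birthdate agreement, then accept matches above a threshold. The records, weights, and threshold are illustrative; real linkage projects use dedicated tools and validation against known matches.

```python
# Sketch of probabilistic record linkage on two toy files (illustrative only).
from difflib import SequenceMatcher

file_a = [{"id": 1, "name": "Maria Garcia", "dob": "1980-05-02"},
          {"id": 2, "name": "John Smith",   "dob": "1975-11-30"}]
file_b = [{"id": "x", "name": "Maria C. Garcia", "dob": "1980-05-02"},
          {"id": "y", "name": "Jon Smyth",       "dob": "1975-11-30"},
          {"id": "z", "name": "Wei Chen",        "dob": "1990-01-15"}]

def match_score(a, b):
    """Weighted score: fuzzy name similarity plus exact birthdate agreement."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dob_match = 1.0 if a["dob"] == b["dob"] else 0.0
    return 0.6 * name_sim + 0.4 * dob_match

THRESHOLD = 0.75  # tuning this trades false matches against missed matches
for a in file_a:
    best = max(file_b, key=lambda b: match_score(a, b))
    score = match_score(a, best)
    status = "link" if score >= THRESHOLD else "no link"
    print(f"{a['name']} -> {best['name']} (score {score:.2f}, {status})")
```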
Panel Construction
Longitudinal data track units over time, but:
Attrition: Units drop out
Refreshment: New units added (may not be comparable)
Inconsistency: Definitions change over time
Gaps: Observations missing for some periods
Sample Definition
Defining the analysis sample requires choices:
Population definition: Who is "in" the study population?
Working-age adults? Registered voters? Firms with >100 employees?
Temporal boundaries: What time period?
Calendar years? Cohorts? Event time?
Exclusions: Who is excluded and why?
Missing key variables? Outliers? Specific subgroups?
Each choice affects interpretation. Document and justify.
2.6 Data Collection: Practical Considerations
Primary vs. Secondary Data
Primary data: Collected by you for your research purpose
Full control over design
Expensive and time-consuming
Secondary data: Collected by others for other purposes
Cheaper and faster
Limited to what others chose to collect
Most academic research uses secondary data; understanding their origins matters for interpreting results.
Data Access
Public-use data: Freely available (often with registration)
Census microdata, IPUMS, many survey datasets
Restricted-access data: Requires application, approval, security protocols
Tax records, health records, confidential business data
May require working in secure facilities
Commercial data: Purchased from vendors
Financial data, consumer behavior, proprietary surveys
Web scraping: Collecting data from websites
Legal and ethical gray areas
Data quality uncertain
Ethics
Data about people raises ethical obligations:
Informed consent: Did people agree to be studied?
Not always possible (administrative data, historical records)
IRB review assesses risks and protections
Privacy: Can individuals be identified?
De-identification may not prevent re-identification
Cell sizes, rare characteristics, linkage attacks
Harm: Could research results harm subjects?
Stigmatization of groups
Policy changes affecting vulnerable populations
Data security: Are data stored and transmitted safely?
2.7 Qualitative Bridge: Documents, Interviews, Observation
Qualitative Data Sources
Quantitative data aren't the only source of knowledge. Qualitative approaches provide:
Documents: Letters, reports, meeting minutes, newspapers, speeches
Rich contextual information
Reveal reasoning and decision processes
Subject to availability and selection
Interviews: Structured or unstructured conversations
Access to subjective experience
Can probe unexpected directions
Interviewer effects, social desirability concerns
Observation: Direct witnessing of behavior
What people do, not just what they say
Hawthorne effects, observer presence changes behavior
Limited to observable phenomena
Complementarity with Quantitative Data
Quantitative strengths: Generalization, precision, testing, comparison
Qualitative strengths: Depth, context, discovery, explanation
Combined approaches:
Use qualitative research to identify variables for quantitative study
Use quantitative patterns to select cases for qualitative investigation
Triangulate findings across methods (Chapter 23)
Example: Understanding Survey Responses
A survey measures "job satisfaction" on a 1-5 scale. But what do respondents mean when they answer?
Quantitative alone: Correlate satisfaction with wages, hours, tenure
Qualitative complement: Interview workers about what they consider when rating satisfaction
The qualitative work reveals that "job satisfaction" means different things to different people—some emphasize pay, others autonomy, others relationships. This informs interpretation of quantitative results.
2.8 Running Example: China's Growth Data
The Challenge
Measuring China's post-1978 economic growth involves all the data challenges discussed in this chapter:
Administrative data quality: Chinese official statistics are produced by agencies with incentives to overreport growth. Provincial GDP numbers often don't aggregate to national totals. Researchers debate whether official statistics can be trusted.
Measurement issues:
What price deflators to use when relative prices change dramatically?
How to measure output in sectors transitioning from plan to market?
Service sector notoriously hard to measure
Missing data: Pre-reform data are incomplete; some series were not collected or published; war and political disruption created gaps.
Alternative sources: Researchers have used satellite nighttime lights (proxy for economic activity), electricity consumption, trade partner data (Chinese exports reported by importers), and physical output measures to validate official statistics.
Assessment Strategies
Cross-validation: Compare official GDP with electricity consumption—the relationship is stable in other countries at similar development stages but anomalous in China, suggesting measurement issues (a sketch of this check follows this list).
Partner country data: China's reported exports to Hong Kong can be compared to Hong Kong's reported imports from China. Discrepancies reveal under- or over-invoicing.
Physical output: Agricultural yields, industrial production (tons of steel, cement) provide cross-checks on value measures.
Expert assessment: Economists have developed adjusted series (Young 2003, Holz 2014) that attempt to correct known biases.
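The electricity cross-check can be sketched as follows: fit the GDP-electricity growth relationship on comparator countries, then ask whether reported GDP growth is an outlier given electricity growth. All numbers below are hypothetical placeholders, not actual statistics.

```python
# Sketch of the electricity cross-check: fit the GDP-electricity growth relationship
# on comparator countries, then examine the reported figure's residual.
# All numbers are hypothetical placeholders, not actual statistics.
import numpy as np

# Hypothetical annual growth rates (%) for comparator countries.
electricity_growth = np.array([3.0, 5.5, 7.0, 4.2, 6.1, 8.0, 2.5, 5.0])
gdp_growth         = np.array([2.8, 4.9, 6.3, 3.9, 5.6, 7.1, 2.4, 4.6])

slope, intercept = np.polyfit(electricity_growth, gdp_growth, 1)

# Hypothetical figures for the country being checked.
elec_growth_checked, reported_gdp_growth = 6.0, 9.5
predicted_gdp = intercept + slope * elec_growth_checked
print(f"predicted GDP growth from electricity: {predicted_gdp:.1f}%")
print(f"reported GDP growth:                   {reported_gdp_growth:.1f}%")
print(f"gap (possible overstatement signal):   {reported_gdp_growth - predicted_gdp:.1f} pp")
```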
Implications
The uncertainty in Chinese data affects all subsequent analysis:
Growth rates could be overstated by 1-2 percentage points annually
TFP growth estimates depend heavily on assumptions
Policy conclusions must acknowledge data limitations
This illustrates a general principle: sophisticated analysis cannot overcome poor data. Understanding data quality is the foundation of credible empirical work.
Practical Guidance
Choosing Data Sources
If you need... → Consider...
Large sample, long panel → Administrative data
Subjective measures, attitudes → Survey data
Causal identification → Experimental data
Historical questions → Archival data
Behavioral detail → Digital trace data
Context and meaning → Qualitative data
Common Pitfalls
Pitfall 1: Taking data at face value
Assuming data accurately represent what they claim without investigating measurement.
How to avoid: Read documentation; understand data collection; validate against other sources.
Pitfall 2: Ignoring missing data
Dropping observations with missing values without considering selection.
How to avoid: Assess missing data mechanism; use appropriate imputation; report sensitivity.
Pitfall 3: Mechanical data cleaning
Dropping outliers or "impossible" values without understanding why they occur.
How to avoid: Investigate outliers; they may be errors, but may also be informative; report decisions.
Pitfall 4: Definition drift
Using variables whose definitions changed over time without adjustment.
How to avoid: Read documentation carefully; harmonize definitions; test for discontinuities.
Data Quality Checklist
Before analysis, confirm that you have:
Read the codebook and technical documentation
Checked for impossible and implausible values, heaping, and internal inconsistencies
Assessed the extent and likely mechanism of missing data
Cross-validated key variables against an alternative source where possible
Verified that variable definitions are consistent over the study period
Documented the sample definition, exclusions, and construction choices
Summary
Key takeaways:
Data come in many forms: Administrative, survey, experimental, observational, and digital trace data each have strengths and weaknesses.
Validity and reliability are distinct: A measure can be consistently wrong (reliable but not valid) or on-average right but noisy (valid but not reliable).
Missing data mechanisms matter: MCAR, MAR, and MNAR require different approaches; ignoring missing data can bias results.
Data quality requires active assessment: Don't trust; verify. Cross-validate, check documentation, investigate anomalies.
Data construction involves choices: Sample definition, variable construction, and handling of problems all affect results. Document and justify.
Qualitative data complement quantitative: Documents, interviews, and observation provide context, meaning, and validation.
Returning to the opening question: Reliable information about the social world comes from understanding where data originate, how they're measured, and what can go wrong. No data are perfect; the goal is to understand limitations and their implications for analysis. Sophisticated methods cannot compensate for poor data, but careful attention to data quality enables credible inference.
Further Reading
Essential
Groves et al. (2009), Survey Methodology - Comprehensive treatment of survey data
Einav and Levin (2014), "Economics in the Age of Big Data" - Administrative and digital data
For Deeper Understanding
Little and Rubin (2019), Statistical Analysis with Missing Data - Missing data methods
Bound, Brown, and Mathiowetz (2001), "Measurement Error in Survey Data" - Handbook chapter
Herzog, Scheuren, and Winkler (2007), Data Quality and Record Linkage Techniques - Linkage methods
Advanced/Specialized
Christen (2012), Data Matching - Probabilistic linkage methods
Salganik (2018), Bit by Bit: Social Research in the Digital Age - Digital data for social science
van der Laan and Rose (2018), Targeted Learning in Data Science - Missing data and causal inference
Applications
Chetty et al. (2016), "The Effects of Exposure to Better Neighborhoods on Children" - Administrative data exemplar
Holz (2014), "The Quality of China's GDP Statistics" - Data quality assessment
Meyer, Mok, and Sullivan (2015), "Household Surveys in Crisis" - Survey data quality trends
Li (2017), "Expertise versus Bias in Evaluation: Evidence from the NIH" - Innovative text-matching approach to measure quality of unobserved counterfactuals
Exercises
Conceptual
Explain the difference between validity and reliability using the example of a bathroom scale. What would it mean for the scale to be (a) reliable but not valid, (b) valid but not reliable?
Why is missing not at random (MNAR) particularly problematic? Give an example from survey data where MNAR is likely.
What are the tradeoffs between administrative data and survey data for measuring income inequality?
Applied
Choose a publicly available dataset (e.g., from ICPSR or a government statistical agency):
Locate and read the documentation
Identify potential measurement issues
Assess the extent of missing data
Write a brief (1 page) data quality assessment
Using a dataset with income measured by both administrative records and survey self-report:
Compare the distributions
Calculate the correlation
Identify patterns in discrepancies (who under/over-reports?)
Discussion
A colleague argues: "Administrative data are always better than survey data because they're not subject to response bias." Critique this claim.