Chapter 18: Programming Companion—Causal Inference
Opening Question
How do we implement the identification strategies from Part III in practical code?
Chapter Overview
This chapter provides practical implementations of the causal inference methods covered in Chapters 9-17. We cover experimental analysis, matching and weighting, instrumental variables, difference-in-differences, regression discontinuity, synthetic control, and time series causal methods.
The emphasis is on commonly used packages and workflows. For each method, we show basic implementation, diagnostics, and visualization in both R and Python where mature packages exist.
What you will learn:
How to analyze experimental data with proper randomization inference
How to implement matching and propensity score methods
How to estimate IV, DiD, RD, and synthetic control designs
How to create publication-quality tables and diagnostic plots
Prerequisites: Chapters 9-17 (conceptual foundations), Chapter 4 (programming basics)
18.1 Experimental Analysis
Power Analysis
R with pwr:
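A minimal sketch: solve for the per-group sample size needed to detect a hypothesized effect (the effect size d = 0.3 is illustrative, not from the text):

```r
library(pwr)

# Per-group n needed to detect d = 0.3 at 80% power, two-sided alpha = 0.05
pwr.t.test(d = 0.3, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")

# Power achieved with a fixed n of 200 per group
pwr.t.test(n = 200, d = 0.3, sig.level = 0.05, type = "two.sample")
```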
Python with statsmodels:
Balance Testing
R:
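A minimal sketch, assuming a data frame df with a binary indicator treat and covariates x1, x2, x3 (all names hypothetical):

```r
library(cobalt)

# Standardized mean differences for each covariate
bal.tab(treat ~ x1 + x2 + x3, data = df, stats = "mean.diffs")

# Joint test: regress treatment on all covariates; an insignificant
# F-statistic is consistent with successful randomization
summary(lm(treat ~ x1 + x2 + x3, data = df))
```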
Python:
Treatment Effect Estimation
R:
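A minimal sketch with hypothetical names (outcome y, treatment treat, covariates x1 and x2), using estimatr for design-based standard errors:

```r
library(estimatr)

# Difference in means with HC2 robust standard errors
lm_robust(y ~ treat, data = df)

# Covariate adjustment with centered treatment-covariate interactions
# (the Lin 2013 estimator), implemented by lm_lin()
lm_lin(y ~ treat, covariates = ~ x1 + x2, data = df)
```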
Python:
Randomization Inference
R with ri2:
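A sketch under the assumption of complete random assignment of a binary treatment Z; the declaration must match how treatment was actually assigned:

```r
library(ri2)   # uses randomizr's declare_ra() for the assignment scheme

declaration <- declare_ra(N = nrow(df))   # complete random assignment

ri_out <- conduct_ri(
  Y ~ Z,
  declaration      = declaration,
  assignment       = "Z",
  sharp_hypothesis = 0,        # sharp null of no effect for any unit
  data             = df,
  sims             = 1000
)
summary(ri_out)
plot(ri_out)   # randomization distribution of the estimated ATE
```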
18.2 Matching and Weighting
Propensity Score Estimation
R:
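A minimal sketch: a logistic propensity model with hypothetical covariates, followed by an overlap check:

```r
ps_model  <- glm(treat ~ x1 + x2 + x3, data = df, family = binomial())
df$pscore <- predict(ps_model, type = "response")

# Overlap: propensity score distributions should share common support
hist(df$pscore[df$treat == 1], col = rgb(1, 0, 0, 0.4),
     main = "Propensity score overlap", xlab = "Propensity score")
hist(df$pscore[df$treat == 0], col = rgb(0, 0, 1, 0.4), add = TRUE)
```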
Python:
Matching with MatchIt
R:
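A sketch of 1:1 nearest-neighbor matching on the propensity score (variable names hypothetical):

```r
library(MatchIt)
library(estimatr)

m_out <- matchit(treat ~ x1 + x2 + x3, data = df,
                 method = "nearest", distance = "glm", ratio = 1)
summary(m_out)   # balance before and after matching

matched <- match.data(m_out)   # matched sample with a weights column
# ATT on the matched sample; the weights matter whenever ratio > 1
lm_robust(y ~ treat, data = matched, weights = weights)
```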
Inverse Probability Weighting
R with WeightIt:
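A sketch of ATE weights from a logistic propensity model (method = "glm" in recent WeightIt releases; older versions call it "ps"):

```r
library(WeightIt)
library(cobalt)
library(estimatr)

w_out <- weightit(treat ~ x1 + x2 + x3, data = df,
                  method = "glm", estimand = "ATE")
summary(w_out)   # weight range and effective sample size
bal.tab(w_out)   # covariate balance after weighting

df$w_ate <- w_out$weights
lm_robust(y ~ treat, data = df, weights = w_ate)
```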
Python:
IPW for Continuous Treatments
When treatment is continuous (dosage, duration, intensity), weights are based on probability density functions rather than propensity scores. See Chapter 11.4 for the conceptual foundation.
R with WeightIt:
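A sketch: when the treatment variable is numeric, weightit() switches to density-based weights for continuous treatments; stabilize = TRUE requests stabilized weights:

```r
library(WeightIt)

w_cont <- weightit(dose ~ x1 + x2 + x3, data = df,
                   method = "glm", stabilize = TRUE)
summary(w_cont)   # inspect the tails; see the diagnostic note below
```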
R manual implementation (to understand the mechanics):
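A manual sketch of stabilized weights, assuming a normal model for the treatment given covariates (names hypothetical); this roughly mirrors what the package does internally:

```r
# Denominator: conditional density of dose given covariates
denom_model <- lm(dose ~ x1 + x2 + x3, data = df)
# Numerator: marginal density of dose (stabilization)
num_model   <- lm(dose ~ 1, data = df)

dens_denom <- dnorm(df$dose, mean = fitted(denom_model), sd = sigma(denom_model))
dens_num   <- dnorm(df$dose, mean = fitted(num_model),   sd = sigma(num_model))

df$w <- dens_num / dens_denom

# Trim extreme weights at the 1st and 99th percentiles (see note below)
qs   <- quantile(df$w, c(0.01, 0.99))
df$w <- pmin(pmax(df$w, qs[1]), qs[2])

summary(df$w)
```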
Python:
Diagnostic Note: With continuous treatments, check that weights are not too extreme. Unlike binary IPW where weights blow up near propensity scores of 0 or 1, continuous IPW produces extreme weights when someone's treatment value is very unlikely given their covariates. Examine the weight distribution and consider trimming at the 1st and 99th percentiles.
Doubly Robust Estimation
R:
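A sketch using the AIPW package's R6 interface (argument names per its documentation; variable names hypothetical):

```r
library(AIPW)
library(SuperLearner)

aipw_fit <- AIPW$new(
  Y = df$y,
  A = df$treat,
  W = df[, c("x1", "x2", "x3")],
  Q.SL.library = c("SL.glm"),   # outcome model learners
  g.SL.library = c("SL.glm"),   # propensity model learners
  k_split = 5,                  # cross-fitting folds
  verbose = FALSE
)
aipw_fit$fit()$summary()   # ATE with influence-function-based SEs
```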
Model Interpretation with marginaleffects
When estimating treatment effects from nonlinear models (logit, probit, Poisson) or models with interactions, regression coefficients are not directly interpretable as causal effects. The marginaleffects package (Arel-Bundock, Greifer & Heiss 2024) provides a unified framework for computing predictions, comparisons, and slopes across 100+ model types—implementing the g-computation approach discussed in Chapter 11.
Core functions:
predictions(): Predicted outcomes at specific covariate values
comparisons(): Differences in predictions (contrasts, risk differences)
slopes(): Marginal effects (derivatives)
avg_*() versions: Averages across all observations (AME)
R:
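A minimal sketch: an average marginal effect (risk difference) from a logit model with an interaction, computed by g-computation:

```r
library(marginaleffects)

fit <- glm(y ~ treat * x1 + x2, data = df, family = binomial())

# ATE on the probability scale: average difference in predicted
# probabilities setting treat to 1 vs. 0 for every observation
avg_comparisons(fit, variables = "treat")

# Effects at specific covariate values
comparisons(fit, variables = "treat", newdata = datagrid(x1 = c(0, 1)))
```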
Python:
Connecting to causal estimands:

| Estimand | marginaleffects call |
| --- | --- |
| ATE (binary treatment) | avg_comparisons(model, variables = "treatment") |
| ATT | avg_comparisons(model, variables = "treatment", newdata = subset(data, treatment == 1)) |
| Dose-response | avg_comparisons(model, variables = list(dose = c(0, 1, 2, 3))) |
| CATE by subgroup | comparisons(model, variables = "treatment", by = "subgroup") |
| Marginal effect of continuous X | avg_slopes(model, variables = "X") |
Why this matters: When you run glm(y ~ treatment, family = binomial) and report the coefficient, you're reporting a log odds ratio—not the probability change that answers "what is the effect of treatment?" The average marginal effect from avg_comparisons() answers the question you actually care about.
Integration with causal inference packages:
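A sketch combining matching with g-computation, in the spirit of the MatchIt documentation's recommended workflow (objects continue from the matching section above; the wts and vcov arguments assume a recent marginaleffects version):

```r
library(MatchIt)
library(marginaleffects)

m_out   <- matchit(treat ~ x1 + x2 + x3, data = df, method = "nearest")
matched <- match.data(m_out)

# Flexible outcome model on the matched sample, with matching weights
fit <- lm(y ~ treat * (x1 + x2 + x3), data = matched, weights = weights)

# ATT: average comparison among the treated, clustering by matched pair
avg_comparisons(fit, variables = "treat",
                newdata = subset(matched, treat == 1),
                wts = "weights", vcov = ~ subclass)
```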
Practical Box: When to Use marginaleffects
Use marginaleffects when:
Your outcome model is nonlinear (logit, probit, Poisson, etc.)
You have interactions and want effects at specific covariate values
You want to report effects on the natural scale (probabilities, counts)
You need to combine it with matching/weighting
You don't need it when:
Your model is linear with no interactions (coefficients = marginal effects)
You're using packages that already compute ATEs (e.g., AIPW, did)
18.3 Instrumental Variables
Basic 2SLS
R with fixest:
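A sketch of fixest's three-part IV formula, controls | fixed effects | first stage (names hypothetical):

```r
library(fixest)

iv_fit <- feols(y ~ w1 + w2 | firm + year | x ~ z,
                data = df, cluster = ~ firm)
summary(iv_fit)
summary(iv_fit, stage = 1)   # first-stage estimates
```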
R with ivreg:
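The same model in ivreg's two-part formula syntax; note that exogenous controls appear on both sides of the bar:

```r
library(ivreg)

iv2 <- ivreg(y ~ x + w1 + w2 | z + w1 + w2, data = df)
summary(iv2)
```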
Python:
Weak Instrument Diagnostics
R:
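A sketch of the standard diagnostics, continuing the IV fits from above:

```r
library(fixest)

# First-stage F-statistic (and Wald version) from the fixest IV fit
fitstat(iv_fit, ~ ivf1 + ivwald1)

# ivreg reports weak instruments, Wu-Hausman, and Sargan tests
summary(iv2, diagnostics = TRUE)
```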
18.4 Difference-in-Differences
Basic Two-Period DiD
R:
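A minimal 2x2 sketch: the coefficient on the interaction is the DiD estimate (names hypothetical):

```r
library(fixest)

did_fit <- feols(y ~ treated * post, data = df, cluster = ~ id)
summary(did_fit)   # treated:post is the difference-in-differences
```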
Python:
Event Studies
R with fixest:
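A sketch, assuming time_to_treat counts periods relative to treatment and treat flags ever-treated units (so never-treated units serve as controls):

```r
library(fixest)

es_fit <- feols(
  y ~ i(time_to_treat, treat, ref = -1) |   # ref = -1 omits t-1
      id + year,
  data = df, cluster = ~ id
)
iplot(es_fit, main = "Event study", xlab = "Periods since treatment")
```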
Custom event study plot:
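A sketch of a hand-rolled ggplot version, pulling coefficients out of the fixest summary (the regular expression assumes the i() naming pattern from the model above):

```r
library(ggplot2)

ct      <- as.data.frame(summary(es_fit)$coeftable)
ct$term <- rownames(ct)
ct      <- ct[grepl("time_to_treat", ct$term), ]
ct$t    <- as.numeric(sub(".*::(-?[0-9]+).*", "\\1", ct$term))

ggplot(ct, aes(t, Estimate)) +
  geom_pointrange(aes(ymin = Estimate - 1.96 * `Std. Error`,
                      ymax = Estimate + 1.96 * `Std. Error`)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted") +
  labs(x = "Periods since treatment", y = "Estimate")
```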
Modern DiD Estimators
Callaway-Sant'Anna with did package:
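A sketch following the did package's conventions: g records the period of first treatment, with 0 for never-treated units:

```r
library(did)

cs_out <- att_gt(
  yname  = "y",
  tname  = "year",
  idname = "id",
  gname  = "g",
  data   = df,
  control_group = "nevertreated"
)
es <- aggte(cs_out, type = "dynamic")   # event-study aggregation
ggdid(es)
summary(aggte(cs_out, type = "simple")) # overall ATT
```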
Sun-Abraham with fixest:
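A sketch; sunab() takes the treatment cohort and the calendar period (code never-treated units to a cohort value outside the sample period):

```r
library(fixest)

sa_fit <- feols(y ~ sunab(cohort, year) | id + year,
                data = df, cluster = ~ id)
iplot(sa_fit)                  # interaction-weighted event study
summary(sa_fit, agg = "att")   # aggregate ATT
```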
18.5 Regression Discontinuity
Sharp RD with rdrobust
R:
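A minimal sketch, assuming outcome y, running variable x, and a cutoff at zero:

```r
library(rdrobust)

rd_out <- rdrobust(y = df$y, x = df$x, c = 0)
summary(rd_out)   # conventional, bias-corrected, and robust estimates
```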
Visualization:
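A companion binned plot via rdplot():

```r
# Binned scatter with polynomial fits on each side of the cutoff
rdplot(y = df$y, x = df$x, c = 0,
       x.label = "Running variable", y.label = "Outcome")
```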
Python:
Manipulation Testing
R:
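A sketch of the McCrary-style density test implemented in rddensity:

```r
library(rddensity)

dens <- rddensity(X = df$x, c = 0)
summary(dens)                   # p-value for a density jump at the cutoff
rdplotdensity(dens, X = df$x)   # estimated density on each side
```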
Fuzzy RD
R:
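A sketch: pass realized treatment take-up through the fuzzy argument so rdrobust estimates the local 2SLS effect:

```r
rd_fuzzy <- rdrobust(y = df$y, x = df$x, c = 0, fuzzy = df$treat)
summary(rd_fuzzy)
```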
Covariate Balance at Cutoff
R:
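A sketch: rerun the RD with each pre-determined covariate as the outcome; none should jump at the cutoff:

```r
for (v in c("x1", "x2", "x3")) {
  cat("Covariate:", v, "\n")
  summary(rdrobust(y = df[[v]], x = df$x, c = 0))
}
```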
18.6 Synthetic Control
Basic Synthetic Control
R with Synth:
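An abbreviated sketch of the dataprep()/synth() workflow; the panel needs numeric unit IDs, a unit-name column, and a time variable (all identifiers and years below are hypothetical):

```r
library(Synth)

dp <- dataprep(
  foo                   = df,
  predictors            = c("x1", "x2"),
  dependent             = "y",
  unit.variable         = "unit_id",
  unit.names.variable   = "unit_name",
  time.variable         = "year",
  treatment.identifier  = 1,          # treated unit's ID
  controls.identifier   = 2:20,       # donor pool IDs
  time.predictors.prior = 1990:1999,  # pre-treatment window
  time.optimize.ssr     = 1990:1999,
  time.plot             = 1990:2010
)
synth_out <- synth(dp)

path.plot(synth.res = synth_out, dataprep.res = dp)  # treated vs. synthetic
gaps.plot(synth.res = synth_out, dataprep.res = dp)  # treated - synthetic
```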
Inference with Permutation
R:
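A sketch of in-space placebos, continuing the setup above: treat each donor as if it were treated, re-fit, and compare its gap to the actual treated unit's gap:

```r
placebo_gaps <- lapply(2:20, function(ctrl) {
  dp_p <- dataprep(
    foo = df, predictors = c("x1", "x2"), dependent = "y",
    unit.variable = "unit_id", unit.names.variable = "unit_name",
    time.variable = "year",
    treatment.identifier  = ctrl,
    controls.identifier   = setdiff(2:20, ctrl),  # exclude the placebo unit
    time.predictors.prior = 1990:1999,
    time.optimize.ssr     = 1990:1999,
    time.plot             = 1990:2010
  )
  s <- synth(dp_p)
  dp_p$Y1plot - dp_p$Y0plot %*% s$solution.w   # gap: actual - synthetic
})
# The treated unit's post-period gap should look extreme relative to these
```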
tidysynth Package
R with tidysynth (more user-friendly):
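A sketch of the pipe-based workflow (unit and time identifiers and the single predictor are hypothetical):

```r
library(tidysynth)
library(dplyr)

synth_out <- df %>%
  synthetic_control(outcome = y, unit = unit_name, time = year,
                    i_unit = "Treated Unit", i_time = 2000,
                    generate_placebos = TRUE) %>%
  generate_predictor(time_window = 1990:1999,
                     x1_mean = mean(x1, na.rm = TRUE)) %>%
  generate_weights(optimization_window = 1990:1999) %>%
  generate_control()

synth_out %>% plot_trends()
synth_out %>% plot_placebos()      # in-space placebo gaps
synth_out %>% grab_significance()  # pseudo p-values from the placebos
```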
Python Implementation
Python lacks a dedicated synthetic control package as mature as R's Synth, but implementation is straightforward using optimization:
Permutation Inference for Synthetic Control
Note: For production use, consider the SyntheticControlMethods package on PyPI or implement with cvxpy for more robust optimization. The above code illustrates the core algorithm.
18.7 Time Series Causal Methods
VAR and SVAR
R with vars:
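A sketch: lag selection, a reduced-form VAR, and orthogonalized impulse responses (the Cholesky ordering follows the column order; variable names hypothetical):

```r
library(vars)

var_data <- df[, c("gdp_growth", "inflation", "interest_rate")]

lag_sel <- VARselect(var_data, lag.max = 8, type = "const")
var_fit <- VAR(var_data, p = lag_sel$selection["AIC(n)"], type = "const")

irf_out <- irf(var_fit, impulse = "interest_rate",
               response = "gdp_growth", n.ahead = 20, boot = TRUE)
plot(irf_out)
```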
Local Projections
R manual implementation:
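A minimal sketch of the Jorda (2005) estimator: one regression per horizon of the h-step-ahead outcome on the shock, with Newey-West standard errors (the shock and lagged controls are hypothetical names):

```r
library(sandwich)
library(lmtest)

horizons <- 0:12
irf_lp <- sapply(horizons, function(h) {
  d        <- df
  d$y_lead <- dplyr::lead(d$y, h)   # h-step-ahead outcome
  fit <- lm(y_lead ~ shock + lag_y + lag_shock, data = d)
  ct  <- coeftest(fit, vcov = NeweyWest(fit, lag = h))
  ct["shock", c("Estimate", "Std. Error")]
})

plot(horizons, irf_lp["Estimate", ], type = "l",
     xlab = "Horizon", ylab = "Response of y to the shock")
```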
R with lpirfs package:
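A sketch using lp_lin(), reusing var_data from the VAR example (argument names per the package documentation; the data frame must contain only the endogenous variables):

```r
library(lpirfs)

lp_out <- lp_lin(endog_data     = var_data,
                 lags_endog_lin = 4,     # lags of the endogenous variables
                 trend          = 0,     # no deterministic trend
                 shock_type     = 1,     # unit shock
                 confint        = 1.96,  # 95% bands
                 hor            = 20)
plot(lp_out)
```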
Python:
Practical Guidance
Package Recommendations
| Task | R | Python |
| --- | --- | --- |
| Power analysis | pwr | statsmodels.stats.power |
| Balance tables | tableone, cobalt | custom |
| Matching | MatchIt | causalinference |
| IPW | WeightIt | causalinference |
| Doubly robust | AIPW | econml |
| Marginal effects | marginaleffects | marginaleffects |
| IV/2SLS | fixest, ivreg | linearmodels |
| DiD (basic) | fixest | linearmodels |
| DiD (modern) | did, fixest (sunab) | - |
| RD | rdrobust | rdrobust |
| Synthetic control | Synth, tidysynth | - |
| VAR | vars | statsmodels |
| Local projections | lpirfs | statsmodels (manual) |
Common Pitfalls
Pitfall 1: Forgetting to Cluster
Standard errors are often wrong without clustering on the appropriate level.
How to avoid: Default to clustered SEs. Cluster at the level of treatment assignment or above.
Box: Python Clustering—A Complete Guide
Python's ecosystem for clustered SEs is more fragmented than R's. Here's how to cluster in each package:
statsmodels (cross-sectional)
linearmodels (panel data)
pyfixest (new, recommended)
Warning: Default SEs in most Python packages are not clustered. Always specify clustering explicitly.
Pitfall 2: Wrong Reference Period in Event Studies
Not properly normalizing to a pre-treatment period.
How to avoid: Explicitly set the reference period to t-1. Verify that the coefficient at t-1 is exactly zero (it is the omitted category).
Pitfall 3: Ignoring Weak Instruments
Proceeding with IV despite a weak first stage.
How to avoid: Always report the first-stage F-statistic. Use weak-instrument-robust inference (e.g., Anderson-Rubin) if F < 10.
Implementation Checklist
Summary
Key takeaways:
fixest (R) and linearmodels (Python) provide efficient implementations of regression-based methods with proper standard errors.
Modern DiD estimators (Callaway-Sant'Anna, Sun-Abraham) address problems with staggered adoption; use did or fixest's sunab().
rdrobust provides the standard implementation for RD with optimal bandwidth selection and robust inference.
Returning to the opening question: The packages covered in this chapter implement the identification strategies from Part III. The key is not just running the commands but understanding the assumptions each method requires and conducting appropriate diagnostics. Code without understanding is dangerous; understanding without code is incomplete.
Further Reading
Essential
Cunningham, S. (2021). "Causal Inference: The Mixtape." Yale UP. [Code examples throughout]
Huntington-Klein, N. (2021). "The Effect." CRC Press. [R and Stata code]
Model Interpretation
Arel-Bundock, Greifer & Heiss (2024). "How to Interpret Statistical Models Using marginaleffects in R and Python." Journal of Statistical Software 111(9), 1–32. The definitive guide to marginal effects terminology and implementation.
Package Documentation
marginaleffects: https://marginaleffects.com/
fixest: https://lrberge.github.io/fixest/
did: https://bcallaway11.github.io/did/
rdrobust: https://rdpackages.github.io/rdrobust/
Exercises
Conceptual
1. Why do different matching methods (nearest neighbor, caliper, optimal) sometimes produce different treatment effect estimates? What does this tell us about the role of implementation choices in causal inference?
2. Explain why the fixest package reports both "regular" and "sunab" coefficients for event studies. When would you prefer one over the other?
3. A researcher runs rdrobust and gets a bandwidth of 5 units, but believes domain knowledge suggests a bandwidth of 10 is more appropriate. What are the tradeoffs of using the data-driven versus the researcher-selected bandwidth?
Applied
1. Using experimental data, conduct a full analysis: balance table, ATE estimation with and without controls, and randomization inference. Compare the results.
2. Implement both matching (MatchIt) and IPW (WeightIt) on observational data. Compare the balance achieved and the estimated treatment effects.
3. Replicate a published DiD study using the did package. Compare the Callaway-Sant'Anna estimates to standard TWFE.
4. Conduct an RD analysis: estimate the effect, create an RD plot, test for manipulation, and check covariate balance at the cutoff.
Discussion
R and Python ecosystems offer similar functionality for causal inference, but the packages differ in design philosophy. Compare fixest (R) with linearmodels or pyfixest (Python). What are the practical implications for researchers choosing between them?