Chapter 22: Programming Companion—Beyond Averages

Opening Question

How do we implement methods for heterogeneous treatment effects and machine learning-based causal inference?


Chapter Overview

This chapter provides practical implementations of the methods from Part IV: mechanisms, heterogeneity, and machine learning for causal inference. We focus on three areas: subgroup analysis and visualization, machine learning methods for heterogeneous treatment effects (causal forests, double/debiased ML), and simulation for understanding estimator properties.

These methods have become increasingly important as researchers move beyond average effects to understand who benefits from interventions and why. The packages covered here---grf, EconML, DoubleML---represent the state of the art in causal machine learning.

What you will learn:

  • How to conduct and visualize subgroup analysis

  • How to estimate heterogeneous treatment effects with causal forests

  • How to implement double/debiased machine learning

  • How to use simulation to understand method properties

Prerequisites: Chapters 19-21 (conceptual foundations), Chapters 4 and 18 (programming basics)


22.1 Subgroup Analysis

Traditional Subgroup Analysis

R:
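
A minimal sketch, assuming a data frame df with outcome y, a 0/1 treatment treat, and a binary covariate female (all hypothetical names); the interaction term carries the subgroup difference:

    # Allow the treatment effect to differ by subgroup via an interaction
    fit <- lm(y ~ treat * female, data = df)
    summary(fit)  # the t-test on treat:female tests whether effects differ

    # Effect for female == 0 is the treat coefficient;
    # effect for female == 1 adds the interaction
    coef(fit)["treat"]
    coef(fit)["treat"] + coef(fit)["treat:female"]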

Visualizing subgroup effects:

Python:
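
A sketch of a forest plot of subgroup estimates with 95% confidence intervals; the numbers below are illustrative values, not real results:

    import matplotlib.pyplot as plt
    import numpy as np

    # Illustrative subgroup estimates and 95% confidence intervals
    subgroups = ["All", "Female", "Male", "Age < 40", "Age >= 40"]
    estimates = np.array([0.15, 0.22, 0.09, 0.18, 0.12])
    ci_lower  = np.array([0.08, 0.11, -0.02, 0.07, 0.02])
    ci_upper  = np.array([0.22, 0.33, 0.20, 0.29, 0.22])

    y = np.arange(len(subgroups))
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.errorbar(estimates, y,
                xerr=[estimates - ci_lower, ci_upper - estimates],
                fmt="o", capsize=4)
    ax.axvline(0, linestyle="--", color="gray")  # null-effect reference line
    ax.set_yticks(y)
    ax.set_yticklabels(subgroups)
    ax.set_xlabel("Estimated treatment effect")
    plt.tight_layout()
    plt.show()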

Multiple Testing Corrections

R:
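
A minimal sketch with illustrative p-values from several subgroup tests:

    # p-values from several subgroup interaction tests (illustrative values)
    p_values <- c(0.012, 0.034, 0.049, 0.210, 0.430)

    # Bonferroni: conservative, controls the family-wise error rate
    p.adjust(p_values, method = "bonferroni")

    # Benjamini-Hochberg: controls the false discovery rate, less conservative
    p.adjust(p_values, method = "BH")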


22.2 Causal Forests with grf

Data Preparation for ML Causal Inference

Critical: Preprocessing Categorical Variables

Machine learning tools like grf (R) and EconML (Python) require numeric matrices as input. Categorical variables (strings, factors) must be converted before use.

R: Use model.matrix() to create dummy variables:
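
For instance, with hypothetical columns age, income, and a factor education:

    # Expand factors into dummy columns; [, -1] drops the intercept column
    X <- model.matrix(~ age + income + education, data = df)[, -1]
    head(X)  # education now appears as 0/1 dummy columns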

Python: Use pd.get_dummies() or sklearn's OneHotEncoder:
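
A sketch with the same hypothetical columns:

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # pandas: expand categoricals to 0/1 dummies; drop_first avoids
    # perfect collinearity in linear models
    X = pd.get_dummies(df[["age", "income", "education"]],
                       columns=["education"], drop_first=True)

    # sklearn: useful inside pipelines; returns a sparse matrix by default
    enc = OneHotEncoder(drop="first")
    edu_dummies = enc.fit_transform(df[["education"]]).toarray()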

Common mistake: Treating numeric-coded categories (1, 2, 3) as continuous. If "education" is coded 1=HS, 2=College, 3=Graduate, the model thinks 3 is "more" than 2. Create dummies instead.

Basic Causal Forest

R with grf:
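
A minimal sketch, assuming X is a numeric covariate matrix, Y an outcome vector, and W a 0/1 treatment vector:

    library(grf)

    cf <- causal_forest(X, Y, W, num.trees = 2000, seed = 42)

    # Out-of-bag CATE estimates for the training sample
    tau_hat <- predict(cf)$predictions
    hist(tau_hat, main = "Estimated CATEs", xlab = "tau(x)")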

Average Treatment Effect
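
grf aggregates the forest into a doubly robust (AIPW) estimate with a standard error:

    average_treatment_effect(cf, target.sample = "all")      # ATE
    average_treatment_effect(cf, target.sample = "treated")  # ATT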

Variable Importance
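
Importance scores reflect how often each covariate was used for splits, weighted toward splits near the top of the trees:

    vi <- variable_importance(cf)
    ranked <- order(vi, decreasing = TRUE)
    data.frame(variable = colnames(X)[ranked], importance = vi[ranked])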

Visualizing Heterogeneity
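
One simple view: plot estimated CATEs against a covariate of interest (here an assumed column "age"):

    plot(X[, "age"], tau_hat, pch = 20, col = "gray",
         xlab = "Age", ylab = "Estimated CATE")
    lines(lowess(X[, "age"], tau_hat), col = "blue", lwd = 2)
    abline(h = 0, lty = 2)  # reference line at zero effect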

Best Linear Projection
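
The best linear projection summarizes heterogeneity with interpretable coefficients and valid standard errors, here projected onto two assumed covariates:

    best_linear_projection(cf, X[, c("age", "income")])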

Confidence Intervals for CATEs
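
Variance estimates yield pointwise confidence intervals for each CATE:

    pred <- predict(cf, estimate.variance = TRUE)
    se <- sqrt(pred$variance.estimates)
    ci_lower <- pred$predictions - 1.96 * se
    ci_upper <- pred$predictions + 1.96 * se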

Causal Forest with Instrumental Variables
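
With an instrument Z, instrumental_forest() estimates conditional local average treatment effects:

    ivf <- instrumental_forest(X, Y, W, Z, num.trees = 2000)
    tau_iv <- predict(ivf)$predictions  # conditional LATE estimates
    summary(tau_iv)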


22.3 Double/Debiased Machine Learning

DML with DoubleML (R)

R:
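
A minimal sketch, assuming a data frame df with outcome y, treatment d, and controls in the remaining columns; random forests serve as both nuisance learners:

    library(DoubleML)
    library(mlr3)
    library(mlr3learners)

    # Wrap the data: y is the outcome, d the treatment, the rest controls
    dml_data <- double_ml_data_from_data_frame(
      df, y_col = "y", d_cols = "d",
      x_cols = setdiff(names(df), c("y", "d"))
    )

    # Partially linear regression with 5-fold cross-fitting
    learner <- lrn("regr.ranger", num.trees = 500)
    dml_plr <- DoubleMLPLR$new(dml_data,
                               ml_l = learner, ml_m = learner$clone(),
                               n_folds = 5)
    dml_plr$fit()
    dml_plr$summary()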

DML with EconML (Python)

Python:
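
A minimal sketch, assuming arrays Y (outcome), T (binary treatment), X (effect modifiers), and W (other controls):

    from econml.dml import LinearDML
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    est = LinearDML(
        model_y=RandomForestRegressor(min_samples_leaf=10),
        model_t=RandomForestClassifier(min_samples_leaf=10),
        discrete_treatment=True,
        cv=5,  # cross-fitting folds
        random_state=42,
    )
    est.fit(Y, T, X=X, W=W)

    print(est.ate(X))      # average treatment effect
    print(est.summary())   # coefficients of the linear CATE model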

Box: Extracting Results from DML Objects

DML objects contain rich information beyond point estimates. Here's how to extract what you need for reporting.

R (DoubleML package):
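
Continuing with the fitted dml_plr object from above:

    dml_plr$coef       # point estimate of the treatment effect
    dml_plr$se         # standard error
    dml_plr$pval       # p-value
    dml_plr$confint()  # 95% confidence interval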

Python (EconML):
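
Continuing with the fitted est object from above:

    ate = est.ate(X)                             # average effect over the sample
    cate = est.effect(X)                         # CATE for each row of X
    lb, ub = est.effect_interval(X, alpha=0.05)  # pointwise 95% intervals
    inf = est.effect_inference(X)                # estimates with standard errors
    print(inf.summary_frame().head())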

Key distinction: ate() gives the average treatment effect across the sample; effect(X) gives conditional effects for specific covariate values. For policy targeting, you typically want effect(X).

Causal Forest DML:
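
CausalForestDML combines DML-style residualization with a causal forest final stage, giving CATE estimates that are robust to high-dimensional confounding; a sketch with the same hypothetical arrays as above:

    from econml.dml import CausalForestDML
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    cf_dml = CausalForestDML(
        model_y=RandomForestRegressor(min_samples_leaf=10),
        model_t=RandomForestClassifier(min_samples_leaf=10),
        discrete_treatment=True,
        n_estimators=2000,
        random_state=42,
    )
    cf_dml.fit(Y, T, X=X, W=W)
    tau_hat = cf_dml.effect(X)  # CATE estimates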

DML for IV

Python with EconML:
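
A minimal sketch with a binary instrument Z (Y, T, Z, W are hypothetical arrays):

    from econml.iv.dml import OrthoIV

    iv_est = OrthoIV(discrete_treatment=True, discrete_instrument=True)
    iv_est.fit(Y, T, Z=Z, W=W)
    print(iv_est.ate())           # LATE-style average effect
    print(iv_est.ate_interval())  # confidence interval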


22.4 Simulation for Understanding

Basic Monte Carlo

R:
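
A minimal sketch: simulate data with a known effect many times, apply the estimator, and inspect the distribution of estimates. Here, naive OLS under unobserved confounding:

    set.seed(42)
    n_sims <- 1000
    n <- 500
    true_effect <- 1

    estimates <- replicate(n_sims, {
      u <- rnorm(n)                 # unobserved confounder
      d <- rbinom(n, 1, plogis(u))  # treatment probability depends on u
      y <- true_effect * d + u + rnorm(n)
      coef(lm(y ~ d))["d"]          # OLS omitting u
    })

    mean(estimates) - true_effect  # bias (positive here)
    sd(estimates)                  # sampling variability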

OLS vs IV Simulation

Figure 22.1: The bias-variance tradeoff between OLS and IV. OLS (blue) is biased but precise—its distribution is centered above the true effect (green line) but narrow. IV with a strong instrument (red) is unbiased but more variable. IV with a weak instrument (orange) inherits bias toward OLS while remaining imprecise—the worst of both worlds. Summary statistics in the box show the mean and standard deviation for each estimator.
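
A sketch of the simulation behind Figure 22.1, using AER's ivreg() and a true effect of 1:

    library(AER)  # for ivreg()

    set.seed(42)
    n_sims <- 1000; n <- 500
    ols <- iv_strong <- iv_weak <- numeric(n_sims)

    for (s in 1:n_sims) {
      u <- rnorm(n)                    # unobserved confounder
      z <- rnorm(n)                    # instrument
      d1 <- 0.8 * z + u + rnorm(n)     # strong first stage
      d2 <- 0.05 * z + u + rnorm(n)    # weak first stage
      y1 <- d1 + u + rnorm(n)          # true effect = 1
      y2 <- d2 + u + rnorm(n)
      ols[s]       <- coef(lm(y1 ~ d1))["d1"]
      iv_strong[s] <- coef(ivreg(y1 ~ d1 | z))["d1"]
      iv_weak[s]   <- coef(ivreg(y2 ~ d2 | z))["d2"]
    }

    # OLS: biased but tight; strong IV: centered at 1 but wider;
    # weak IV: pulled toward OLS and imprecise
    sapply(list(ols = ols, iv_strong = iv_strong, iv_weak = iv_weak), mean)
    sapply(list(ols = ols, iv_strong = iv_strong, iv_weak = iv_weak), sd)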

Comparing DiD Estimators
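
A minimal sketch of why this comparison matters: with staggered adoption and treatment effects that grow over time, the two-way fixed effects (TWFE) estimate drifts away from the true average effect on the treated:

    set.seed(42)
    n_units <- 200; n_periods <- 10
    panel <- expand.grid(id = 1:n_units, time = 1:n_periods)

    # Units adopt treatment at period 4, period 7, or never (Inf)
    first_treat <- sample(c(4, 7, Inf), n_units, replace = TRUE)
    panel$treat <- as.numeric(panel$time >= first_treat[panel$id])

    # Dynamic effects: grow by 0.5 per period since adoption
    rel_time <- ifelse(panel$treat == 1, panel$time - first_treat[panel$id], 0)
    unit_fe <- rnorm(n_units)
    panel$y <- unit_fe[panel$id] + 0.2 * panel$time +
      panel$treat * (1 + 0.5 * rel_time) + rnorm(nrow(panel))

    # True average effect among treated observations vs. TWFE estimate
    true_att <- mean((1 + 0.5 * rel_time)[panel$treat == 1])
    twfe <- coef(lm(y ~ treat + factor(id) + factor(time), data = panel))["treat"]
    c(true_att = true_att, twfe = unname(twfe))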

Understanding Causal Forest Properties

Python:
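
A sketch that checks coverage of causal forest confidence intervals on simulated data with a known CATE (here the true effect depends on the first covariate):

    import numpy as np
    from econml.dml import CausalForestDML

    rng = np.random.default_rng(42)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    T = rng.binomial(1, 0.5, size=n)   # randomized treatment
    tau = 1.0 + 0.5 * X[:, 0]          # true CATE depends on X[:, 0]
    Y = X[:, 1] + tau * T + rng.normal(size=n)

    est = CausalForestDML(discrete_treatment=True, n_estimators=2000,
                          random_state=0)
    est.fit(Y, T, X=X)

    tau_hat = est.effect(X)
    lb, ub = est.effect_interval(X, alpha=0.05)
    coverage = np.mean((lb.ravel() <= tau) & (tau <= ub.ravel()))
    rmse = np.sqrt(np.mean((tau_hat.ravel() - tau) ** 2))
    print(f"95% CI coverage of true CATEs: {coverage:.2f}, RMSE: {rmse:.3f}")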


Practical Guidance

Package Recommendations

Task                  R                   Python
Causal forest         grf                 econml
DML                   DoubleML            econml, doubleml
General ML            mlr3, caret         scikit-learn
Subgroup analysis     Manual + ggplot2    Manual + matplotlib

When to Use What

Method                 Use When
Subgroup analysis      Few pre-specified subgroups
Causal forest          Many potential effect modifiers, want data-driven discovery
DML                    High-dimensional confounders, want valid inference
Causal forest + DML    Both heterogeneity and high-dimensional confounding

Common Pitfalls

Pitfall 1: P-Hacking via Subgroup Analysis. Testing many subgroups and reporting only the significant ones.

How to avoid: Pre-specify subgroups. Adjust for multiple testing. Use causal forests for exploratory analysis.

Pitfall 2: Overfitting CATEs. With many covariates and limited data, individual CATE estimates can be very noisy.

How to avoid: Focus on average effects within groups rather than individual CATEs. Check coverage in simulation.

Pitfall 3: Ignoring Honest Estimation. Using the same data for tree construction and estimation leads to overfitting.

How to avoid: Use honesty = TRUE in grf (the default). This splits the sample between growing the trees and estimating effects within leaves.

Implementation Checklist

  • Convert categorical variables to dummy variables before fitting ML models

  • Pre-specify subgroups, or treat forest-based discovery as exploratory

  • Adjust subgroup p-values for multiple testing

  • Keep honest estimation (the grf default) and cross-fitting (DML) enabled

  • Report average effects within groups rather than individual CATEs

  • Check bias and coverage with a simulation tailored to your setting


Summary

Key takeaways:

  1. Traditional subgroup analysis should use interaction terms and adjust for multiple testing; causal forests provide data-driven heterogeneity discovery.

  2. grf (R) and EconML (Python) provide state-of-the-art implementations of causal forests and DML with valid inference.

  3. Simulation is essential for understanding method properties---bias, coverage, and power in your specific setting.

Returning to the opening question: Methods for heterogeneous effects and ML-based causal inference require careful implementation. The packages here make sophisticated methods accessible, but understanding when inference is valid requires attention to assumptions. Simulation helps bridge the gap between theoretical properties and practical performance.


Further Reading

Essential

  • Athey, S. and S. Wager (2019). "Estimating Treatment Effects with Causal Forests: An Application." Observational Studies.

  • Chernozhukov, V. et al. (2018). "Double/Debiased Machine Learning for Treatment and Structural Parameters." The Econometrics Journal.

Package Documentation

  • grf: https://grf-labs.github.io/grf/

  • EconML: https://econml.azurewebsites.net/

  • DoubleML: https://docs.doubleml.org/

Applications

  • Davis, J. and S. Heller (2017). "Using Causal Forests to Predict Treatment Heterogeneity: An Application to Summer Jobs." AER Papers & Proceedings.


Exercises

Conceptual

  1. Explain why grf::causal_forest() uses "honesty" (separate samples for tree construction and estimation). What problem does this solve, and what is the cost?

  2. In Double ML, why must nuisance functions be estimated using cross-fitting rather than on the full sample? What bias would arise otherwise?

  3. A causal forest returns variable importance scores showing that "age" is the most important driver of treatment effect heterogeneity. Does this mean older people benefit more from treatment? Explain what variable importance does and does not tell us.

Applied

  1. Using experimental or quasi-experimental data:

    • Estimate CATEs with a causal forest

    • Identify which variables drive heterogeneity

    • Compare to traditional subgroup analysis

  2. Implement DML in a setting with high-dimensional confounders. Compare estimates using different ML methods for the nuisance functions.

  3. Design and run a Monte Carlo simulation comparing:

    • OLS, matching, and DML for estimating ATE

    • Under varying degrees of treatment effect heterogeneity and confounding

Discussion

  1. The econml (Python) and grf/DoubleML (R) packages offer overlapping functionality but different interfaces. A researcher comfortable in both languages asks which to use. What factors should guide this decision for (a) a one-off analysis, (b) a production system, and (c) teaching?

  2. Critics argue that ML-based causal inference methods are "black boxes" that obscure what assumptions are being made. Defenders argue they are more honest about functional form uncertainty. Based on your experience implementing these methods, which view do you find more compelling?
