Chapter 22: Programming Companion—Beyond Averages

Opening Question

How do we implement methods for heterogeneous treatment effects and machine learning-based causal inference?


Chapter Overview

This chapter provides practical implementations of the methods from Part IV: mechanisms, heterogeneity, and machine learning for causal inference. We focus on three areas: subgroup analysis and visualization, machine learning methods for heterogeneous treatment effects (causal forests, double/debiased ML), and simulation for understanding estimator properties.

These methods have become increasingly important as researchers move beyond average effects to understand who benefits from interventions and why. The packages covered here---grf, EconML, DoubleML---represent the state of the art in causal machine learning.

What you will learn:

  • How to conduct and visualize subgroup analysis

  • How to estimate heterogeneous treatment effects with causal forests

  • How to implement double/debiased machine learning

  • How to use simulation to understand method properties

Prerequisites: Chapters 19-21 (conceptual foundations), Chapters 4 and 18 (programming basics)


22.1 Subgroup Analysis

Traditional Subgroup Analysis

R:
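
A minimal sketch, assuming a data frame df with outcome y, a 0/1 treatment treat, and a binary covariate female (all hypothetical names); the interaction term carries the subgroup difference:

    # Allow the treatment effect to differ by subgroup via an interaction
    fit <- lm(y ~ treat * female, data = df)
    summary(fit)  # the t-test on treat:female tests whether effects differ

    # Effect for female == 0 is the treat coefficient;
    # effect for female == 1 adds the interaction
    coef(fit)["treat"]
    coef(fit)["treat"] + coef(fit)["treat:female"]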

Visualizing subgroup effects:

Python:
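
A sketch of a forest plot of subgroup estimates with 95% confidence intervals; the numbers below are illustrative values, not real results:

    import matplotlib.pyplot as plt
    import numpy as np

    # Illustrative subgroup estimates and 95% confidence intervals
    subgroups = ["All", "Female", "Male", "Age < 40", "Age >= 40"]
    estimates = np.array([0.15, 0.22, 0.09, 0.18, 0.12])
    ci_lower  = np.array([0.08, 0.11, -0.02, 0.07, 0.02])
    ci_upper  = np.array([0.22, 0.33, 0.20, 0.29, 0.22])

    y = np.arange(len(subgroups))
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.errorbar(estimates, y,
                xerr=[estimates - ci_lower, ci_upper - estimates],
                fmt="o", capsize=4)
    ax.axvline(0, linestyle="--", color="gray")  # null-effect reference line
    ax.set_yticks(y)
    ax.set_yticklabels(subgroups)
    ax.set_xlabel("Estimated treatment effect")
    plt.tight_layout()
    plt.show()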

Multiple Testing Corrections

R:
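
A minimal sketch with illustrative p-values from several subgroup tests:

    # p-values from several subgroup interaction tests (illustrative values)
    p_values <- c(0.012, 0.034, 0.049, 0.210, 0.430)

    # Bonferroni: conservative, controls the family-wise error rate
    p.adjust(p_values, method = "bonferroni")

    # Benjamini-Hochberg: controls the false discovery rate, less conservative
    p.adjust(p_values, method = "BH")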


22.2 Causal Forests with grf

Data Preparation for ML Causal Inference

Critical: Preprocessing Categorical Variables

Machine learning tools like grf (R) and EconML (Python) require numeric matrices as input. Categorical variables (strings, factors) must be converted before use.

R: Use model.matrix() to create dummy variables:
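
For instance, with hypothetical columns age, income, and a factor education:

    # Expand factors into dummy columns; [, -1] drops the intercept column
    X <- model.matrix(~ age + income + education, data = df)[, -1]
    head(X)  # education now appears as 0/1 dummy columns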

Python: Use pd.get_dummies() or sklearn's OneHotEncoder:
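
A sketch with the same hypothetical columns:

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # pandas: expand categoricals to 0/1 dummies; drop_first avoids
    # perfect collinearity in linear models
    X = pd.get_dummies(df[["age", "income", "education"]],
                       columns=["education"], drop_first=True)

    # sklearn: useful inside pipelines; returns a sparse matrix by default
    enc = OneHotEncoder(drop="first")
    edu_dummies = enc.fit_transform(df[["education"]]).toarray()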

Common mistake: Treating numeric-coded categories (1, 2, 3) as continuous. If "education" is coded 1=HS, 2=College, 3=Graduate, the model thinks 3 is "more" than 2. Create dummies instead.

Basic Causal Forest

R with grf:
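
A minimal sketch, assuming X is a numeric covariate matrix, Y an outcome vector, and W a 0/1 treatment vector:

    library(grf)

    cf <- causal_forest(X, Y, W, num.trees = 2000, seed = 42)

    # Out-of-bag CATE estimates for the training sample
    tau_hat <- predict(cf)$predictions
    hist(tau_hat, main = "Estimated CATEs", xlab = "tau(x)")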

Average Treatment Effect
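
grf aggregates the forest into a doubly robust (AIPW) estimate with a standard error:

    average_treatment_effect(cf, target.sample = "all")      # ATE
    average_treatment_effect(cf, target.sample = "treated")  # ATT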

Variable Importance
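
Importance scores reflect how often each covariate was used for splits, weighted toward splits near the top of the trees:

    vi <- variable_importance(cf)
    ranked <- order(vi, decreasing = TRUE)
    data.frame(variable = colnames(X)[ranked], importance = vi[ranked])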

Visualizing Heterogeneity
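
One simple view: plot estimated CATEs against a covariate of interest (here an assumed column "age"):

    plot(X[, "age"], tau_hat, pch = 20, col = "gray",
         xlab = "Age", ylab = "Estimated CATE")
    lines(lowess(X[, "age"], tau_hat), col = "blue", lwd = 2)
    abline(h = 0, lty = 2)  # reference line at zero effect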

Best Linear Projection
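
The best linear projection summarizes heterogeneity with interpretable coefficients and valid standard errors, here projected onto two assumed covariates:

    best_linear_projection(cf, X[, c("age", "income")])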

Confidence Intervals for CATEs
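
Variance estimates yield pointwise confidence intervals for each CATE:

    pred <- predict(cf, estimate.variance = TRUE)
    se <- sqrt(pred$variance.estimates)
    ci_lower <- pred$predictions - 1.96 * se
    ci_upper <- pred$predictions + 1.96 * se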

Causal Forest with Instrumental Variables
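
With an instrument Z, instrumental_forest() estimates conditional local average treatment effects:

    ivf <- instrumental_forest(X, Y, W, Z, num.trees = 2000)
    tau_iv <- predict(ivf)$predictions  # conditional LATE estimates
    summary(tau_iv)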


22.3 Double/Debiased Machine Learning

DML with DoubleML (R)

R:
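
A minimal sketch, assuming a data frame df with outcome y, treatment d, and controls in the remaining columns; random forests serve as both nuisance learners:

    library(DoubleML)
    library(mlr3)
    library(mlr3learners)

    # Wrap the data: y is the outcome, d the treatment, the rest controls
    dml_data <- double_ml_data_from_data_frame(
      df, y_col = "y", d_cols = "d",
      x_cols = setdiff(names(df), c("y", "d"))
    )

    # Partially linear regression with 5-fold cross-fitting
    learner <- lrn("regr.ranger", num.trees = 500)
    dml_plr <- DoubleMLPLR$new(dml_data,
                               ml_l = learner, ml_m = learner$clone(),
                               n_folds = 5)
    dml_plr$fit()
    dml_plr$summary()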

DML with EconML (Python)

Python:
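
A minimal sketch, assuming arrays Y (outcome), T (binary treatment), X (effect modifiers), and W (other controls):

    from econml.dml import LinearDML
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    est = LinearDML(
        model_y=RandomForestRegressor(min_samples_leaf=10),
        model_t=RandomForestClassifier(min_samples_leaf=10),
        discrete_treatment=True,
        cv=5,  # cross-fitting folds
        random_state=42,
    )
    est.fit(Y, T, X=X, W=W)

    print(est.ate(X))      # average treatment effect
    print(est.summary())   # coefficients of the linear CATE model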

Box: Extracting Results from DML Objects

DML objects contain rich information beyond point estimates. Here's how to extract what you need for reporting.

R (DoubleML package):
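
Continuing with the fitted dml_plr object from above:

    dml_plr$coef       # point estimate of the treatment effect
    dml_plr$se         # standard error
    dml_plr$pval       # p-value
    dml_plr$confint()  # 95% confidence interval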

Python (EconML):
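
Continuing with the fitted est object from above:

    ate = est.ate(X)                             # average effect over the sample
    cate = est.effect(X)                         # CATE for each row of X
    lb, ub = est.effect_interval(X, alpha=0.05)  # pointwise 95% intervals
    inf = est.effect_inference(X)                # estimates with standard errors
    print(inf.summary_frame().head())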

Key distinction: ate() gives the average treatment effect across the sample; effect(X) gives conditional effects for specific covariate values. For policy targeting, you typically want effect(X).

Causal Forest DML:
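
CausalForestDML combines DML-style residualization with a causal forest final stage, giving CATE estimates that are robust to high-dimensional confounding; a sketch with the same hypothetical arrays as above:

    from econml.dml import CausalForestDML
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    cf_dml = CausalForestDML(
        model_y=RandomForestRegressor(min_samples_leaf=10),
        model_t=RandomForestClassifier(min_samples_leaf=10),
        discrete_treatment=True,
        n_estimators=2000,
        random_state=42,
    )
    cf_dml.fit(Y, T, X=X, W=W)
    tau_hat = cf_dml.effect(X)  # CATE estimates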

DML for IV

Python with EconML:
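
A minimal sketch with a binary instrument Z (Y, T, Z, W are hypothetical arrays):

    from econml.iv.dml import OrthoIV

    iv_est = OrthoIV(discrete_treatment=True, discrete_instrument=True)
    iv_est.fit(Y, T, Z=Z, W=W)
    print(iv_est.ate())           # LATE-style average effect
    print(iv_est.ate_interval())  # confidence interval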


22.4 Simulation for Understanding

Basic Monte Carlo

R:
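
A minimal sketch: simulate data with a known effect many times, apply the estimator, and inspect the distribution of estimates. Here, naive OLS under unobserved confounding:

    set.seed(42)
    n_sims <- 1000
    n <- 500
    true_effect <- 1

    estimates <- replicate(n_sims, {
      u <- rnorm(n)                 # unobserved confounder
      d <- rbinom(n, 1, plogis(u))  # treatment probability depends on u
      y <- true_effect * d + u + rnorm(n)
      coef(lm(y ~ d))["d"]          # OLS omitting u
    })

    mean(estimates) - true_effect  # bias (positive here)
    sd(estimates)                  # sampling variability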

OLS vs IV Simulation

Figure 22.1: The bias-variance tradeoff between OLS and IV. OLS (blue) is biased but precise—its distribution is centered above the true effect (green line) but narrow. IV with a strong instrument (red) is unbiased but more variable. IV with a weak instrument (orange) inherits bias toward OLS while remaining imprecise—the worst of both worlds. Summary statistics in the box show the mean and standard deviation for each estimator.
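
A sketch of the simulation behind Figure 22.1, using AER's ivreg() and a true effect of 1:

    library(AER)  # for ivreg()

    set.seed(42)
    n_sims <- 1000; n <- 500
    ols <- iv_strong <- iv_weak <- numeric(n_sims)

    for (s in 1:n_sims) {
      u <- rnorm(n)                    # unobserved confounder
      z <- rnorm(n)                    # instrument
      d1 <- 0.8 * z + u + rnorm(n)     # strong first stage
      d2 <- 0.05 * z + u + rnorm(n)    # weak first stage
      y1 <- d1 + u + rnorm(n)          # true effect = 1
      y2 <- d2 + u + rnorm(n)
      ols[s]       <- coef(lm(y1 ~ d1))["d1"]
      iv_strong[s] <- coef(ivreg(y1 ~ d1 | z))["d1"]
      iv_weak[s]   <- coef(ivreg(y2 ~ d2 | z))["d2"]
    }

    # OLS: biased but tight; strong IV: centered at 1 but wider;
    # weak IV: pulled toward OLS and imprecise
    sapply(list(ols = ols, iv_strong = iv_strong, iv_weak = iv_weak), mean)
    sapply(list(ols = ols, iv_strong = iv_strong, iv_weak = iv_weak), sd)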

Comparing DiD Estimators
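
A minimal sketch of why this comparison matters: with staggered adoption and treatment effects that grow over time, the two-way fixed effects (TWFE) estimate drifts away from the true average effect on the treated:

    set.seed(42)
    n_units <- 200; n_periods <- 10
    panel <- expand.grid(id = 1:n_units, time = 1:n_periods)

    # Units adopt treatment at period 4, period 7, or never (Inf)
    first_treat <- sample(c(4, 7, Inf), n_units, replace = TRUE)
    panel$treat <- as.numeric(panel$time >= first_treat[panel$id])

    # Dynamic effects: grow by 0.5 per period since adoption
    rel_time <- ifelse(panel$treat == 1, panel$time - first_treat[panel$id], 0)
    unit_fe <- rnorm(n_units)
    panel$y <- unit_fe[panel$id] + 0.2 * panel$time +
      panel$treat * (1 + 0.5 * rel_time) + rnorm(nrow(panel))

    # True average effect among treated observations vs. TWFE estimate
    true_att <- mean((1 + 0.5 * rel_time)[panel$treat == 1])
    twfe <- coef(lm(y ~ treat + factor(id) + factor(time), data = panel))["treat"]
    c(true_att = true_att, twfe = unname(twfe))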

Understanding Causal Forest Properties

Python:
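
A sketch that checks coverage of causal forest confidence intervals on simulated data with a known CATE (here the true effect depends on the first covariate):

    import numpy as np
    from econml.dml import CausalForestDML

    rng = np.random.default_rng(42)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    T = rng.binomial(1, 0.5, size=n)   # randomized treatment
    tau = 1.0 + 0.5 * X[:, 0]          # true CATE depends on X[:, 0]
    Y = X[:, 1] + tau * T + rng.normal(size=n)

    est = CausalForestDML(discrete_treatment=True, n_estimators=2000,
                          random_state=0)
    est.fit(Y, T, X=X)

    tau_hat = est.effect(X)
    lb, ub = est.effect_interval(X, alpha=0.05)
    coverage = np.mean((lb.ravel() <= tau) & (tau <= ub.ravel()))
    rmse = np.sqrt(np.mean((tau_hat.ravel() - tau) ** 2))
    print(f"95% CI coverage of true CATEs: {coverage:.2f}, RMSE: {rmse:.3f}")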


Practical Guidance

Package Recommendations

Task                  R                   Python
Causal forest         grf                 econml
DML                   DoubleML            econml, doubleml
General ML            mlr3, caret         scikit-learn
Subgroup analysis     Manual + ggplot2    Manual + matplotlib

When to Use What

Method                 Use When
Subgroup analysis      Few pre-specified subgroups
Causal forest          Many potential effect modifiers, want data-driven discovery
DML                    High-dimensional confounders, want valid inference
Causal forest + DML    Both heterogeneity and high-dimensional confounding

Common Pitfalls

Pitfall 1: P-Hacking via Subgroup Analysis. Testing many subgroups and reporting only the significant ones.

How to avoid: Pre-specify subgroups. Adjust for multiple testing. Use causal forests for exploratory analysis.

Pitfall 2: Overfitting CATEs. With many covariates and limited data, individual CATE estimates can be very noisy.

How to avoid: Focus on average effects within groups rather than individual CATEs. Check coverage in simulation.

Pitfall 3: Ignoring Honest Estimation. Using the same data for tree construction and estimation leads to overfitting.

How to avoid: Use honesty = TRUE in grf (the default). This splits the sample between growing the trees and estimating effects within leaves.

Implementation Checklist

  • Convert categorical variables to dummy variables before fitting ML models

  • Pre-specify subgroups, or treat forest-based discovery as exploratory

  • Adjust subgroup p-values for multiple testing

  • Keep honest estimation (the grf default) and cross-fitting (DML) enabled

  • Report average effects within groups rather than individual CATEs

  • Check bias and coverage with a simulation tailored to your setting


Summary

Key takeaways:

  1. Traditional subgroup analysis should use interaction terms and adjust for multiple testing; causal forests provide data-driven heterogeneity discovery.

  2. grf (R) and EconML (Python) provide state-of-the-art implementations of causal forests and DML with valid inference.

  3. Simulation is essential for understanding method properties---bias, coverage, and power in your specific setting.

Returning to the opening question: Methods for heterogeneous effects and ML-based causal inference require careful implementation. The packages here make sophisticated methods accessible, but understanding when inference is valid requires attention to assumptions. Simulation helps bridge the gap between theoretical properties and practical performance.


Further Reading

Essential

  • Athey, S. and S. Wager (2019). "Estimating Treatment Effects with Causal Forests: An Application." Observational Studies.

  • Chernozhukov, V. et al. (2018). "Double/Debiased Machine Learning for Treatment and Structural Parameters." The Econometrics Journal.

Package Documentation

  • grf: https://grf-labs.github.io/grf/

  • EconML: https://econml.azurewebsites.net/

  • DoubleML: https://docs.doubleml.org/

Applications

  • Davis, J. and S. Heller (2017). "Using Causal Forests to Predict Treatment Heterogeneity: An Application to Summer Jobs." AER Papers & Proceedings.


Exercises

Conceptual

  1. Explain why grf::causal_forest() uses "honesty" (separate samples for tree construction and estimation). What problem does this solve, and what is the cost?

  2. In Double ML, why must nuisance functions be estimated using cross-fitting rather than on the full sample? What bias would arise otherwise?

  3. A causal forest returns variable importance scores showing that "age" is the most important driver of treatment effect heterogeneity. Does this mean older people benefit more from treatment? Explain what variable importance does and does not tell us.

Applied

  1. Using experimental or quasi-experimental data:

    • Estimate CATEs with a causal forest

    • Identify which variables drive heterogeneity

    • Compare to traditional subgroup analysis

  2. Implement DML in a setting with high-dimensional confounders. Compare estimates using different ML methods for the nuisance functions.

  3. Design and run a Monte Carlo simulation comparing:

    • OLS, matching, and DML for estimating ATE

    • Under varying degrees of treatment effect heterogeneity and confounding

Discussion

  1. The econml (Python) and grf/DoubleML (R) packages offer overlapping functionality but different interfaces. A researcher comfortable in both languages asks which to use. What factors should guide this decision for (a) a one-off analysis, (b) a production system, and (c) teaching?

  2. Critics argue that ML-based causal inference methods are "black boxes" that obscure what assumptions are being made. Defenders argue they are more honest about functional form uncertainty. Based on your experience implementing these methods, which view do you find more compelling?
