Chapter 22: Programming Companion—Beyond Averages
Opening Question
How do we implement methods for heterogeneous treatment effects and machine learning-based causal inference?
Chapter Overview
This chapter provides practical implementations of the methods from Part IV: mechanisms, heterogeneity, and machine learning for causal inference. We focus on three areas: subgroup analysis and visualization, machine learning methods for heterogeneous treatment effects (causal forests, double/debiased ML), and simulation for understanding estimator properties.
These methods have become increasingly important as researchers move beyond average effects to understand who benefits from interventions and why. The packages covered here (grf, EconML, DoubleML) represent the state of the art in causal machine learning.
What you will learn:
How to conduct and visualize subgroup analysis
How to estimate heterogeneous treatment effects with causal forests
How to implement double/debiased machine learning
How to use simulation to understand method properties
Prerequisites: Chapters 19-21 (conceptual foundations), Chapters 4 and 18 (programming basics)
22.1 Subgroup Analysis
Traditional Subgroup Analysis
R:
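As a minimal sketch, subgroup effects can be estimated with an interaction term in lm(). The data frame df and the variables y, w, and female below are hypothetical placeholders:

```r
# Interaction-term subgroup analysis: w is a 0/1 treatment, female a 0/1 subgroup
fit <- lm(y ~ w * female, data = df)
summary(fit)
# coefficient on w        : treatment effect when female == 0
# coefficient on w:female : difference in treatment effect when female == 1

# Equivalent per-subgroup estimates
coef(lm(y ~ w, data = subset(df, female == 0)))["w"]
coef(lm(y ~ w, data = subset(df, female == 1)))["w"]
```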
Visualizing subgroup effects:
Python:
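A sketch of a simple forest-style plot of subgroup estimates with 95% confidence intervals, using matplotlib. The group labels, estimates, and standard errors are hypothetical:

```python
import matplotlib.pyplot as plt

# Hypothetical subgroup estimates and standard errors
groups = ["Overall", "Female", "Male", "Age < 40", "Age >= 40"]
est = [0.25, 0.40, 0.10, 0.30, 0.20]
se = [0.05, 0.08, 0.07, 0.09, 0.06]

y = range(len(groups))
ci = [1.96 * s for s in se]                   # 95% interval half-widths
plt.errorbar(est, y, xerr=ci, fmt="o", capsize=4)
plt.axvline(0, linestyle="--", color="gray")  # reference line at zero effect
plt.yticks(y, groups)
plt.xlabel("Estimated treatment effect (95% CI)")
plt.tight_layout()
plt.show()
```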
Multiple Testing Corrections
R:
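A sketch using base R's p.adjust(); the p-values are hypothetical:

```r
# P-values from five subgroup tests (hypothetical)
pvals <- c(0.004, 0.021, 0.034, 0.210, 0.450)

p.adjust(pvals, method = "bonferroni")  # family-wise error control (conservative)
p.adjust(pvals, method = "BH")          # Benjamini-Hochberg false discovery rate
```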
22.2 Causal Forests with grf
Data Preparation for ML Causal Inference
Critical: Preprocessing Categorical Variables
Machine learning tools like grf (R) and EconML (Python) require numeric matrices as input. Categorical variables (strings, factors) must be converted before use.
R: Use model.matrix() to create dummy variables.
Python: Use pd.get_dummies() or sklearn's OneHotEncoder.
Common mistake: Treating numeric-coded categories (1, 2, 3) as continuous. If "education" is coded 1 = HS, 2 = College, 3 = Graduate, the model thinks 3 is "more" than 2. Create dummies instead.
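For illustration, minimal sketches in both languages; df and its columns are hypothetical:

```r
# R: model.matrix() expands factors into dummy columns; [, -1] drops the intercept
df$education <- factor(df$education, levels = c(1, 2, 3),
                       labels = c("HS", "College", "Graduate"))
X <- model.matrix(~ age + income + education, data = df)[, -1]
```

```python
# Python: pd.get_dummies() expands categoricals into indicator columns
import pandas as pd

X = pd.get_dummies(df[["age", "income", "education"]],
                   columns=["education"], drop_first=True)
```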
Basic Causal Forest
R with grf:
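A minimal sketch with grf. Here X is a numeric covariate matrix, Y the outcome vector, and W a 0/1 treatment vector; all are placeholders:

```r
library(grf)

cf <- causal_forest(X, Y, W, num.trees = 2000, seed = 42)
tau.hat <- predict(cf)$predictions  # out-of-bag CATE estimates, one per observation
summary(tau.hat)
```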
Average Treatment Effect
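Assuming the fitted forest cf from above, grf reports a doubly robust (AIPW-style) average effect with a standard error:

```r
average_treatment_effect(cf, target.sample = "all")
# target.sample = "treated" gives the ATT instead
```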
Variable Importance
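Continuing the sketch; this assumes colnames(X) is set:

```r
vi <- variable_importance(cf)
# Rank covariates by weighted split frequency (a heuristic, not a causal statement)
ranked <- order(vi, decreasing = TRUE)
data.frame(variable = colnames(X)[ranked], importance = vi[ranked])
```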
Visualizing Heterogeneity
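Two simple base-graphics views of the estimated CATEs; the "age" column name is an assumption:

```r
# Distribution of estimated CATEs
hist(tau.hat, breaks = 30, main = "Estimated CATEs", xlab = expression(hat(tau)(x)))

# CATEs against a covariate of interest (assumes X has a named "age" column)
plot(X[, "age"], tau.hat, xlab = "Age", ylab = "Estimated CATE")
```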
Best Linear Projection
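best_linear_projection() regresses the doubly robust scores on chosen covariates, giving an interpretable linear summary of heterogeneity. The column names are placeholders:

```r
# Which observables are linearly associated with the CATEs?
best_linear_projection(cf, X[, c("age", "income")])
```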
Confidence Intervals for CATEs
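Pointwise intervals come from the forest's variance estimates:

```r
pred <- predict(cf, estimate.variance = TRUE)
sigma.hat <- sqrt(pred$variance.estimates)

lower <- pred$predictions - 1.96 * sigma.hat
upper <- pred$predictions + 1.96 * sigma.hat
mean(lower > 0)  # share of observations whose 95% CI lies entirely above zero
```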
Causal Forest with Instrumental Variables
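When treatment is endogenous but an instrument is available, grf provides instrumental_forest(). A sketch, where Z is a placeholder instrument vector:

```r
# Estimates conditional (local) average treatment effects using instrument Z
ivf <- instrumental_forest(X, Y, W, Z, num.trees = 2000)
tau.iv <- predict(ivf)$predictions
```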
22.3 Double/Debiased Machine Learning
DML with DoubleML (R)
R:
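A sketch of a partially linear model with DoubleML and mlr3 random-forest learners. The data frame df, with outcome y and treatment d, is hypothetical, and the ml_l/ml_m argument names follow recent DoubleML releases:

```r
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)

data_dml <- DoubleMLData$new(as.data.table(df), y_col = "y", d_cols = "d")

ml_l <- lrn("regr.ranger")  # nuisance: E[Y | X]
ml_m <- lrn("regr.ranger")  # nuisance: E[D | X]

dml_plr <- DoubleMLPLR$new(data_dml, ml_l = ml_l, ml_m = ml_m, n_folds = 5)
dml_plr$fit()
dml_plr$summary()
```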
DML with EconML (Python)
Python:
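A sketch with econml's LinearDML and random-forest nuisance models. Y, T, and X are placeholder numpy arrays, with T binary:

```python
from econml.dml import LinearDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

est = LinearDML(
    model_y=RandomForestRegressor(min_samples_leaf=10),   # nuisance: E[Y | X]
    model_t=RandomForestClassifier(min_samples_leaf=10),  # nuisance: E[T | X]
    discrete_treatment=True,
    cv=5,  # cross-fitting folds
)
est.fit(Y, T, X=X)

print(est.ate(X))         # average treatment effect
print(est.effect(X)[:5])  # conditional effects for the first five rows
```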
Box: Extracting Results from DML Objects
DML objects contain rich information beyond point estimates. Here's how to extract what you need for reporting.
R (DoubleML package):
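Assuming a fitted object like dml_plr from the previous section:

```r
dml_plr$coef       # point estimate(s)
dml_plr$se         # standard error(s)
dml_plr$pval       # p-value(s)
dml_plr$confint()  # confidence interval(s)
```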
Python (EconML):
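Assuming the fitted est from the DML section; the method names follow econml's documented API:

```python
theta = est.ate(X)                           # average treatment effect
cate = est.effect(X)                         # conditional effects, one per row of X
lb, ub = est.effect_interval(X, alpha=0.05)  # pointwise 95% intervals
print(est.summary())                         # coefficient table (LinearDML)
```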
Key distinction: ate() gives the average treatment effect across the sample; effect(X) gives conditional effects for specific covariate values. For policy targeting, you typically want effect(X).
Causal Forest DML:
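econml's CausalForestDML combines a causal-forest final stage with DML-style residualization. A sketch under the same placeholder Y, T, X:

```python
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

cf_dml = CausalForestDML(
    model_y=RandomForestRegressor(min_samples_leaf=10),
    model_t=RandomForestClassifier(min_samples_leaf=10),
    discrete_treatment=True,
    n_estimators=2000,
    random_state=42,
)
cf_dml.fit(Y, T, X=X)

cate = cf_dml.effect(X)
lb, ub = cf_dml.effect_interval(X)  # forest-based confidence intervals
```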
DML for IV
Python with EconML:
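A sketch using econml's OrthoIV estimator (available in recent econml releases); Z is a placeholder binary instrument, and the default "auto" nuisance models are used:

```python
from econml.iv.dml import OrthoIV

iv_est = OrthoIV(discrete_treatment=True, discrete_instrument=True, cv=5)
iv_est.fit(Y, T, Z=Z, X=X)

print(iv_est.ate(X))  # average effect identified from instrument-driven variation
```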
22.4 Simulation for Understanding
Basic Monte Carlo
R:
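A minimal sketch of the OLS-versus-IV comparison summarized in Figure 22.1 below. The sample size, effect size, and instrument strength are illustrative; lowering instrument_strength produces the weak-instrument case:

```r
set.seed(42)
n_sims <- 1000; n <- 500; true_effect <- 1

sim_once <- function(instrument_strength = 0.5) {
  u <- rnorm(n)                                # unobserved confounder
  z <- rnorm(n)                                # instrument
  d <- instrument_strength * z + u + rnorm(n)  # treatment, confounded by u
  y <- true_effect * d + u + rnorm(n)
  c(ols = unname(coef(lm(y ~ d))["d"]),
    iv  = cov(z, y) / cov(z, d))               # simple Wald-type IV estimate
}

results <- replicate(n_sims, sim_once())
rowMeans(results)      # OLS is biased away from 1; IV is approximately unbiased
apply(results, 1, sd)  # but IV is noticeably more variable
```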
Figure 22.1: The bias-variance tradeoff between OLS and IV. OLS (blue) is biased but precise—its distribution is centered above the true effect (green line) but narrow. IV with a strong instrument (red) is unbiased but more variable. IV with a weak instrument (orange) inherits bias toward OLS while remaining imprecise—the worst of both worlds. Summary statistics in the box show the mean and standard deviation for each estimator.
Comparing DiD Estimators
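As a minimal sketch of why such comparisons matter, the following simulates staggered adoption with effects that grow over time, a setting where naive two-way fixed effects (TWFE) drifts away from the true average effect on the treated. All parameter values are illustrative:

```r
set.seed(1)
n_id <- 200; n_t <- 10
df <- expand.grid(id = 1:n_id, time = 1:n_t)
df$g <- ifelse(df$id <= n_id / 2, 4, 7)  # two adoption cohorts (periods 4 and 7)
df$treat <- as.numeric(df$time >= df$g)
df$tau <- ifelse(df$treat == 1, 0.5 * (df$time - df$g + 1), 0)  # dynamic effects
alpha <- rnorm(n_id)                     # unit fixed effects
df$y <- alpha[df$id] + 0.2 * df$time + df$tau + rnorm(nrow(df))

# Naive TWFE estimate versus the true average effect among treated observations
twfe <- coef(lm(y ~ treat + factor(id) + factor(time), data = df))["treat"]
c(twfe = unname(twfe), true_att = mean(df$tau[df$treat == 1]))
```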
Understanding Causal Forest Properties
Python:
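A sketch that simulates data with a known CATE and checks how well CausalForestDML recovers it; all settings are illustrative:

```python
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)  # randomized treatment
tau = X[:, 0]                     # true CATE depends on the first covariate
Y = 2 * X[:, 1] + tau * T + rng.normal(size=n)

est = CausalForestDML(discrete_treatment=True, n_estimators=1000, random_state=0)
est.fit(Y, T, X=X)
tau_hat = est.effect(X)

print(np.corrcoef(tau, tau_hat)[0, 1])  # how well individual CATEs are recovered
print(tau_hat.mean() - tau.mean())      # bias in the implied average effect
```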
Practical Guidance
Package Recommendations
| Task | R | Python |
| --- | --- | --- |
| Causal forest | grf | econml |
| DML | DoubleML | econml, doubleml |
| General ML | mlr3, caret | scikit-learn |
| Subgroup analysis | Manual + ggplot2 | Manual + matplotlib |
When to Use What
| Method | When to use |
| --- | --- |
| Subgroup analysis | Few pre-specified subgroups |
| Causal forest | Many potential effect modifiers; want data-driven discovery |
| DML | High-dimensional confounders; want valid inference |
| Causal forest + DML | Both heterogeneity and high-dimensional confounding |
Common Pitfalls
Pitfall 1: P-Hacking via Subgroup Analysis. Testing many subgroups and reporting only the significant ones.
How to avoid: Pre-specify subgroups. Adjust for multiple testing. Use causal forests for exploratory analysis.
Pitfall 2: Overfitting CATEs. With many covariates and limited data, individual CATE estimates can be very noisy.
How to avoid: Focus on average effects within groups rather than individual CATEs. Check coverage in simulation.
Pitfall 3: Ignoring Honest Estimation. Using the same data for tree construction and effect estimation leads to overfitting.
How to avoid: Use honest = TRUE in grf (the default). This splits the sample between growing the trees and estimating effects within leaves.
Implementation Checklist
Summary
Key takeaways:
Traditional subgroup analysis should use interaction terms and adjust for multiple testing; causal forests provide data-driven heterogeneity discovery.
grf (R) and EconML (Python) provide state-of-the-art implementations of causal forests and DML with valid inference.
Simulation is essential for understanding method properties: bias, coverage, and power in your specific setting.
Returning to the opening question: Methods for heterogeneous effects and ML-based causal inference require careful implementation. The packages here make sophisticated methods accessible, but understanding when inference is valid requires attention to assumptions. Simulation helps bridge the gap between theoretical properties and practical performance.
Further Reading
Essential
Athey, S. and S. Wager (2019). "Estimating Treatment Effects with Causal Forests: An Application." Observational Studies.
Chernozhukov, V. et al. (2018). "Double/Debiased Machine Learning for Treatment and Structural Parameters." The Econometrics Journal.
Package Documentation
grf: https://grf-labs.github.io/grf/
EconML: https://econml.azurewebsites.net/
DoubleML: https://docs.doubleml.org/
Applications
Davis, J. and S. Heller (2017). "Using Causal Forests to Predict Treatment Heterogeneity." AER Papers & Proceedings.
Exercises
Conceptual
Explain why grf::causal_forest() uses "honesty" (separate samples for tree construction and estimation). What problem does this solve, and what is the cost?
In Double ML, why must nuisance functions be estimated using cross-fitting rather than on the full sample? What bias would arise otherwise?
A causal forest returns variable importance scores showing that "age" is the most important driver of treatment effect heterogeneity. Does this mean older people benefit more from treatment? Explain what variable importance does and does not tell us.
Applied
Using experimental or quasi-experimental data:
Estimate CATEs with a causal forest
Identify which variables drive heterogeneity
Compare to traditional subgroup analysis
Implement DML in a setting with high-dimensional confounders. Compare estimates using different ML methods for the nuisance functions.
Design and run a Monte Carlo simulation comparing:
OLS, matching, and DML for estimating ATE
Under varying degrees of treatment effect heterogeneity and confounding
Discussion
The econml (Python) and grf/DoubleML (R) packages offer overlapping functionality but different interfaces. A researcher comfortable in both languages asks which to use. What factors should guide this decision for (a) a one-off analysis, (b) a production system, and (c) teaching?
Critics argue that ML-based causal inference methods are "black boxes" that obscure what assumptions are being made. Defenders argue they are more honest about functional form uncertainty. Based on your experience implementing these methods, which view do you find more compelling?