Appendix A: Mathematical Notation

This appendix provides a reference for the mathematical notation used throughout the book.


General Conventions

| Symbol | Meaning |
| --- | --- |
| Lowercase letters ($x, y, z$) | Scalars or realized values of random variables |
| Uppercase letters ($X, Y, Z$) | Random variables |
| Bold lowercase ($\mathbf{x}, \mathbf{y}$) | Vectors |
| Bold uppercase ($\mathbf{X}, \mathbf{Y}$) | Matrices |
| Greek letters ($\alpha, \beta, \theta$) | Parameters |
| Hat ($\hat{\theta}$) | Estimator or estimate |
| Bar ($\bar{X}$) | Sample mean |
| Tilde ($\tilde{X}$) | Transformed or residualized variable |


Subscripts and Superscripts

| Notation | Meaning |
| --- | --- |
| $Y_i$ | Outcome for unit $i$ |
| $Y_{it}$ | Outcome for unit $i$ at time $t$ |
| $X_k$ | The $k$-th covariate |
| $\beta_j$ | Coefficient on variable $j$ |
| $Y^{(1)}, Y^{(2)}$ | Different outcomes or transformations |
| $X'$ | Transpose of matrix/vector $X$ |
| $X^{-1}$ | Inverse of matrix $X$ |


Treatment and Potential Outcomes

| Symbol | Meaning |
| --- | --- |
| $D$ or $D_i$ | Treatment indicator (1 = treated, 0 = control) |
| $Y(1)$ | Potential outcome under treatment |
| $Y(0)$ | Potential outcome under control |
| $Y(d)$ | Potential outcome under treatment status $d$ |
| $\tau_i = Y_i(1) - Y_i(0)$ | Individual treatment effect |
| $Z$ | Instrument |


Treatment Effect Estimands

| Symbol | Definition | Name |
| --- | --- | --- |
| $\text{ATE}$ | $E[Y(1) - Y(0)]$ | Average Treatment Effect |
| $\text{ATT}$ | $E[Y(1) - Y(0) \mid D = 1]$ | Average Treatment Effect on the Treated |
| $\text{ATU}$ | $E[Y(1) - Y(0) \mid D = 0]$ | Average Treatment Effect on the Untreated |
| $\text{LATE}$ | $E[Y(1) - Y(0) \mid \text{Compliers}]$ | Local Average Treatment Effect |
| $\text{CATE}(x)$ | $E[Y(1) - Y(0) \mid X = x]$ | Conditional Average Treatment Effect |
| $\tau(x)$ | Shorthand for $\text{CATE}(x)$ | Treatment effect function |
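
As a minimal sketch of how these estimands relate, the Python snippet below computes ATE, ATT, and ATU on a hypothetical four-unit dataset in which both potential outcomes are visible. That is never the case in real data, so the numbers are purely illustrative, and the names `units`, `tau`, and `share_treated` are assumptions made for this example.

```python
from statistics import mean

# Hypothetical toy data: (Y(0), Y(1), D) for four units.
# Both potential outcomes are shown only for illustration.
units = [(1.0, 3.0, 1), (2.0, 5.0, 1), (0.0, 1.0, 0), (4.0, 4.0, 0)]

# Individual treatment effects: tau_i = Y_i(1) - Y_i(0)
tau = [y1 - y0 for y0, y1, _ in units]

ate = mean(tau)                                                # E[Y(1) - Y(0)]
att = mean(t for t, (_, _, d) in zip(tau, units) if d == 1)    # ... | D = 1
atu = mean(t for t, (_, _, d) in zip(tau, units) if d == 0)    # ... | D = 0

# Share of treated units, P(D = 1)
share_treated = sum(d for _, _, d in units) / len(units)

print(ate, att, atu)
```

In this toy data the ATE equals the weighted average `share_treated * att + (1 - share_treated) * atu`, which follows from the law of total expectation.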


Probability and Expectation

| Symbol | Meaning |
| --- | --- |
| $P(A)$ | Probability of event $A$ |
| $P(A \mid B)$ | Conditional probability of $A$ given $B$ |
| $E[X]$ | Expected value of $X$ |
| $E[X \mid Y]$ | Conditional expectation of $X$ given $Y$ |
| $\text{Var}(X)$ | Variance of $X$ |
| $\text{Cov}(X, Y)$ | Covariance of $X$ and $Y$ |
| $\text{Corr}(X, Y)$ | Correlation of $X$ and $Y$ |
| $\sigma^2$ | Variance (population) |
| $s^2$ | Variance (sample) |
| $\sigma_{XY}$ | Covariance between $X$ and $Y$ |


Distributions

| Notation | Meaning |
| --- | --- |
| $X \sim F$ | $X$ is distributed according to $F$ |
| $X \sim N(\mu, \sigma^2)$ | Normal distribution with mean $\mu$, variance $\sigma^2$ |
| $X \sim \text{Bernoulli}(p)$ | Bernoulli with probability $p$ |
| $X \sim \text{Binomial}(n, p)$ | Binomial distribution |
| $X \sim \chi^2_k$ | Chi-squared with $k$ degrees of freedom |
| $X \sim t_k$ | Student's t with $k$ degrees of freedom |
| $X \sim F_{k_1, k_2}$ | F-distribution |
| $\Phi(\cdot)$ | Standard normal CDF |
| $\phi(\cdot)$ | Standard normal PDF |


Convergence

| Notation | Meaning |
| --- | --- |
| $\xrightarrow{p}$ | Convergence in probability |
| $\xrightarrow{d}$ | Convergence in distribution |
| $\xrightarrow{a.s.}$ | Almost sure convergence |
| $= O_p(1)$ | Bounded in probability |
| $= o_p(1)$ | Converges to zero in probability |
| $\approx$ | Approximately equal |
| $\propto$ | Proportional to |


Regression and Estimation

| Symbol | Meaning |
| --- | --- |
| $\beta$ | Coefficient vector |
| $\hat{\beta}$ | OLS (or other) estimator |
| $\hat{\beta}_{OLS}$ | OLS estimator specifically |
| $\hat{\beta}_{IV}$ or $\hat{\beta}_{2SLS}$ | IV/2SLS estimator |
| $\varepsilon$, $u$, $e$ | Error terms (various contexts) |
| $\hat{\varepsilon}$, $\hat{u}$, $\hat{e}$ | Residuals |
| $R^2$ | R-squared (coefficient of determination) |
| $\bar{R}^2$ | Adjusted R-squared |
| $\text{SE}(\hat{\beta})$ | Standard error of estimator |


Propensity Scores and Weights

| Symbol | Meaning |
| --- | --- |
| $e(X)$ or $p(X)$ | Propensity score: $P(D = 1 \mid X)$ |
| $\hat{e}(X)$ | Estimated propensity score |
| $w_i$ | Weight for observation $i$ |
| $\text{IPW}$ | Inverse probability weighting |
| $\text{AIPW}$ | Augmented inverse probability weighting |
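
To make the roles of $\hat{e}(X)$ and the weights concrete, here is a minimal sketch of an inverse probability weighting estimate of the ATE. It assumes a single binary covariate, estimates the propensity score as the treated share within each covariate cell, and uses the Horvitz-Thompson form of the estimator. The data and the function names `estimate_propensity` and `ipw_ate` are hypothetical, chosen for this illustration only.

```python
# Hypothetical toy data: (X, D, Y), built so that the true treatment
# effect is 1 for every unit and treatment is more likely when X = 1.
data = [
    (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, 2.0),
]

def estimate_propensity(rows):
    """e_hat(x): share of treated units among rows with X = x."""
    counts = {}
    for x, d, _ in rows:
        n, treated = counts.get(x, (0, 0))
        counts[x] = (n + 1, treated + d)
    return {x: treated / n for x, (n, treated) in counts.items()}

def ipw_ate(rows, e_hat):
    """Horvitz-Thompson IPW estimate: mean of D*Y/e - (1-D)*Y/(1-e)."""
    total = 0.0
    for x, d, y in rows:
        e = e_hat[x]
        total += d * y / e - (1 - d) * y / (1 - e)
    return total / len(rows)

e_hat = estimate_propensity(data)
ate_ipw = ipw_ate(data, e_hat)

# Naive comparison of treated and control means, for contrast.
treated_y = [y for _, d, y in data if d == 1]
control_y = [y for _, d, y in data if d == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)

print(e_hat, ate_ipw, naive)
```

In this constructed example the weighted estimate recovers the effect of 1, while the naive difference in means does not, because $X$ drives both treatment and outcome. The sketch requires every estimated propensity score to lie strictly between 0 and 1.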


Panel Data and Time Series

| Symbol | Meaning |
| --- | --- |
| $i$ | Cross-sectional unit index |
| $t$ | Time period index |
| $T$ | Number of time periods |
| $N$ | Number of cross-sectional units |
| $\alpha_i$ | Unit fixed effect |
| $\gamma_t$ or $\lambda_t$ | Time fixed effect |
| $L$ | Lag operator: $LY_t = Y_{t-1}$ |
| $\Delta$ | First difference operator: $\Delta Y_t = Y_t - Y_{t-1}$ |
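
The lag and first difference operators can be sketched on a plain list as follows. This is an illustrative implementation for a single, non-empty series; the function names `lag` and `first_difference` are assumptions, and `None` marks the first period, where no previous value exists.

```python
def lag(series):
    """Apply L: the value at position t becomes Y_{t-1}."""
    return [None] + series[:-1]

def first_difference(series):
    """Apply Delta: Y_t - Y_{t-1}, undefined in the first period."""
    return [
        None if prev is None else cur - prev
        for cur, prev in zip(series, lag(series))
    ]

y = [3, 5, 4, 8]
print(lag(y))               # [None, 3, 5, 4]
print(first_difference(y))  # [None, 2, -1, 4]
```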


Difference-in-Differences

| Symbol | Meaning |
| --- | --- |
| $\text{Post}_t$ | Post-treatment indicator |
| $\text{Treat}_i$ | Treatment group indicator |
| $\delta$ or $\tau$ | DiD treatment effect |
| $Y_{it}(0)$, $Y_{it}(1)$ | Potential outcomes in panel setting |
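
As an illustration of how $\text{Treat}_i$, $\text{Post}_t$, and $\delta$ fit together, the sketch below computes the canonical two-group, two-period DiD estimate as a difference of differences in cell means. The data are hypothetical and the helper `cell_mean` is an assumption of this example.

```python
from statistics import mean

# Hypothetical observations: (Treat_i, Post_t, Y_it)
obs = [
    (0, 0, 1.0), (0, 0, 2.0),   # control group, pre-period
    (0, 1, 2.0), (0, 1, 3.0),   # control group, post-period
    (1, 0, 3.0), (1, 0, 4.0),   # treatment group, pre-period
    (1, 1, 6.0), (1, 1, 7.0),   # treatment group, post-period
]

def cell_mean(treat, post):
    """Mean outcome in one (Treat, Post) cell."""
    return mean(y for g, p, y in obs if g == treat and p == post)

# Change over time in the treated group minus change in the control group.
delta = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(delta)
```

Here the treated group's mean rises by 3 and the control group's by 1, so the estimate is 2. Reading it as a treatment effect relies on the parallel trends assumption.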


Regression Discontinuity

| Symbol | Meaning |
| --- | --- |
| $X$ or $R$ | Running variable |
| $c$ | Cutoff value |
| $h$ | Bandwidth |
| $\tau_{RD}$ | RD treatment effect at cutoff |


Instrumental Variables

| Symbol | Meaning |
| --- | --- |
| $Z$ | Instrument(s) |
| $\pi$ | First-stage coefficient |
| $\rho$ | Reduced-form coefficient |
| $\beta = \rho / \pi$ | Wald estimator |
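
The ratio $\beta = \rho / \pi$ can be sketched for the simplest case of a single binary instrument, where the first-stage and reduced-form coefficients reduce to differences in means between the $Z = 1$ and $Z = 0$ groups. The data below are hypothetical (built from $Y = 1 + 2D$), and the helper `mean_by_z` is an assumption of this example.

```python
from statistics import mean

# Hypothetical data: (Z, D, Y), constructed so that Y = 1 + 2 * D.
rows = [
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (0, 1, 3.0),
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, 1.0),
]

def mean_by_z(rows, z, column):
    """Mean of one column (1 = D, 2 = Y) among rows with Z = z."""
    return mean(r[column] for r in rows if r[0] == z)

# First stage: effect of Z on D.
pi_hat = mean_by_z(rows, 1, 1) - mean_by_z(rows, 0, 1)
# Reduced form: effect of Z on Y.
rho_hat = mean_by_z(rows, 1, 2) - mean_by_z(rows, 0, 2)

beta_wald = rho_hat / pi_hat
print(pi_hat, rho_hat, beta_wald)
```

The ratio is only defined when the first stage is nonzero, and it becomes unstable when `pi_hat` is close to zero (a weak instrument).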


Matrix Notation

| Symbol | Meaning |
| --- | --- |
| $\mathbf{X}$ | $n \times k$ matrix of regressors |
| $\mathbf{y}$ | $n \times 1$ outcome vector |
| $\mathbf{I}_n$ | $n \times n$ identity matrix |
| $\mathbf{0}$ | Zero vector or matrix |
| $\mathbf{1}$ | Vector of ones |
| $\text{tr}(\mathbf{A})$ | Trace of matrix $\mathbf{A}$ |
| $\lvert\mathbf{A}\rvert$ or $\det(\mathbf{A})$ | Determinant of matrix $\mathbf{A}$ |
| $\text{rank}(\mathbf{A})$ | Rank of matrix $\mathbf{A}$ |
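
For orientation, the matrix notation above combines into the familiar closed form for the OLS estimator. This is the standard textbook expression, given here as a reminder rather than quoted from a specific chapter. It assumes $\mathbf{X}'\mathbf{X}$ is invertible, that is, $\text{rank}(\mathbf{X}) = k$.

```latex
\hat{\beta}_{OLS} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
```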


Independence and Conditional Independence

| Notation | Meaning |
| --- | --- |
| $X \perp Y$ | $X$ is independent of $Y$ |
| $X \perp Y \mid Z$ | $X$ is independent of $Y$ conditional on $Z$ |
| $(Y(0), Y(1)) \perp D \mid X$ | Conditional independence assumption |


Indicators and Sets

| Notation | Meaning |
| --- | --- |
| $\mathbf{1}\{A\}$ or $\mathbb{I}(A)$ | Indicator function (1 if $A$ true, 0 otherwise) |
| $\{x : \text{condition}\}$ | Set of $x$ satisfying condition |
| $\in$ | Element of |
| $\subset$ | Subset of |
| $\cup$, $\cap$ | Union, intersection |
| $\emptyset$ | Empty set |
| $\mathbb{R}$ | Real numbers |
| $\mathbb{R}^k$ | $k$-dimensional real space |


Summation and Products

| Notation | Meaning |
| --- | --- |
| $\sum_{i=1}^n x_i$ | Sum from $i = 1$ to $n$ |
| $\prod_{i=1}^n x_i$ | Product from $i = 1$ to $n$ |
| $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ | Sample mean |


Limits and Asymptotics

| Notation | Meaning |
| --- | --- |
| $\lim_{n \to \infty}$ | Limit as $n$ approaches infinity |
| $\text{plim}$ | Probability limit |
| $n \to \infty$ | Sample size goes to infinity |
| $\sqrt{n}(\hat{\theta} - \theta)$ | Scaled deviation (for CLT) |


Calculus and Optimization

| Notation | Meaning |
| --- | --- |
| $\frac{\partial f}{\partial x}$ | Partial derivative of $f$ with respect to $x$ |
| $\nabla f$ | Gradient |
| $\frac{\partial^2 f}{\partial x^2}$ | Second partial derivative of $f$ with respect to $x$ |
| $\mathbf{H}$ | Hessian matrix |
| $\arg\max_x f(x)$ | Value of $x$ that maximizes $f$ |
| $\arg\min_x f(x)$ | Value of $x$ that minimizes $f$ |


Chapter-Specific Notation

DAGs (Chapter 9)

  • $\to$: Direct causal effect

  • $\leftarrow$: Reverse causal direction

  • $X \to M \to Y$: Chain (mediation)

  • $X \leftarrow C \to Y$: Fork (confounding)

  • $X \to C \leftarrow Y$: Collider

Meta-Analysis (Chapter 24)

  • $\theta_i$: True effect in study $i$

  • $\hat{\theta}_i$: Estimated effect in study $i$

  • $\tau^2$: Between-study variance

  • $I^2$: Heterogeneity measure

Machine Learning (Chapter 21)

  • $\eta$: Nuisance parameters

  • $\psi$: Influence function/moment condition

  • $\hat{\mu}(x)$: Estimated outcome function

  • $\hat{g}(x)$: Estimated propensity score


For notation not listed here, consult the chapter where the symbol is first introduced.
