This appendix reviews probability concepts used throughout the book. For a comprehensive treatment, consult a probability textbook such as DeGroot & Schervish (2012) or Casella & Berger (2002).
B.1 Probability Fundamentals
Sample Space and Events
The sample space $\Omega$ is the set of all possible outcomes. An event is a subset of the sample space.
Example: Flipping a coin twice:
- Sample space: $\Omega = \{HH, HT, TH, TT\}$
- Event "at least one head": $A = \{HH, HT, TH\}$
Probability Axioms (Kolmogorov)
For any event $A$:
1. $P(A) \geq 0$ (non-negativity)
2. $P(\Omega) = 1$ (normalization)
3. If $A_1, A_2, \ldots$ are mutually exclusive: $P(\cup_i A_i) = \sum_i P(A_i)$ (countable additivity)

Useful consequences:
- $P(\emptyset) = 0$
- $P(A^c) = 1 - P(A)$
- If $A \subset B$, then $P(A) \leq P(B)$
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (inclusion-exclusion)
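For a finite sample space, these rules can be verified by direct enumeration. A small sketch using the two-coin-flip example (the events here are illustrative choices):

```python
from itertools import product

# Sample space for two coin flips; each of the 4 outcomes is equally likely.
omega = list(product("HT", repeat=2))
p = lambda event: len(event) / len(omega)

A = {o for o in omega if "H" in o}      # at least one head
B = {o for o in omega if o[0] == "H"}   # first flip is a head

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)
print(lhs, rhs)  # 0.75 0.75
```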
B.2 Conditional Probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0$$

Intuition: the probability of $A$, restricting attention to outcomes where $B$ occurred.
Multiplication Rule
$$P(A \cap B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$$
Law of Total Probability
If $B_1, B_2, \ldots, B_k$ partition $\Omega$:
$$P(A) = \sum_{j=1}^k P(A \mid B_j) P(B_j)$$
Bayes' Theorem
$$P(B \mid A) = \frac{P(A \mid B) P(B)}{P(A)} = \frac{P(A \mid B) P(B)}{\sum_j P(A \mid B_j) P(B_j)}$$

Intuition: updates the prior belief $P(B)$ to the posterior $P(B \mid A)$ based on evidence $A$.
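A quick numerical illustration of Bayes' theorem in the classic screening-test setting (all numbers below are hypothetical):

```python
# Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_d = 0.01        # P(D): prior probability of disease
p_pos_d = 0.95    # P(+ | D): sensitivity
p_pos_nd = 0.05   # P(+ | not D): false-positive rate

# Law of total probability gives P(+); Bayes' theorem gives the posterior P(D | +).
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)
p_d_pos = p_pos_d * p_d / p_pos
print(round(p_d_pos, 3))  # 0.161 -- a positive test still leaves P(D | +) well below 1/2
```

The low posterior despite a seemingly accurate test is the standard base-rate effect: the 1% prior dominates.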
B.3 Independence
Events $A$ and $B$ are independent if:
$$P(A \cap B) = P(A) \cdot P(B)$$

Equivalently (when $P(B) > 0$): $P(A \mid B) = P(A)$ (knowing $B$ doesn't change the probability of $A$).
Conditional Independence
$A$ and $B$ are conditionally independent given $C$ if:
$$P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C)$$

Notation: $A \perp B \mid C$

Caution: independence does not imply conditional independence, and vice versa.
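Independence can be checked exactly by enumeration on a finite sample space. A sketch with two fair dice (the two events are illustrative choices):

```python
from itertools import product
from fractions import Fraction

# Two fair dice, enumerated exactly with Fractions to avoid floating-point error.
omega = list(product(range(1, 7), repeat=2))
p = lambda ev: Fraction(len(ev), len(omega))

A = {o for o in omega if o[0] % 2 == 0}   # first die is even: P(A) = 1/2
B = {o for o in omega if sum(o) == 7}     # sum equals 7:      P(B) = 1/6

# P(A ∩ B) = 3/36 = 1/12 = P(A) * P(B), so A and B are independent.
print(p(A & B) == p(A) * p(B))  # True
```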
B.4 Random Variables
Discrete Random Variables
A discrete random variable $X$ takes countably many values. It is characterized by its:

Probability mass function (PMF): $p_X(x) = P(X = x)$

Properties: $p_X(x) \geq 0$ and $\sum_x p_X(x) = 1$
Continuous Random Variables
A continuous random variable $X$ has a:

Probability density function (PDF): $f_X(x)$

Properties: $f_X(x) \geq 0$ and $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$

$$P(a \leq X \leq b) = \int_a^b f_X(x)\, dx$$
Cumulative Distribution Function (CDF)
For any random variable:
$$F_X(x) = P(X \leq x)$$

Properties:
- $F_X(-\infty) = 0$, $F_X(\infty) = 1$
- $F_X$ is non-decreasing and right-continuous
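The PDF-CDF relationship can be verified numerically by integrating a density. A sketch using the Exponential(2) distribution (the rate and evaluation point are arbitrary choices):

```python
import math

# Exponential(2) density f(x) = 2 e^{-2x} on [0, ∞); closed-form CDF is 1 - e^{-2x}.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)

def cdf_numeric(x, steps=100_000):
    """Trapezoidal-rule approximation of the integral of f over [0, x]."""
    h = x / steps
    total = 0.5 * (f(0.0) + f(x))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

# The numerical integral of the PDF matches the closed-form CDF.
print(round(cdf_numeric(1.5), 4), round(1 - math.exp(-lam * 1.5), 4))  # 0.9502 0.9502
```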
B.5 Expectation
Discrete: $E[X] = \sum_x x \cdot p_X(x)$

Continuous: $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$

Linearity: $E[aX + bY] = aE[X] + bE[Y]$

Function of a RV: $E[g(X)] = \sum_x g(x) p_X(x)$ or $\int g(x) f_X(x)\, dx$

Independence: if $X \perp Y$, then $E[XY] = E[X] \cdot E[Y]$
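Both linearity and the product rule for independent variables can be checked by Monte Carlo; a minimal sketch with independent Uniform(0, 1) draws:

```python
import random

random.seed(0)
n = 200_000

# X, Y ~ Uniform(0, 1), drawn independently, so E[X] = E[Y] = 1/2.
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]
mean = lambda v: sum(v) / len(v)

# Linearity: E[2X + 3Y] = 2 E[X] + 3 E[Y] = 2.5
lin = mean([2 * x + 3 * y for x, y in zip(xs, ys)])

# Independence: E[XY] = E[X] E[Y] = 0.25
prod = mean([x * y for x, y in zip(xs, ys)])
print(lin, prod)  # both close to 2.5 and 0.25
```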
Conditional Expectation
$$E[Y \mid X = x] = \sum_y y \cdot P(Y = y \mid X = x)$$

Law of Iterated Expectations (LIE):
$$E[Y] = E[E[Y \mid X]]$$

Intuition: averaging the conditional means over the distribution of $X$ recovers the unconditional mean.
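The law of iterated expectations can be illustrated with a two-stage simulation (the mixture parameters below are arbitrary choices):

```python
import random

random.seed(1)
n = 200_000

# Two-stage draw: X ~ Bernoulli(0.3); given X, Y ~ Normal(2, 1) if X = 1, else Normal(0, 1).
# LIE predicts E[Y] = E[E[Y|X]] = 0.3 * 2 + 0.7 * 0 = 0.6.
ys = []
for _ in range(n):
    x = 1 if random.random() < 0.3 else 0
    ys.append(random.gauss(2.0 if x else 0.0, 1.0))

ey = sum(ys) / n
print(ey)  # close to 0.6
```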
B.6 Variance and Covariance
Var ( X ) = E [ ( X − E [ X ] ) 2 ] = E [ X 2 ] − ( E [ X ] ) 2 \text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2 Var ( X ) = E [( X − E [ X ] ) 2 ] = E [ X 2 ] − ( E [ X ] ) 2
Properties :
Var ( X ) ≥ 0 \text{Var}(X) \geq 0 Var ( X ) ≥ 0
Var ( a X + b ) = a 2 Var ( X ) \text{Var}(aX + b) = a^2 \text{Var}(X) Var ( a X + b ) = a 2 Var ( X )
If X ⊥ Y X \perp Y X ⊥ Y : Var ( X + Y ) = Var ( X ) + Var ( Y ) \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) Var ( X + Y ) = Var ( X ) + Var ( Y )
Standard deviation : SD ( X ) = Var ( X ) \text{SD}(X) = \sqrt{\text{Var}(X)} SD ( X ) = Var ( X )
Cov ( X , Y ) = E [ ( X − E [ X ] ) ( Y − E [ Y ] ) ] = E [ X Y ] − E [ X ] E [ Y ] \text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] Cov ( X , Y ) = E [( X − E [ X ]) ( Y − E [ Y ])] = E [ X Y ] − E [ X ] E [ Y ]
Properties :
Cov ( X , X ) = Var ( X ) \text{Cov}(X, X) = \text{Var}(X) Cov ( X , X ) = Var ( X )
Cov ( X , Y ) = Cov ( Y , X ) \text{Cov}(X, Y) = \text{Cov}(Y, X) Cov ( X , Y ) = Cov ( Y , X )
Cov ( a X , b Y ) = a b ⋅ Cov ( X , Y ) \text{Cov}(aX, bY) = ab \cdot \text{Cov}(X, Y) Cov ( a X , bY ) = ab ⋅ Cov ( X , Y )
If X ⊥ Y X \perp Y X ⊥ Y : Cov ( X , Y ) = 0 \text{Cov}(X, Y) = 0 Cov ( X , Y ) = 0 (but not conversely!)
Correlation
$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\text{SD}(X) \cdot \text{SD}(Y)}$$

- Always between $-1$ and $1$
- $|\text{Corr}(X, Y)| = 1$ iff $Y = aX + b$ for some constants $a, b$ with $a \neq 0$
Variance of a Sum
$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$$
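The variance-of-a-sum identity holds exactly for sample moments (same divisor $n$ throughout), which makes it easy to verify on simulated correlated data:

```python
import random

random.seed(2)
n = 100_000

# Correlated pair: Y = X + noise, so Cov(X, Y) > 0 and the cross term matters.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]

mean = lambda v: sum(v) / len(v)
def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)
def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs)  # identical up to floating-point rounding
```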
B.7 Common Distributions
Discrete Distributions
| Distribution | PMF | Mean | Variance |
|---|---|---|---|
| Bernoulli($p$) | $p^x(1-p)^{1-x}$, $x \in \{0,1\}$ | $p$ | $p(1-p)$ |
| Binomial($n, p$) | $\binom{n}{x}p^x(1-p)^{n-x}$ | $np$ | $np(1-p)$ |
| Poisson($\lambda$) | $\frac{\lambda^x e^{-\lambda}}{x!}$ | $\lambda$ | $\lambda$ |
| Geometric($p$) | $(1-p)^{x-1}p$ | $1/p$ | $(1-p)/p^2$ |
Continuous Distributions
| Distribution | PDF | Mean | Variance |
|---|---|---|---|
| Uniform($a, b$) | $\frac{1}{b-a}$, $x \in [a,b]$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Exponential($\lambda$) | $\lambda e^{-\lambda x}$, $x \geq 0$ | $1/\lambda$ | $1/\lambda^2$ |
| Normal($\mu, \sigma^2$) | $\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ |
| Student's $t_k$ | $\frac{\Gamma(\frac{k+1}{2})}{\sqrt{k\pi}\,\Gamma(\frac{k}{2})}\left(1+\frac{x^2}{k}\right)^{-\frac{k+1}{2}}$ | $0$ (if $k > 1$) | $\frac{k}{k-2}$ (if $k > 2$) |
The Normal Distribution
The normal (Gaussian) distribution is central to statistics:
$$X \sim N(\mu, \sigma^2) \implies f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Standard normal: $Z \sim N(0, 1)$

Standardization: if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$

Linear combinations: if $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent, then
$$aX + bY \sim N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2)$$
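The linear-combination rule can be checked by simulation; a sketch with arbitrarily chosen parameters, where theory predicts $2X - Y \sim N(-1, 25)$:

```python
import random
import statistics

random.seed(3)
n = 200_000
a, b = 2.0, -1.0

# X ~ N(1, 4), Y ~ N(3, 9), independent.
# Theory: aX + bY ~ N(2*1 - 3, 4*4 + 1*9) = N(-1, 25).
zs = [a * random.gauss(1, 2) + b * random.gauss(3, 3) for _ in range(n)]

m, v = statistics.mean(zs), statistics.pvariance(zs)
print(m, v)  # close to -1 and 25
```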
B.8 Sampling and the Central Limit Theorem
A random sample $X_1, X_2, \ldots, X_n$ consists of independent and identically distributed (i.i.d.) draws from some distribution.
The sample mean is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

Properties (if $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$):
- $E[\bar{X}] = \mu$ (unbiased)
- $\text{Var}(\bar{X}) = \sigma^2/n$
- $\text{SE}(\bar{X}) = \sigma/\sqrt{n}$
Law of Large Numbers (LLN)
As $n \to \infty$:
$$\bar{X}_n \xrightarrow{p} \mu$$

Interpretation: the sample mean converges (in probability) to the population mean.
Central Limit Theorem (CLT)
For i.i.d. $X_i$ with mean $\mu$ and variance $\sigma^2$:
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$

Equivalently: $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$

Interpretation: sample means are approximately normal for large $n$, regardless of the population distribution.
B.9 Joint Distributions
Discrete: $p_{X,Y}(x, y) = P(X = x, Y = y)$

Continuous: $f_{X,Y}(x, y)$, where $P((X, Y) \in A) = \iint_A f_{X,Y}(x,y)\, dx\, dy$
Marginal Distributions
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$$
Conditional Distributions
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$

$X$ and $Y$ are independent iff $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$ for all $x, y$.
B.10 Transformations of Random Variables

If $Y = g(X)$ and $g$ is monotonic with inverse $g^{-1}$:
$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$$
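A concrete check of the change-of-variables formula: if $X \sim \text{Exponential}(1)$ and $Y = e^X$, then $g^{-1}(y) = \log y$ and the formula gives $f_Y(y) = e^{-\log y} \cdot \frac{1}{y} = y^{-2}$ for $y \geq 1$ (a Pareto density). A Monte Carlo probability should match the integral of that density:

```python
import math
import random

random.seed(6)
n = 200_000

# X ~ Exponential(1); the monotone transform Y = exp(X) has density y^{-2} on [1, ∞).
ys = [math.exp(random.expovariate(1.0)) for _ in range(n)]

# Closed form: P(1 <= Y <= 3) = integral of y^{-2} from 1 to 3 = 1 - 1/3 = 2/3.
mc = sum(1 <= y <= 3 for y in ys) / n
print(mc)  # close to 2/3
```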
If $Y = aX + b$ (a linear transformation):
- $E[Y] = aE[X] + b$
- $\text{Var}(Y) = a^2 \text{Var}(X)$
B.11 Moment Generating Functions
$$M_X(t) = E[e^{tX}]$$

Properties:
- If $M_X$ exists in a neighborhood of 0, it uniquely determines the distribution
- $E[X^k] = M_X^{(k)}(0)$ (the $k$-th derivative at 0)
- If $X \perp Y$: $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$
| Distribution | MGF $M_X(t)$ |
|---|---|
| Bernoulli($p$) | $(1-p) + pe^t$ |
| Normal($\mu, \sigma^2$) | $\exp(\mu t + \sigma^2 t^2/2)$ |
| Poisson($\lambda$) | $\exp(\lambda(e^t - 1))$ |
B.12 Inequalities
Markov's Inequality
For $X \geq 0$ and $a > 0$:
$$P(X \geq a) \leq \frac{E[X]}{a}$$
Chebyshev's Inequality
For any $k > 0$:
$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$
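Chebyshev's bound is distribution-free and often loose; a simulation sketch with Exponential(1) draws ($\mu = \sigma = 1$), where the exact two-sigma tail probability is $e^{-3} \approx 0.05$, far below the bound of $1/4$:

```python
import random

random.seed(7)
n = 100_000

# Exponential(1): mu = 1, sigma = 1. Chebyshev with k = 2: P(|X - 1| >= 2) <= 1/4.
xs = [random.expovariate(1.0) for _ in range(n)]
tail = sum(abs(x - 1.0) >= 2.0 for x in xs) / n
print(tail, tail <= 0.25)  # empirical tail ≈ e^{-3} ≈ 0.05, bound holds
```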
Cauchy-Schwarz Inequality
$$|E[XY]|^2 \leq E[X^2] \cdot E[Y^2]$$

This implies $|\text{Corr}(X, Y)| \leq 1$.
Jensen's Inequality
For convex $g$: $g(E[X]) \leq E[g(X)]$

For concave $g$, the inequality reverses.
B.13 Convergence Concepts
Types of Convergence
Almost sure convergence: $X_n \xrightarrow{a.s.} X$ if $P(\lim_{n \to \infty} X_n = X) = 1$

Convergence in probability: $X_n \xrightarrow{p} X$ if for all $\epsilon > 0$: $\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0$

Convergence in distribution: $X_n \xrightarrow{d} X$ if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at all continuity points of $F_X$

Relationships:
$$\text{a.s.} \implies \text{in probability} \implies \text{in distribution}$$
(The implications do not reverse in general.)
Slutsky's Theorem
If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ (a constant):
- $X_n + Y_n \xrightarrow{d} X + c$
- $X_n Y_n \xrightarrow{d} cX$
- $X_n / Y_n \xrightarrow{d} X/c$ (if $c \neq 0$)
Continuous Mapping Theorem
If $X_n \xrightarrow{d} X$ and $g$ is continuous: $g(X_n) \xrightarrow{d} g(X)$
Delta Method

If $\sqrt{n}(X_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and $g'(\theta) \neq 0$:
$$\sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{d} N(0, [g'(\theta)]^2 \sigma^2)$$
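A simulation sketch of the delta method with arbitrarily chosen ingredients: $X_i \sim \text{Exponential}(1)$, so $\bar{X}_n$ satisfies $\sqrt{n}(\bar{X}_n - 1) \xrightarrow{d} N(0, 1)$; taking $g(x) = \log x$ with $g'(1) = 1$ predicts $\sqrt{n}\,\log(\bar{X}_n) \xrightarrow{d} N(0, 1)$, so its variance should be close to 1:

```python
import math
import random
import statistics

random.seed(8)
n, reps = 400, 4000

# X_i ~ Exponential(1): theta = 1, sigma^2 = 1; g(x) = log(x), g'(1) = 1.
vals = []
for _ in range(reps):
    xbar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    vals.append(math.sqrt(n) * math.log(xbar))

v = statistics.pvariance(vals)
print(v)  # close to the predicted [g'(1)]^2 * sigma^2 = 1
```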
Further Reading
DeGroot, M. H., & Schervish, M. J. (2012). Probability and Statistics (4th ed.). Pearson.
Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Cengage.
Wasserman, L. (2004). All of Statistics. Springer.