Appendix C: Matrix Algebra Essentials

This appendix reviews matrix algebra concepts used in econometrics. For comprehensive treatment, see Greene (2018) Appendix A or Magnus & Neudecker (2019).


C.1 Vectors and Matrices

Definitions

A vector is an ordered array of numbers:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

This is a column vector of dimension $n \times 1$. A row vector is $\mathbf{x}' = (x_1, x_2, \ldots, x_n)$, of dimension $1 \times n$.

A matrix is a rectangular array:

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

This is an $m \times n$ matrix ($m$ rows, $n$ columns). Element $a_{ij}$ sits in row $i$, column $j$.
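
To make the dimensions concrete, here is a minimal NumPy sketch (the numbers are arbitrary and purely illustrative):

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # 2 x 3 matrix

print(x.shape)   # (3, 1)
print(A.shape)   # (2, 3)
print(A[0, 1])   # element a_{12} = 2.0 (NumPy indexes from zero)
```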

Special Matrices

| Matrix | Definition |
| --- | --- |
| Square matrix | $m = n$ (same number of rows and columns) |
| Identity matrix $\mathbf{I}_n$ | Square with 1s on the diagonal, 0s elsewhere |
| Zero matrix $\mathbf{0}$ | All elements are zero |
| Diagonal matrix | Non-zero elements only on the main diagonal |
| Symmetric matrix | $\mathbf{A} = \mathbf{A}'$ (equals its transpose) |
| Upper/lower triangular | Zeros below/above the diagonal |


C.2 Matrix Operations

Transpose

The transpose $\mathbf{A}'$ (also written $\mathbf{A}^T$) interchanges rows and columns:

$$(\mathbf{A}')_{ij} = a_{ji}$$

Properties:

  • $(\mathbf{A}')' = \mathbf{A}$

  • $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$

  • $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$

  • $(c\mathbf{A})' = c\mathbf{A}'$
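
These properties are easy to verify numerically; a small illustrative NumPy check (random matrices, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 4))

print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)' = B'A'  -> True
print(np.allclose(A.T.T, A))               # (A')' = A     -> True
```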

Addition

Matrices of the same dimension can be added element-wise:

$$(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$$

Scalar Multiplication

$$(c\mathbf{A})_{ij} = c \cdot a_{ij}$$

Matrix Multiplication

For $\mathbf{A}$ ($m \times n$) and $\mathbf{B}$ ($n \times p$), the product $\mathbf{C} = \mathbf{AB}$ is $m \times p$:

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$$

Key rule: the number of columns of $\mathbf{A}$ must equal the number of rows of $\mathbf{B}$.

Properties:

  • $\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}$ (associative)

  • $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$ (distributive)

  • $\mathbf{AB} \neq \mathbf{BA}$ in general (not commutative!)

  • $\mathbf{AI} = \mathbf{IA} = \mathbf{A}$
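
A short illustrative sketch of the conformability rule and non-commutativity in NumPy (arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))   # 3 x 2
B = rng.normal(size=(2, 4))   # 2 x 4

C = A @ B                     # conformable: (3 x 2)(2 x 4) -> 3 x 4
print(C.shape)                # (3, 4)

# Even square matrices generally do not commute
S = rng.normal(size=(2, 2))
T = rng.normal(size=(2, 2))
print(np.allclose(S @ T, T @ S))   # False (generically)
```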


C.3 Vector Products

Inner (Dot) Product

For vectors $\mathbf{x}, \mathbf{y}$ of dimension $n \times 1$:

$$\mathbf{x}'\mathbf{y} = \sum_{i=1}^n x_i y_i$$

This is a scalar.

Outer Product

$$\mathbf{x}\mathbf{y}' = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots \\ x_2 y_1 & x_2 y_2 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$

This is an $n \times n$ matrix.

Euclidean Norm

$$\|\mathbf{x}\| = \sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{\sum_{i=1}^n x_i^2}$$
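
A quick NumPy illustration of the three products above (values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(x @ y)                 # inner product: 32.0
print(np.outer(x, y).shape)  # outer product: (3, 3)
print(np.linalg.norm(x))     # Euclidean norm: sqrt(14)
print(np.sqrt(x @ x))        # same value, computed from the inner product
```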

C.4 Trace

For a square matrix $\mathbf{A}$:

$$\text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii}$$

Properties:

  • $\text{tr}(\mathbf{A} + \mathbf{B}) = \text{tr}(\mathbf{A}) + \text{tr}(\mathbf{B})$

  • $\text{tr}(c\mathbf{A}) = c \cdot \text{tr}(\mathbf{A})$

  • $\text{tr}(\mathbf{A}') = \text{tr}(\mathbf{A})$

  • $\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})$ (cyclic property)
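
A short numerical check of the cyclic property (illustrative only; note that $\mathbf{AB}$ and $\mathbf{BA}$ need not even have the same dimension):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 3))

# tr(AB) = tr(BA) even though AB is 3x3 and BA is 4x4
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
```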


C.5 Determinant

For a $2 \times 2$ matrix:

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$

Properties:

  • $\det(\mathbf{A}') = \det(\mathbf{A})$

  • $\det(\mathbf{AB}) = \det(\mathbf{A}) \det(\mathbf{B})$

  • $\det(c\mathbf{A}) = c^n \det(\mathbf{A})$ for an $n \times n$ matrix

  • $\mathbf{A}$ is invertible if and only if $\det(\mathbf{A}) \neq 0$
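
An illustrative numerical check of the product and scaling rules (random matrices, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
c = 2.5

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A)))              # True
```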


C.6 Matrix Inverse

For a square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies:

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

Existence: $\mathbf{A}^{-1}$ exists if and only if $\det(\mathbf{A}) \neq 0$ (the matrix is nonsingular).

For a $2 \times 2$ matrix:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

Properties:

  • $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$

  • $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$ (when both inverses exist)

  • $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$

  • $(c\mathbf{A})^{-1} = (1/c)\mathbf{A}^{-1}$ for $c \neq 0$
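
A small NumPy sketch; the last line also illustrates the common numerical advice to solve linear systems rather than form the inverse explicitly (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

Ainv = np.linalg.inv(A)
print(np.allclose(A @ Ainv, np.eye(3)))                            # A A^{-1} = I
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ Ainv))  # (AB)^{-1} = B^{-1} A^{-1}

# Solving A b_hat = b directly is more stable than computing A^{-1} b
b = rng.normal(size=3)
print(np.allclose(np.linalg.solve(A, b), Ainv @ b))
```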


C.7 Rank

The rank of a matrix is the maximum number of linearly independent rows (or columns).

Properties:

  • $\text{rank}(\mathbf{A}) \leq \min(m, n)$

  • Full rank: $\text{rank}(\mathbf{A}) = \min(m, n)$

  • $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}')$

  • $\text{rank}(\mathbf{AB}) \leq \min(\text{rank}(\mathbf{A}), \text{rank}(\mathbf{B}))$

For econometrics: $\mathbf{X}'\mathbf{X}$ is invertible if and only if $\mathbf{X}$ has full column rank.
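
A sketch of this rank condition on a simulated design matrix (the data-generating choices are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, rng.normal(size=n)])
X_coll = np.column_stack([np.ones(n), x1, 2 * x1])   # third column = 2 * second

print(np.linalg.matrix_rank(X_full))              # 3: full column rank, X'X invertible
print(np.linalg.matrix_rank(X_coll))              # 2: perfect collinearity
print(np.linalg.matrix_rank(X_coll.T @ X_coll))   # 2: X'X is singular
```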


C.8 Eigenvalues and Eigenvectors

For a square matrix $\mathbf{A}$, if

$$\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$$

for some $\mathbf{v} \neq \mathbf{0}$, then $\lambda$ is an eigenvalue and $\mathbf{v}$ is the corresponding eigenvector.

Computing Eigenvalues

Solve the characteristic equation:

$$\det(\mathbf{A} - \lambda \mathbf{I}) = 0$$

Properties

For an $n \times n$ matrix:

  • Has $n$ eigenvalues (counting multiplicities)

  • $\det(\mathbf{A}) = \prod_{i=1}^n \lambda_i$

  • $\text{tr}(\mathbf{A}) = \sum_{i=1}^n \lambda_i$

  • Symmetric matrices have real eigenvalues
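
An illustrative numerical check of these eigenvalue identities (random matrix, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
eigvals = np.linalg.eigvals(A)

print(np.isclose(eigvals.prod(), np.linalg.det(A)))   # det(A) = product of eigenvalues
print(np.isclose(eigvals.sum(), np.trace(A)))         # tr(A) = sum of eigenvalues

# A symmetric matrix has real eigenvalues
S = A + A.T
print(np.allclose(np.linalg.eigvals(S).imag, 0.0))    # True
```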


C.9 Positive Definite Matrices

A symmetric matrix $\mathbf{A}$ is positive definite if:

$$\mathbf{x}'\mathbf{A}\mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}$$

It is positive semi-definite if $\mathbf{x}'\mathbf{A}\mathbf{x} \geq 0$ for all $\mathbf{x}$.

Equivalent Conditions

For a symmetric matrix $\mathbf{A}$, the following are equivalent to positive definiteness:

  • All eigenvalues are positive (semi-definite: non-negative)

  • All leading principal minors are positive (semi-definite: all principal minors are non-negative)

  • There exists a matrix $\mathbf{B}$ with full column rank such that $\mathbf{A} = \mathbf{B}'\mathbf{B}$ (semi-definite: some $\mathbf{B}$, not necessarily of full rank)

In Econometrics

  • Variance-covariance matrices are positive semi-definite

  • $\mathbf{X}'\mathbf{X}$ is positive semi-definite (positive definite if $\mathbf{X}$ has full column rank)
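A sketch of how one might check positive definiteness of $\mathbf{X}'\mathbf{X}$ numerically (simulated design, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
A = X.T @ X                      # symmetric; positive definite if X has full column rank

print(np.all(np.linalg.eigvalsh(A) > 0))     # all eigenvalues positive
L_chol = np.linalg.cholesky(A)               # raises LinAlgError if A is not positive definite
print(np.allclose(L_chol @ L_chol.T, A))     # A = B'B with B = L'

# Quadratic form check in a few random directions
for _ in range(3):
    v = rng.normal(size=3)
    print(v @ A @ v > 0)                     # True
```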


C.10 Quadratic Forms

A quadratic form is:

$$q(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j$$

where $\mathbf{A}$ is symmetric.
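
A tiny numerical example showing that the matrix expression and the double sum agree (arbitrary $\mathbf{A}$ and $\mathbf{x}$):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # symmetric
x = np.array([1.0, -2.0])

q_matrix = x @ A @ x
q_sum = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))
print(q_matrix, q_sum)           # 10.0 10.0
```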

In Regression

The sum of squared residuals:

$$\text{SSR} = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$

C.11 Partitioned Matrices

Block Multiplication

$$\begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \\ \mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22} \end{pmatrix}$$

provided the blocks are conformable.

Partitioned Inverse

For a partitioned matrix with conformable blocks, the inverse

$$\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}^{-1}$$

can be expressed using the Schur complement $\mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B}$ (assuming $\mathbf{A}$ is invertible).
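
A sketch verifying block multiplication and the Schur-complement form of the partitioned inverse numerically (random blocks, assuming the relevant sub-blocks are invertible):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 3))
C = rng.normal(size=(3, 2))
D = rng.normal(size=(3, 3))

M = np.block([[A, B], [C, D]])

# Block multiplication: the (1,1) block of M @ M equals A A + B C
print(np.allclose((M @ M)[:2, :2], A @ A + B @ C))               # True

# The (2,2) block of M^{-1} is the inverse of the Schur complement D - C A^{-1} B
S = D - C @ np.linalg.inv(A) @ B
print(np.allclose(np.linalg.inv(M)[2:, 2:], np.linalg.inv(S)))   # True
```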


C.12 Matrix Calculus

Gradient

For a scalar function $f(\mathbf{x})$, where $\mathbf{x}$ is $n \times 1$:

$$\nabla f = \frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \\ \partial f / \partial x_n \end{pmatrix}$$

Common Derivatives

| Function | Derivative w.r.t. $\mathbf{x}$ |
| --- | --- |
| $\mathbf{a}'\mathbf{x}$ | $\mathbf{a}$ |
| $\mathbf{x}'\mathbf{a}$ | $\mathbf{a}$ |
| $\mathbf{x}'\mathbf{A}\mathbf{x}$ | $(\mathbf{A} + \mathbf{A}')\mathbf{x}$ |
| $\mathbf{x}'\mathbf{A}\mathbf{x}$ ($\mathbf{A}$ symmetric) | $2\mathbf{A}\mathbf{x}$ |

Hessian

For a scalar function $f(\mathbf{x})$:

$$\mathbf{H} = \frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}'} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$
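
A finite-difference sanity check of the quadratic-form gradient $(\mathbf{A} + \mathbf{A}')\mathbf{x}$ from the table above (illustrative sketch, arbitrary values):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

f = lambda z: z @ A @ z          # quadratic form f(x) = x'Ax

# Central finite-difference gradient
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))   # matches (A + A')x
```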

C.13 Application: OLS in Matrix Form

The Model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • $\mathbf{y}$: $n \times 1$ outcome vector

  • $\mathbf{X}$: $n \times k$ regressor matrix (includes a constant)

  • $\boldsymbol{\beta}$: $k \times 1$ coefficient vector

  • $\boldsymbol{\varepsilon}$: $n \times 1$ error vector

OLS Estimator

Minimizing $\text{SSR} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$, the first-order condition $-2\mathbf{X}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}$ gives the normal equations $\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y}$, so:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

Fitted Values and Residuals

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{P}\mathbf{y}$$

where $\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the projection matrix (or "hat matrix").

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{P})\mathbf{y} = \mathbf{M}\mathbf{y}$$

where $\mathbf{M} = \mathbf{I} - \mathbf{P}$ is the residual-maker matrix.

Properties of P and M

  • $\mathbf{P}$ and $\mathbf{M}$ are symmetric and idempotent ($\mathbf{P}^2 = \mathbf{P}$, $\mathbf{M}^2 = \mathbf{M}$)

  • $\mathbf{PM} = \mathbf{0}$

  • $\text{tr}(\mathbf{P}) = k$ (number of regressors)

  • $\text{tr}(\mathbf{M}) = n - k$ (degrees of freedom)
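
A compact simulation illustrating $\hat{\boldsymbol{\beta}}$, $\mathbf{P}$, $\mathbf{M}$, and these properties (the design and parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y, computed stably

P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection ("hat") matrix
M = np.eye(n) - P                              # residual maker

print(np.allclose(P @ P, P))                                # idempotent
print(np.allclose(P @ M, np.zeros((n, n)), atol=1e-10))     # PM = 0
print(np.isclose(np.trace(P), k))                           # tr(P) = k
print(np.isclose(np.trace(M), n - k))                       # tr(M) = n - k
print(np.allclose(M @ y, y - X @ beta_hat))                 # residuals, two ways
```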

Variance of OLS Estimator

Under homoskedasticity ($\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \sigma^2\mathbf{I}$):

$$\text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

Under heteroskedasticity ($\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \boldsymbol{\Omega}$):

$$\text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$

This is the "sandwich" form used in robust standard errors.
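
A sketch of both variance formulas on simulated heteroskedastic data, with $\boldsymbol{\Omega}$ estimated HC0-style by $\text{diag}(e_i^2)$ (the simulation details are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Heteroskedastic errors: variance depends on the second regressor
eps = rng.normal(size=n) * np.exp(0.5 * X[:, 1])
y = X @ np.array([1.0, 2.0, -0.5]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

# Conventional formula: s^2 (X'X)^{-1}
s2 = e @ e / (n - k)
V_homo = s2 * XtX_inv

# Sandwich (HC0-style): (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
V_robust = XtX_inv @ (X.T * e**2) @ X @ XtX_inv

print(np.sqrt(np.diag(V_homo)))     # conventional standard errors
print(np.sqrt(np.diag(V_robust)))   # robust standard errors
```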


C.14 The Frisch-Waugh-Lovell Theorem

To obtain the coefficient on $\mathbf{X}_1$ in

$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$$

one can equivalently:

  1. Regress $\mathbf{y}$ on $\mathbf{X}_2$ and save the residuals $\tilde{\mathbf{y}}$

  2. Regress $\mathbf{X}_1$ on $\mathbf{X}_2$ and save the residuals $\tilde{\mathbf{X}}_1$

  3. Regress $\tilde{\mathbf{y}}$ on $\tilde{\mathbf{X}}_1$

This yields the same $\hat{\boldsymbol{\beta}}_1$ as the full regression.

Intuition: the coefficient on $\mathbf{X}_1$ uses only variation in $\mathbf{X}_1$ not explained by $\mathbf{X}_2$.
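
A minimal simulation illustrating the theorem: partialling $\mathbf{X}_2$ out of both $\mathbf{y}$ and $\mathbf{X}_1$ reproduces the full-regression coefficient (all simulated values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
n = 300
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
x1 = 0.5 * X2[:, 1] + rng.normal(size=n)          # x1 correlated with X2
y = 1.0 + 2.0 * x1 - 1.5 * X2[:, 1] + rng.normal(size=n)

# Full regression of y on [x1, X2]
X = np.column_stack([x1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)

# FWL steps: residualize y and x1 on X2, then regress residuals on residuals
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
y_t, x1_t = M2 @ y, M2 @ x1
b_fwl = (x1_t @ y_t) / (x1_t @ x1_t)

print(np.isclose(b_full[0], b_fwl))   # True: same coefficient on x1
```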


Further Reading

  • Greene, W. H. (2018). Econometric Analysis (8th ed.). Pearson. Appendix A.

  • Magnus, J. R., & Neudecker, H. (2019). Matrix Differential Calculus (3rd ed.). Wiley.

  • Abadir, K. M., & Magnus, J. R. (2005). Matrix Algebra. Cambridge University Press.
