Appendix C: Matrix Algebra Essentials

This appendix reviews matrix algebra concepts used in econometrics. For comprehensive treatment, see Greene (2018) Appendix A or Magnus & Neudecker (2019).


C.1 Vectors and Matrices

Definitions

A vector is an ordered array of numbers:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

This is a column vector of dimension $n \times 1$. A row vector is $\mathbf{x}' = (x_1, x_2, \ldots, x_n)$, of dimension $1 \times n$.

A matrix is a rectangular array:

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

This is an $m \times n$ matrix ($m$ rows, $n$ columns). Element $a_{ij}$ sits in row $i$, column $j$.
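
To make the dimensions concrete, here is a minimal NumPy sketch (the numbers are arbitrary and purely illustrative):

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # 2 x 3 matrix

print(x.shape)   # (3, 1)
print(A.shape)   # (2, 3)
print(A[0, 1])   # element a_{12} = 2.0 (NumPy indexes from zero)
```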

Special Matrices

| Matrix | Definition |
| --- | --- |
| Square matrix | $m = n$ (same number of rows and columns) |
| Identity matrix $\mathbf{I}_n$ | Square with 1s on the diagonal, 0s elsewhere |
| Zero matrix $\mathbf{0}$ | All elements are zero |
| Diagonal matrix | Non-zero elements only on the main diagonal |
| Symmetric matrix | $\mathbf{A} = \mathbf{A}'$ (equals its transpose) |
| Upper/lower triangular | Zeros below/above the diagonal |


C.2 Matrix Operations

Transpose

The transpose $\mathbf{A}'$ (also written $\mathbf{A}^T$) interchanges rows and columns:

$$(\mathbf{A}')_{ij} = a_{ji}$$

Properties:

  • $(\mathbf{A}')' = \mathbf{A}$

  • $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$

  • $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$

  • $(c\mathbf{A})' = c\mathbf{A}'$
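
These properties are easy to verify numerically; a small illustrative NumPy check (random matrices, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 4))

print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)' = B'A'  -> True
print(np.allclose(A.T.T, A))               # (A')' = A     -> True
```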

Addition

Matrices of the same dimension can be added element-wise:

$$(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$$

Scalar Multiplication

$$(c\mathbf{A})_{ij} = c \cdot a_{ij}$$

Matrix Multiplication

For $\mathbf{A}$ ($m \times n$) and $\mathbf{B}$ ($n \times p$), the product $\mathbf{C} = \mathbf{AB}$ is $m \times p$:

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$$

Key rule: the number of columns of $\mathbf{A}$ must equal the number of rows of $\mathbf{B}$.

Properties:

  • $\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}$ (associative)

  • $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$ (distributive)

  • $\mathbf{AB} \neq \mathbf{BA}$ in general (not commutative!)

  • $\mathbf{AI} = \mathbf{IA} = \mathbf{A}$
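
A short illustrative sketch of the conformability rule and non-commutativity in NumPy (arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))   # 3 x 2
B = rng.normal(size=(2, 4))   # 2 x 4

C = A @ B                     # conformable: (3 x 2)(2 x 4) -> 3 x 4
print(C.shape)                # (3, 4)

# Even square matrices generally do not commute
S = rng.normal(size=(2, 2))
T = rng.normal(size=(2, 2))
print(np.allclose(S @ T, T @ S))   # False (generically)
```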


C.3 Vector Products

Inner (Dot) Product

For vectors $\mathbf{x}, \mathbf{y}$ of dimension $n \times 1$:

$$\mathbf{x}'\mathbf{y} = \sum_{i=1}^n x_i y_i$$

This is a scalar.

Outer Product

$$\mathbf{x}\mathbf{y}' = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots \\ x_2 y_1 & x_2 y_2 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$

This is an $n \times n$ matrix.

Euclidean Norm

$$\|\mathbf{x}\| = \sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{\sum_{i=1}^n x_i^2}$$
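
A quick NumPy illustration of the three products above (values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(x @ y)                 # inner product: 32.0
print(np.outer(x, y).shape)  # outer product: (3, 3)
print(np.linalg.norm(x))     # Euclidean norm: sqrt(14)
print(np.sqrt(x @ x))        # same value, computed from the inner product
```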

C.4 Trace

For a square matrix $\mathbf{A}$:

$$\text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii}$$

Properties:

  • $\text{tr}(\mathbf{A} + \mathbf{B}) = \text{tr}(\mathbf{A}) + \text{tr}(\mathbf{B})$

  • $\text{tr}(c\mathbf{A}) = c \cdot \text{tr}(\mathbf{A})$

  • $\text{tr}(\mathbf{A}') = \text{tr}(\mathbf{A})$

  • $\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})$ (cyclic property)
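
A short numerical check of the cyclic property (illustrative only; note that $\mathbf{AB}$ and $\mathbf{BA}$ need not even have the same dimension):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 3))

# tr(AB) = tr(BA) even though AB is 3x3 and BA is 4x4
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
```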


C.5 Determinant

For a $2 \times 2$ matrix:

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$

Properties:

  • $\det(\mathbf{A}') = \det(\mathbf{A})$

  • $\det(\mathbf{AB}) = \det(\mathbf{A}) \det(\mathbf{B})$

  • $\det(c\mathbf{A}) = c^n \det(\mathbf{A})$ for an $n \times n$ matrix

  • $\mathbf{A}$ is invertible if and only if $\det(\mathbf{A}) \neq 0$
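
An illustrative numerical check of the product and scaling rules (random matrices, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
c = 2.5

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A)))              # True
```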


C.6 Matrix Inverse

For a square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies:

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

Existence: $\mathbf{A}^{-1}$ exists if and only if $\det(\mathbf{A}) \neq 0$ (the matrix is nonsingular).

For a $2 \times 2$ matrix:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

Properties:

  • $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$

  • $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$ (when both inverses exist)

  • $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$

  • $(c\mathbf{A})^{-1} = (1/c)\mathbf{A}^{-1}$ for $c \neq 0$
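
A small NumPy sketch; the last line also illustrates the common numerical advice to solve linear systems rather than form the inverse explicitly (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

Ainv = np.linalg.inv(A)
print(np.allclose(A @ Ainv, np.eye(3)))                            # A A^{-1} = I
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ Ainv))  # (AB)^{-1} = B^{-1} A^{-1}

# Solving A b_hat = b directly is more stable than computing A^{-1} b
b = rng.normal(size=3)
print(np.allclose(np.linalg.solve(A, b), Ainv @ b))
```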


C.7 Rank

The rank of a matrix is the maximum number of linearly independent rows (or columns).

Properties:

  • $\text{rank}(\mathbf{A}) \leq \min(m, n)$

  • Full rank: $\text{rank}(\mathbf{A}) = \min(m, n)$

  • $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}')$

  • $\text{rank}(\mathbf{AB}) \leq \min(\text{rank}(\mathbf{A}), \text{rank}(\mathbf{B}))$

For econometrics: $\mathbf{X}'\mathbf{X}$ is invertible if and only if $\mathbf{X}$ has full column rank.
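
A sketch of this rank condition on a simulated design matrix (the data-generating choices are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, rng.normal(size=n)])
X_coll = np.column_stack([np.ones(n), x1, 2 * x1])   # third column = 2 * second

print(np.linalg.matrix_rank(X_full))              # 3: full column rank, X'X invertible
print(np.linalg.matrix_rank(X_coll))              # 2: perfect collinearity
print(np.linalg.matrix_rank(X_coll.T @ X_coll))   # 2: X'X is singular
```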


C.8 Eigenvalues and Eigenvectors

For a square matrix $\mathbf{A}$, if

$$\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$$

for some $\mathbf{v} \neq \mathbf{0}$, then $\lambda$ is an eigenvalue and $\mathbf{v}$ is the corresponding eigenvector.

Computing Eigenvalues

Solve the characteristic equation:

$$\det(\mathbf{A} - \lambda \mathbf{I}) = 0$$

Properties

For an $n \times n$ matrix:

  • Has $n$ eigenvalues (counting multiplicities)

  • $\det(\mathbf{A}) = \prod_{i=1}^n \lambda_i$

  • $\text{tr}(\mathbf{A}) = \sum_{i=1}^n \lambda_i$

  • Symmetric matrices have real eigenvalues
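
An illustrative numerical check of these eigenvalue identities (random matrix, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
eigvals = np.linalg.eigvals(A)

print(np.isclose(eigvals.prod(), np.linalg.det(A)))   # det(A) = product of eigenvalues
print(np.isclose(eigvals.sum(), np.trace(A)))         # tr(A) = sum of eigenvalues

# A symmetric matrix has real eigenvalues
S = A + A.T
print(np.allclose(np.linalg.eigvals(S).imag, 0.0))    # True
```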


C.9 Positive Definite Matrices

A symmetric matrix $\mathbf{A}$ is positive definite if:

$$\mathbf{x}'\mathbf{A}\mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}$$

It is positive semi-definite if $\mathbf{x}'\mathbf{A}\mathbf{x} \geq 0$ for all $\mathbf{x}$.

Equivalent Conditions

For a symmetric matrix $\mathbf{A}$, the following are equivalent to positive definiteness:

  • All eigenvalues are positive (semi-definite: non-negative)

  • All leading principal minors are positive (semi-definite: all principal minors are non-negative)

  • There exists a matrix $\mathbf{B}$ with full column rank such that $\mathbf{A} = \mathbf{B}'\mathbf{B}$ (semi-definite: some $\mathbf{B}$, not necessarily of full rank)

In Econometrics

  • Variance-covariance matrices are positive semi-definite

  • $\mathbf{X}'\mathbf{X}$ is positive semi-definite (positive definite if $\mathbf{X}$ has full column rank)
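A sketch of how one might check positive definiteness of $\mathbf{X}'\mathbf{X}$ numerically (simulated design, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
A = X.T @ X                      # symmetric; positive definite if X has full column rank

print(np.all(np.linalg.eigvalsh(A) > 0))     # all eigenvalues positive
L_chol = np.linalg.cholesky(A)               # raises LinAlgError if A is not positive definite
print(np.allclose(L_chol @ L_chol.T, A))     # A = B'B with B = L'

# Quadratic form check in a few random directions
for _ in range(3):
    v = rng.normal(size=3)
    print(v @ A @ v > 0)                     # True
```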


C.10 Quadratic Forms

A quadratic form is:

$$q(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j$$

where $\mathbf{A}$ is symmetric.
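
A tiny numerical example showing that the matrix expression and the double sum agree (arbitrary $\mathbf{A}$ and $\mathbf{x}$):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # symmetric
x = np.array([1.0, -2.0])

q_matrix = x @ A @ x
q_sum = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))
print(q_matrix, q_sum)           # 10.0 10.0
```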

In Regression

The sum of squared residuals:

$$\text{SSR} = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$

C.11 Partitioned Matrices

Block Multiplication

$$\begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \\ \mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22} \end{pmatrix}$$

provided the blocks are conformable.

Partitioned Inverse

For a partitioned matrix with conformable blocks, the inverse

$$\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}^{-1}$$

can be expressed using the Schur complement $\mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B}$ (assuming $\mathbf{A}$ is invertible).
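
A sketch verifying block multiplication and the Schur-complement form of the partitioned inverse numerically (random blocks, assuming the relevant sub-blocks are invertible):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 3))
C = rng.normal(size=(3, 2))
D = rng.normal(size=(3, 3))

M = np.block([[A, B], [C, D]])

# Block multiplication: the (1,1) block of M @ M equals A A + B C
print(np.allclose((M @ M)[:2, :2], A @ A + B @ C))               # True

# The (2,2) block of M^{-1} is the inverse of the Schur complement D - C A^{-1} B
S = D - C @ np.linalg.inv(A) @ B
print(np.allclose(np.linalg.inv(M)[2:, 2:], np.linalg.inv(S)))   # True
```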


C.12 Matrix Calculus

Gradient

For a scalar function $f(\mathbf{x})$, where $\mathbf{x}$ is $n \times 1$:

$$\nabla f = \frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \\ \partial f / \partial x_n \end{pmatrix}$$

Common Derivatives

| Function | Derivative w.r.t. $\mathbf{x}$ |
| --- | --- |
| $\mathbf{a}'\mathbf{x}$ | $\mathbf{a}$ |
| $\mathbf{x}'\mathbf{a}$ | $\mathbf{a}$ |
| $\mathbf{x}'\mathbf{A}\mathbf{x}$ | $(\mathbf{A} + \mathbf{A}')\mathbf{x}$ |
| $\mathbf{x}'\mathbf{A}\mathbf{x}$ ($\mathbf{A}$ symmetric) | $2\mathbf{A}\mathbf{x}$ |

Hessian

For a scalar function $f(\mathbf{x})$:

$$\mathbf{H} = \frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}'} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$
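
A finite-difference sanity check of the quadratic-form gradient $(\mathbf{A} + \mathbf{A}')\mathbf{x}$ from the table above (illustrative sketch, arbitrary values):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

f = lambda z: z @ A @ z          # quadratic form f(x) = x'Ax

# Central finite-difference gradient
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))   # matches (A + A')x
```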

C.13 Application: OLS in Matrix Form

The Model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • $\mathbf{y}$: $n \times 1$ outcome vector

  • $\mathbf{X}$: $n \times k$ regressor matrix (includes a constant)

  • $\boldsymbol{\beta}$: $k \times 1$ coefficient vector

  • $\boldsymbol{\varepsilon}$: $n \times 1$ error vector

OLS Estimator

Minimizing $\text{SSR} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$, the first-order condition $-2\mathbf{X}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}$ gives the normal equations $\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y}$, so:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

Fitted Values and Residuals

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{P}\mathbf{y}$$

where $\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the projection matrix (or "hat matrix").

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{P})\mathbf{y} = \mathbf{M}\mathbf{y}$$

where $\mathbf{M} = \mathbf{I} - \mathbf{P}$ is the residual-maker matrix.

Properties of P and M

  • $\mathbf{P}$ and $\mathbf{M}$ are symmetric and idempotent ($\mathbf{P}^2 = \mathbf{P}$, $\mathbf{M}^2 = \mathbf{M}$)

  • $\mathbf{PM} = \mathbf{0}$

  • $\text{tr}(\mathbf{P}) = k$ (number of regressors)

  • $\text{tr}(\mathbf{M}) = n - k$ (degrees of freedom)
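
A compact simulation illustrating $\hat{\boldsymbol{\beta}}$, $\mathbf{P}$, $\mathbf{M}$, and these properties (the design and parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y, computed stably

P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection ("hat") matrix
M = np.eye(n) - P                              # residual maker

print(np.allclose(P @ P, P))                                # idempotent
print(np.allclose(P @ M, np.zeros((n, n)), atol=1e-10))     # PM = 0
print(np.isclose(np.trace(P), k))                           # tr(P) = k
print(np.isclose(np.trace(M), n - k))                       # tr(M) = n - k
print(np.allclose(M @ y, y - X @ beta_hat))                 # residuals, two ways
```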

Variance of OLS Estimator

Under homoskedasticity ($\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \sigma^2\mathbf{I}$):

$$\text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

Under heteroskedasticity ($\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \boldsymbol{\Omega}$):

$$\text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$

This is the "sandwich" form used in robust standard errors.
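
A sketch of both variance formulas on simulated heteroskedastic data, with $\boldsymbol{\Omega}$ estimated HC0-style by $\text{diag}(e_i^2)$ (the simulation details are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Heteroskedastic errors: variance depends on the second regressor
eps = rng.normal(size=n) * np.exp(0.5 * X[:, 1])
y = X @ np.array([1.0, 2.0, -0.5]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

# Conventional formula: s^2 (X'X)^{-1}
s2 = e @ e / (n - k)
V_homo = s2 * XtX_inv

# Sandwich (HC0-style): (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
V_robust = XtX_inv @ (X.T * e**2) @ X @ XtX_inv

print(np.sqrt(np.diag(V_homo)))     # conventional standard errors
print(np.sqrt(np.diag(V_robust)))   # robust standard errors
```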


C.14 The Frisch-Waugh-Lovell Theorem

To obtain the coefficient on $\mathbf{X}_1$ in

$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$$

one can equivalently:

  1. Regress $\mathbf{y}$ on $\mathbf{X}_2$ and save the residuals $\tilde{\mathbf{y}}$

  2. Regress $\mathbf{X}_1$ on $\mathbf{X}_2$ and save the residuals $\tilde{\mathbf{X}}_1$

  3. Regress $\tilde{\mathbf{y}}$ on $\tilde{\mathbf{X}}_1$

This yields the same $\hat{\boldsymbol{\beta}}_1$ as the full regression.

Intuition: the coefficient on $\mathbf{X}_1$ uses only variation in $\mathbf{X}_1$ not explained by $\mathbf{X}_2$.
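
A minimal simulation illustrating the theorem: partialling $\mathbf{X}_2$ out of both $\mathbf{y}$ and $\mathbf{X}_1$ reproduces the full-regression coefficient (all simulated values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
n = 300
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
x1 = 0.5 * X2[:, 1] + rng.normal(size=n)          # x1 correlated with X2
y = 1.0 + 2.0 * x1 - 1.5 * X2[:, 1] + rng.normal(size=n)

# Full regression of y on [x1, X2]
X = np.column_stack([x1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)

# FWL steps: residualize y and x1 on X2, then regress residuals on residuals
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
y_t, x1_t = M2 @ y, M2 @ x1
b_fwl = (x1_t @ y_t) / (x1_t @ x1_t)

print(np.isclose(b_full[0], b_fwl))   # True: same coefficient on x1
```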


Further Reading

  • Greene, W. H. (2018). Econometric Analysis (8th ed.). Pearson. Appendix A.

  • Magnus, J. R., & Neudecker, H. (2019). Matrix Differential Calculus (3rd ed.). Wiley.

  • Abadir, K. M., & Magnus, J. R. (2005). Matrix Algebra. Cambridge University Press.
