This appendix reviews matrix algebra concepts used in econometrics. For comprehensive treatment, see Greene (2018) Appendix A or Magnus & Neudecker (2019).
C.1 Vectors and Matrices
A vector is an ordered array of numbers:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

This is a column vector of dimension $n \times 1$. A row vector is $\mathbf{x}' = (x_1, x_2, \ldots, x_n)$, of dimension $1 \times n$.
A matrix is a rectangular array:
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

This is an $m \times n$ matrix ($m$ rows, $n$ columns). Element $a_{ij}$ is in row $i$, column $j$.
Special Matrices
Square matrix: $m = n$ (same number of rows and columns)
Identity matrix $\mathbf{I}_n$: square, with 1s on the diagonal and 0s elsewhere
Diagonal matrix: non-zero elements only on the main diagonal
Symmetric matrix: $\mathbf{A} = \mathbf{A}'$ (equals its transpose)
Triangular matrix: zeros below (upper triangular) or above (lower triangular) the diagonal
C.2 Matrix Operations
The transpose $\mathbf{A}'$ (also written $\mathbf{A}^T$) interchanges rows and columns:

$$(\mathbf{A}')_{ij} = a_{ji}$$

Properties:
$(\mathbf{A}')' = \mathbf{A}$
$(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$
$(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$
$(c\mathbf{A})' = c\mathbf{A}'$
Matrices of the same dimension can be added element-wise:
$$(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$$

Scalar Multiplication

$$(c\mathbf{A})_{ij} = c \cdot a_{ij}$$

Matrix Multiplication
For $\mathbf{A}$ ($m \times n$) and $\mathbf{B}$ ($n \times p$), the product $\mathbf{C} = \mathbf{AB}$ is $m \times p$:

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$$

Key rule: the number of columns of $\mathbf{A}$ must equal the number of rows of $\mathbf{B}$.

Properties:

$\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}$ (associative)
$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$ (distributive)
Generally, $\mathbf{AB} \neq \mathbf{BA}$ (not commutative)
$\mathbf{AI} = \mathbf{IA} = \mathbf{A}$
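A minimal NumPy sketch of these rules, using arbitrary example matrices: conformability of shapes, non-commutativity, and the transpose-of-a-product rule.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # 2 x 3
B = np.array([[1., 0.],
              [0., 1.],
              [2., 2.]])            # 3 x 2

C = A @ B                           # (2 x 3)(3 x 2) -> 2 x 2
print(C.shape)                      # (2, 2)

# Not commutative: A @ B and B @ A differ (here they do not even have the same shape)
print((B @ A).shape)                # (3, 3)

# Transpose of a product reverses the order: (AB)' = B'A'
assert np.allclose((A @ B).T, B.T @ A.T)
```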
C.3 Vector Products
Inner (Dot) Product
For vectors $\mathbf{x}, \mathbf{y}$ of dimension $n \times 1$:

$$\mathbf{x}'\mathbf{y} = \sum_{i=1}^n x_i y_i$$

This is a scalar.
Outer Product

$$\mathbf{x}\mathbf{y}' = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots \\ x_2 y_1 & x_2 y_2 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$

This is an $n \times n$ matrix.
Norm

$$\|\mathbf{x}\| = \sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{\sum_{i=1}^n x_i^2}$$

C.4 Trace

For a square matrix $\mathbf{A}$:
$$\text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii}$$

Properties:
$\text{tr}(\mathbf{A} + \mathbf{B}) = \text{tr}(\mathbf{A}) + \text{tr}(\mathbf{B})$
$\text{tr}(c\mathbf{A}) = c \cdot \text{tr}(\mathbf{A})$
$\text{tr}(\mathbf{A}') = \text{tr}(\mathbf{A})$
$\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})$ (cyclic property)
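The same quantities in NumPy (the vectors and matrices are arbitrary examples):

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

inner = x @ y                      # x'y, a scalar
outer = np.outer(x, y)             # xy', a 3 x 3 matrix
norm = np.sqrt(x @ x)              # ||x||; same as np.linalg.norm(x)

A = np.arange(9.).reshape(3, 3)
B = np.eye(3) + 1.0

# Trace properties: tr(A + B) = tr(A) + tr(B) and tr(AB) = tr(BA)
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```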
C.5 Determinant
For a $2 \times 2$ matrix:

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$

Properties:
$\det(\mathbf{A}') = \det(\mathbf{A})$
$\det(\mathbf{AB}) = \det(\mathbf{A})\det(\mathbf{B})$
$\det(c\mathbf{A}) = c^n \det(\mathbf{A})$ for an $n \times n$ matrix
$\mathbf{A}$ is invertible iff $\det(\mathbf{A}) \neq 0$
C.6 Matrix Inverse
For a square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies:

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

Existence: $\mathbf{A}^{-1}$ exists iff $\det(\mathbf{A}) \neq 0$ (the matrix is nonsingular).

For $2 \times 2$:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

Properties:
$(\mathbf{A}^{-1})^{-1} = \mathbf{A}$
$(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$
$(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$
$(c\mathbf{A})^{-1} = (1/c)\mathbf{A}^{-1}$
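A quick numerical check of the $2 \times 2$ inverse formula and the product rule, using an arbitrary example matrix:

```python
import numpy as np

A = np.array([[3., 1.],
              [2., 4.]])

d = np.linalg.det(A)               # ad - bc = 12 - 2 = 10
assert not np.isclose(d, 0.0)      # nonsingular, so the inverse exists

A_inv = np.array([[ 4., -1.],
                  [-2.,  3.]]) / d # 2x2 formula: (1/(ad-bc)) [[d, -b], [-c, a]]
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(2))

# (AB)^{-1} = B^{-1} A^{-1}
B = np.array([[1., 2.],
              [0., 1.]])
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
```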
C.7 Rank

The rank of a matrix is the maximum number of linearly independent rows (or columns).
Properties:

$\text{rank}(\mathbf{A}) \leq \min(m, n)$
Full rank: $\text{rank}(\mathbf{A}) = \min(m, n)$
$\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}')$
$\text{rank}(\mathbf{AB}) \leq \min(\text{rank}(\mathbf{A}), \text{rank}(\mathbf{B}))$

For econometrics: $\mathbf{X}'\mathbf{X}$ is invertible iff $\mathbf{X}$ has full column rank (see the sketch below).
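A sketch of the full-column-rank condition with simulated regressors (the data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Full column rank: a constant plus two unrelated regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
print(np.linalg.matrix_rank(X))                  # 3, so X'X is invertible

# Perfect collinearity: the extra column is a linear combination of the others
X_bad = np.column_stack([X, X[:, 1] + 2 * X[:, 2]])
print(np.linalg.matrix_rank(X_bad))              # still 3 < 4 columns
print(np.linalg.matrix_rank(X_bad.T @ X_bad))    # 3, so (X'X)^{-1} does not exist
```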
C.8 Eigenvalues and Eigenvectors
For a square matrix $\mathbf{A}$, if

$$\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$$

then $\lambda$ is an eigenvalue and $\mathbf{v}$ is the corresponding eigenvector.
Computing Eigenvalues
Solve the characteristic equation:
$$\det(\mathbf{A} - \lambda \mathbf{I}) = 0$$

For an $n \times n$ matrix:

Has $n$ eigenvalues (counting multiplicities)
$\det(\mathbf{A}) = \prod_{i=1}^n \lambda_i$
$\text{tr}(\mathbf{A}) = \sum_{i=1}^n \lambda_i$
Symmetric matrices have real eigenvalues
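These relationships are easy to verify numerically; the sketch below uses an arbitrary symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(4, 4))
A = S + S.T                         # symmetric, so the eigenvalues are real

lam, V = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors

# A v = lambda v for each eigenpair (columns of V)
assert np.allclose(A @ V, V @ np.diag(lam))

# det(A) = product of eigenvalues, tr(A) = sum of eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(lam))
assert np.isclose(np.trace(A), np.sum(lam))
```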
C.9 Positive Definite Matrices
A symmetric matrix $\mathbf{A}$ is positive definite if:

$$\mathbf{x}'\mathbf{A}\mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}$$

Positive semi-definite: $\mathbf{x}'\mathbf{A}\mathbf{x} \geq 0$ for all $\mathbf{x}$.
Equivalent Conditions
For a symmetric matrix $\mathbf{A}$, the following are equivalent to positive definiteness:

All eigenvalues are positive (semi-definite: all non-negative)
All leading principal minors are positive
There exists a nonsingular $\mathbf{B}$ such that $\mathbf{A} = \mathbf{B}'\mathbf{B}$ (semi-definite: any $\mathbf{B}$)
In Econometrics
Variance-covariance matrices are positive semi-definite
$\mathbf{X}'\mathbf{X}$ is positive semi-definite (positive definite if $\mathbf{X}$ has full column rank); see the check below
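A short check on a simulated regressor matrix (illustrative data only): the eigenvalue and factorization characterizations agree for $\mathbf{X}'\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # full column rank

XtX = X.T @ X

# All eigenvalues positive -> positive definite
assert np.all(np.linalg.eigvalsh(XtX) > 0)

# A Cholesky factorization XtX = L L' exists only for positive definite matrices
L = np.linalg.cholesky(XtX)
assert np.allclose(L @ L.T, XtX)

# x'(X'X)x = ||Xx||^2 >= 0 for any x, which is why X'X is always positive semi-definite
x = rng.normal(size=k)
assert x @ XtX @ x >= 0
```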
C.10 Quadratic Forms

A quadratic form is:
$$q(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j$$

where $\mathbf{A}$ is symmetric.
A leading example in econometrics is the sum of squared residuals:
$$\text{SSR} = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$
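A small numerical check that the matrix product and the double sum agree (arbitrary example values):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])            # symmetric
x = np.array([1., -2.])

q_matrix = x @ A @ x                                   # x'Ax
q_sum = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))
assert np.isclose(q_matrix, q_sum)                     # 2 - 2 - 2 + 12 = 10
```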
C.11 Partitioned Matrices

Block Multiplication
$$\begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \cdots \\ \cdots & \cdots \end{pmatrix}$$

Partitioned Inverse
For a partitioned matrix with conformable blocks, the inverse

$$\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}^{-1}$$

is computed using the Schur complement $\mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B}$.
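One way to see the role of the Schur complement is to build the inverse blockwise and compare it with a direct inverse. The sketch below assumes the standard blockwise-inverse formula (not stated explicitly above) and uses a randomly generated positive definite matrix so that all the required inverses exist:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 2
Q = rng.normal(size=(n1 + n2, n1 + n2))
M = Q @ Q.T + np.eye(n1 + n2)           # symmetric positive definite, hence invertible

# Partition M into conformable blocks
A, B = M[:n1, :n1], M[:n1, n1:]
C, D = M[n1:, :n1], M[n1:, n1:]

A_inv = np.linalg.inv(A)
S = D - C @ A_inv @ B                   # Schur complement of A
S_inv = np.linalg.inv(S)

# Blockwise inverse built from A^{-1} and the Schur complement
M_inv = np.block([
    [A_inv + A_inv @ B @ S_inv @ C @ A_inv, -A_inv @ B @ S_inv],
    [-S_inv @ C @ A_inv,                     S_inv],
])

assert np.allclose(M_inv, np.linalg.inv(M))
```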
C.12 Matrix Calculus
For a scalar function $f(\mathbf{x})$, where $\mathbf{x}$ is $n \times 1$, the gradient is:

$$\nabla f = \frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \\ \partial f / \partial x_n \end{pmatrix}$$

Common Derivatives
$\partial(\mathbf{a}'\mathbf{x})/\partial\mathbf{x} = \mathbf{a}$
$\partial(\mathbf{x}'\mathbf{a})/\partial\mathbf{x} = \mathbf{a}$
$\partial(\mathbf{x}'\mathbf{A}\mathbf{x})/\partial\mathbf{x} = (\mathbf{A} + \mathbf{A}')\mathbf{x}$
$\partial(\mathbf{x}'\mathbf{A}\mathbf{x})/\partial\mathbf{x} = 2\mathbf{A}\mathbf{x}$ when $\mathbf{A}$ is symmetric
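These rules can be checked against a finite-difference gradient; the sketch below uses an arbitrary (non-symmetric) $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))              # not necessarily symmetric
x = rng.normal(size=n)

f = lambda v: v @ A @ v                  # quadratic form v'Av

grad_analytic = (A + A.T) @ x            # the rule for d(x'Ax)/dx

# Central-difference numerical gradient
eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(n)])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-6)
```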
For a scalar function $f(\mathbf{x})$, the Hessian is:

$$\mathbf{H} = \frac{\partial^2 f}{\partial \mathbf{x}\,\partial \mathbf{x}'} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$

C.13 OLS in Matrix Form

The linear regression model in matrix notation is

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:
$\mathbf{y}$: $n \times 1$ outcome vector
$\mathbf{X}$: $n \times k$ regressor matrix (includes a constant)
$\boldsymbol{\beta}$: $k \times 1$ coefficient vector
$\boldsymbol{\varepsilon}$: $n \times 1$ error vector
Minimizing $\text{SSR} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ gives the first-order condition $\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$, and therefore:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

Fitted Values and Residuals
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{P}\mathbf{y}$$

where $\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the projection matrix (or "hat matrix").

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{P})\mathbf{y} = \mathbf{M}\mathbf{y}$$

where $\mathbf{M} = \mathbf{I} - \mathbf{P}$ is the residual maker.
Properties of P and M
$\mathbf{P}$ and $\mathbf{M}$ are symmetric and idempotent ($\mathbf{P}^2 = \mathbf{P}$)
$\mathbf{PM} = \mathbf{0}$
$\text{tr}(\mathbf{P}) = k$ (number of regressors)
$\text{tr}(\mathbf{M}) = n - k$ (degrees of freedom)
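A compact simulation tying these pieces together (made-up data; `beta_hat`, `P`, and `M` follow the formulas above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                  # (X'X)^{-1} X'y

P = X @ XtX_inv @ X.T                         # projection ("hat") matrix
M = np.eye(n) - P                             # residual maker

assert np.allclose(P @ P, P)                  # idempotent
assert np.allclose(P @ M, np.zeros((n, n)))   # PM = 0
assert np.isclose(np.trace(P), k)             # tr(P) = k
assert np.isclose(np.trace(M), n - k)         # tr(M) = n - k
assert np.allclose(M @ y, y - X @ beta_hat)   # My are the OLS residuals
```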
Variance of OLS Estimator
Under homoskedasticity ($\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \sigma^2\mathbf{I}$):

$$\text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

Under heteroskedasticity ($\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \boldsymbol{\Omega}$):

$$\text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$

This is the "sandwich" form used in robust standard errors.
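A sketch of the sandwich estimator with the residual-based "meat" $\mathbf{X}'\,\mathrm{diag}(\hat{\varepsilon}_i^2)\,\mathbf{X}$. The HC1 degrees-of-freedom scaling shown here is one common variant, and the function name is purely illustrative:

```python
import numpy as np

def ols_with_robust_se(y, X):
    """OLS coefficients with HC1 'sandwich' standard errors (one common variant)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    meat = X.T @ (e[:, None] ** 2 * X)            # X' diag(e_i^2) X
    V = n / (n - k) * XtX_inv @ meat @ XtX_inv    # sandwich with HC1 scaling
    return beta, np.sqrt(np.diag(V))

# Example usage with simulated heteroskedastic data
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))
beta, se = ols_with_robust_se(y, X)
```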
C.14 The Frisch-Waugh-Lovell Theorem
To obtain the coefficient on $\mathbf{X}_1$ in

$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$$

one can equivalently:
1. Regress $\mathbf{y}$ on $\mathbf{X}_2$ and save the residuals $\tilde{\mathbf{y}}$
2. Regress $\mathbf{X}_1$ on $\mathbf{X}_2$ and save the residuals $\tilde{\mathbf{X}}_1$
3. Regress $\tilde{\mathbf{y}}$ on $\tilde{\mathbf{X}}_1$
This gives the same $\hat{\boldsymbol{\beta}}_1$ as the full regression.

Intuition: the coefficient on $\mathbf{X}_1$ uses only the variation in $\mathbf{X}_1$ that is not explained by $\mathbf{X}_2$, as the simulation below illustrates.
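The theorem is easy to verify by simulation (made-up data; `ols` here is just a least-squares helper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = rng.normal(size=(n, 2))
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # includes the constant
y = X1 @ np.array([1.0, -2.0]) + X2 @ rng.normal(size=4) + rng.normal(size=n)

ols = lambda y, X: np.linalg.lstsq(X, y, rcond=None)[0]

# Full regression: the coefficients on X1 are the first two entries
beta_full = ols(y, np.column_stack([X1, X2]))[:2]

# FWL: partial X2 out of y and X1, then regress residuals on residuals
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T    # residual maker for X2
beta_fwl = ols(M2 @ y, M2 @ X1)

assert np.allclose(beta_full, beta_fwl)
```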
Further Reading
Greene, W. H. (2018). Econometric Analysis (8th ed.). Pearson. Appendix A.
Magnus, J. R., & Neudecker, H. (2019). Matrix Differential Calculus (3rd ed.). Wiley.
Abadir, K. M., & Magnus, J. R. (2005). Matrix Algebra . Cambridge University Press.