Chapter 7: Dynamics and Time Series Foundations

Opening Question

When data are ordered by time, what new patterns emerge—and what new problems arise?


Chapter Overview

Time series data differ fundamentally from cross-sectional data. Observations are ordered; yesterday influences today influences tomorrow. This temporal dependence creates both opportunities (we can model dynamics, forecast the future) and challenges (standard regression assumptions fail, spurious correlations abound).

This chapter develops the foundations for working with time-indexed data. We cover decomposition (separating trend, season, and cycle), stationarity (the key concept for valid inference), modeling (ARIMA, state space), and forecasting (principles and evaluation). These foundations support the causal time series methods in Chapter 16.

What you will learn:

  • How time series differ from cross-sectional data

  • Decomposing time series into trend, seasonal, and cyclical components

  • What stationarity means and why it matters

  • ARIMA modeling and its extensions

  • Principles of forecasting and forecast evaluation

Prerequisites: Chapter 3 (Statistical Foundations), some exposure to regression


Historical Context: The Science of Prediction

Time series analysis has ancient roots in astronomy and navigation, but its modern statistical form emerged in the 20th century.

Yule (1926) showed that simple correlation between trending series could be "nonsense" (spurious)—a fundamental insight that took decades to fully appreciate.

Wold (1938) proved that any stationary process can be represented as a moving average of past shocks—the Wold decomposition theorem that underlies much of time series analysis.

Box and Jenkins (1970) systematized the ARIMA modeling approach, transforming time series from art to science with their identification-estimation-diagnosis cycle.

Granger and Newbold (1974) demonstrated the spurious regression problem empirically, showing that independent random walks appeared correlated.

Dickey and Fuller (1979) developed tests for unit roots, enabling researchers to distinguish stationary from non-stationary series.

Engle and Granger (1987) introduced cointegration—the idea that non-stationary series can share common trends—work for which Granger shared the 2003 Nobel Prize.

In economics, time series methods are central to macroeconomics (business cycles, inflation, monetary policy) and finance (asset prices, volatility, risk).


7.1 Working with Time-Indexed Data

Data Structures

Time series: One unit observed over many periods

  • Example: U.S. quarterly GDP, 1947-2023

Panel (longitudinal): Many units observed over multiple periods

  • Example: GDP for 50 countries, 1960-2020

Cross-section: Many units observed once

  • Example: GDP for 150 countries in 2020

Repeated cross-section: Different units sampled at different times

  • Example: CPS monthly surveys (different people each month)

Frequency and Aggregation

Frequency: How often is the variable measured?

  • High frequency: Tick data, minute, hour, daily

  • Medium frequency: Weekly, monthly, quarterly

  • Low frequency: Annual

Aggregation matters: Relationships may differ at different frequencies. Monthly inflation dynamics differ from annual inflation dynamics.

Temporal aggregation creates:

  • Smoothing (averaging removes noise)

  • Timing issues (when did the event occur within the period?)

  • Potential bias (if aggregation is non-linear in the underlying process)

Time Series Plots

The time series plot is fundamental: variable on y-axis, time on x-axis.

Figure 7.1: A time series plot of real GDP shows the key features of economic time series: an upward trend (dashed line), business cycle fluctuations around trend, and occasional sharp contractions (shaded recession periods).

What to look for:

  • Trend (persistent increase or decrease)

  • Seasonality (regular periodic patterns)

  • Cycles (irregular fluctuations)

  • Structural breaks (sudden changes)

  • Outliers (unusual values)


7.2 Decomposition

The Components

A time series $Y_t$ can be decomposed:

Additive model: $Y_t = T_t + S_t + C_t + \varepsilon_t$

Multiplicative model: $Y_t = T_t \times S_t \times C_t \times \varepsilon_t$

where:

  • $T_t$ = Trend (long-run persistent movement)

  • $S_t$ = Seasonal (regular calendar-based patterns)

  • $C_t$ = Cycle (irregular but persistent fluctuations)

  • $\varepsilon_t$ = Irregular (noise)

Trend Extraction

Moving average filters: Average observations over a window:

$$\hat{T}_t = \frac{1}{2k+1}\sum_{j=-k}^{k} Y_{t+j}$$

Centered moving average removes short-run fluctuations.

Hodrick-Prescott (HP) filter: Choose the trend to minimize:

$$\sum_{t=1}^{T}(Y_t - T_t)^2 + \lambda \sum_{t=2}^{T-1}\left[(T_{t+1} - T_t) - (T_t - T_{t-1})\right]^2$$

  • First term: Cycle should be small

  • Second term: Trend should be smooth

  • $\lambda$ controls the tradeoff (typically 1600 for quarterly data)

Baxter-King filter: Band-pass filter isolating specific frequencies.
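A minimal sketch of HP trend extraction using statsmodels' hpfilter; the quarterly log-GDP-like series is simulated for illustration, not real data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

# Simulate a quarterly log-GDP-like series: trend growth plus a persistent cycle
rng = np.random.default_rng(0)
T = 200
trend = 0.005 * np.arange(T)                 # roughly 2% annual growth in logs
cycle = np.zeros(T)
for t in range(1, T):
    cycle[t] = 0.8 * cycle[t - 1] + rng.normal(scale=0.01)
y = pd.Series(trend + cycle)

# lamb=1600 is the conventional smoothing parameter for quarterly data
cycle_hat, trend_hat = hpfilter(y, lamb=1600)
print(trend_hat.tail())   # smooth trend estimate
print(cycle_hat.std())    # dispersion of the cyclical component
```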

Seasonal Adjustment

Why adjust? Many series have strong seasonal patterns (retail sales in December, unemployment in summer). Seasonally adjusted data remove predictable calendar effects.

Methods:

  • Classical decomposition: Moving average for trend, then seasonal indices

  • X-13 ARIMA-SEATS: Census Bureau method, sophisticated model-based adjustment

  • STL decomposition: Locally weighted regression (loess) for trend and season

Worked Example: Retail Sales Decomposition

Data: Monthly U.S. retail sales, 2015-2023

Decomposition reveals:

  • Trend: Steady increase, accelerating post-2020

  • Seasonal: December peak (~15% above trend), January trough

  • Cycle: COVID shock (March-April 2020), recovery boom

  • Irregular: Month-to-month noise

Figure 7.2: Time series decomposition separates observed data into components. The trend shows long-run growth with a COVID-era jump; seasonal effects reveal the December peak; the remainder captures irregular shocks including the 2020 crash.

Interpretation: The December spike is seasonal (Christmas shopping); the 2020 crash and boom are cyclical (COVID); the upward drift is trend (economic growth and inflation).
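A sketch of an STL decomposition in the spirit of this example, applied to a synthetic monthly series with an upward trend and a December peak (with real retail data, your own series would replace the simulation):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic monthly series: upward trend, December seasonal peak, noise
rng = np.random.default_rng(1)
idx = pd.date_range("2015-01-01", periods=108, freq="MS")
trend = np.linspace(100, 150, len(idx))
seasonal = np.where(idx.month == 12, 15.0, 0.0)   # December spike
y = pd.Series(trend + seasonal + rng.normal(scale=2, size=len(idx)), index=idx)

res = STL(y, period=12, robust=True).fit()        # robust loess downweights outliers
print(res.trend.tail(3))     # long-run movement
print(res.seasonal.tail(3))  # calendar pattern
print(res.resid.tail(3))     # irregular component
```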


7.3 Stationarity

Why Stationarity Matters

Definition 7.1 (Weak Stationarity): A time series $\{Y_t\}$ is weakly stationary if:

  1. $E[Y_t] = \mu$ (constant mean)

  2. $Var(Y_t) = \sigma^2 < \infty$ (constant, finite variance)

  3. $Cov(Y_t, Y_{t-k}) = \gamma_k$ (covariance depends only on lag, not time)

Why it matters:

  • Standard regression assumes fixed distributions; stationarity provides this for time series

  • Non-stationary series violate regression assumptions

  • Two independent random walks appear correlated (spurious regression)

  • Forecasting requires the future to resemble the past (stationarity makes this sensible)

Stationary vs. Non-Stationary

Stationary examples:

  • Interest rate spreads (tend to revert to mean)

  • Inflation (in stable policy regimes)

  • Detrended GDP

Non-stationary examples:

  • GDP levels (trending upward)

  • Stock prices (random walk)

  • Population (growing)

The Random Walk

The random walk is the canonical non-stationary process:

$$Y_t = Y_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ is white noise.

Properties:

  • $E[Y_t] = Y_0$ (constant in expectation, but anchored only by the arbitrary starting value)

  • $Var(Y_t) = t\sigma^2_\varepsilon$ (variance grows without bound)

  • No mean reversion; shocks are permanent

Random walk with drift: $Y_t = \mu + Y_{t-1} + \varepsilon_t$

Has a stochastic trend plus a deterministic drift.

Unit Root Tests

Testing whether a series is stationary:

Augmented Dickey-Fuller (ADF) test: Estimate the regression

$$\Delta Y_t = \alpha + \gamma Y_{t-1} + \sum_{j=1}^{p} \delta_j \Delta Y_{t-j} + \varepsilon_t$$

$H_0$: $\gamma = 0$ (unit root, non-stationary); $H_1$: $\gamma < 0$ (stationary)

Critical values are non-standard (the test statistic does not follow a normal or t distribution under the null); use Dickey-Fuller tables.

Phillips-Perron test: Non-parametric correction for serial correlation.

KPSS test: Null is stationarity (reversed from ADF).

Worked Example: Testing for Unit Root

Series: U.S. quarterly real GDP, 1960-2020

ADF test on log GDP levels:

  • Test statistic: -1.24

  • 5% critical value: -2.88

  • Fail to reject unit root

ADF test on log GDP growth (first difference):

  • Test statistic: -6.35

  • 5% critical value: -2.88

  • Strongly reject unit root

Conclusion: Log GDP has a unit root (is I(1)); GDP growth is stationary (I(0)).
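A sketch of this levels-versus-differences testing logic with statsmodels, applied here to a simulated random walk standing in for log GDP:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=240))          # random walk: I(1) by construction

for name, series in [("levels", y), ("first difference", np.diff(y))]:
    stat, pval, *_ = adfuller(series, autolag="AIC")
    print(f"ADF on {name}: stat={stat:.2f}, p={pval:.3f}")

# KPSS reverses the hypotheses: the null is stationarity
stat, pval, *_ = kpss(np.diff(y), regression="c", nlags="auto")
print(f"KPSS on first difference: stat={stat:.2f}, p={pval:.3f}")
```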

Spurious Regression

Granger and Newbold (1974): Two independent random walks show "significant" correlation.

Simulation: Generate two independent random walks, each of length 100:

  • Regress $Y_t$ on $X_t$

  • The t-statistic often exceeds 2 (appears "significant")

  • $R^2$ can be 0.3-0.5

The problem: Standard errors assume stationarity; with unit roots, they're wrong.

Solution:

  • Difference the series if they're I(1)

  • Use cointegration if series share a common trend
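A sketch of the Granger-Newbold experiment: regress one independent random walk on another, many times, and watch the t-statistic mislead (exact numbers vary with the seed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_sig = 0
for _ in range(1000):
    x = np.cumsum(rng.normal(size=100))   # two independent random walks
    y = np.cumsum(rng.normal(size=100))
    res = sm.OLS(y, sm.add_constant(x)).fit()
    n_sig += abs(res.tvalues[1]) > 1.96   # slope "significant" at the 5% level?
print(f"'Significant' slope in {n_sig / 10:.0f}% of regressions")  # far above 5%
```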


7.4 Cointegration

The Concept

If $Y_t$ and $X_t$ are both I(1), but some linear combination $Y_t - \beta X_t$ is I(0), they are cointegrated.

Definition 7.2 (Cointegration): Variables are cointegrated if they are individually non-stationary (I(1)) but a linear combination is stationary (I(0)).

Interpretation: Variables share a common stochastic trend; they move together in the long run.

Economic Examples

Consumption and income: Both are I(1), but the consumption-income ratio tends to be stable (cointegrated).

Spot and futures prices: Both are I(1), but the basis (difference) is stationary (arbitrage keeps them together).

Short and long interest rates: Both are I(1), but the yield spread tends to be stationary.

Error Correction Model

If $Y_t$ and $X_t$ are cointegrated with cointegrating vector $(1, -\beta)$, the error correction model is:

$$\Delta Y_t = \alpha (Y_{t-1} - \beta X_{t-1}) + \gamma \Delta X_t + \varepsilon_t$$

where:

  • $(Y_{t-1} - \beta X_{t-1})$ is the error correction term (deviation from long-run equilibrium)

  • $\alpha < 0$ implies adjustment back toward equilibrium

  • $\gamma$ captures short-run dynamics

Testing for Cointegration

Engle-Granger two-step:

  1. Regress $Y_t$ on $X_t$ by OLS; save residuals $\hat{u}_t$

  2. Test $\hat{u}_t$ for a unit root (ADF test with modified critical values)

Johansen test: System-based test that can identify multiple cointegrating relationships.
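A sketch of the Engle-Granger logic using statsmodels' coint, which wraps the residual-based two-step test; the cointegrated pair is simulated:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=300))            # common stochastic trend, I(1)
y = 2.0 * x + rng.normal(scale=1.0, size=300)  # y - 2x is stationary by construction

stat, pval, crit = coint(y, x)                 # Engle-Granger test; H0: no cointegration
print(f"EG stat = {stat:.2f}, p = {pval:.3f}") # small p-value -> reject, cointegrated
```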


7.5 ARIMA Models

The Building Blocks

AR(p) - Autoregressive: Current value depends on past values:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$$

MA(q) - Moving Average: Current value depends on past shocks:

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

ARMA(p,q) - Combines both:

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

ARIMA(p,d,q) - Integrates (differences) first:

  • Apply the ARMA model to $\Delta^d Y_t$

  • d = 1 for most I(1) series

The Box-Jenkins Methodology

Step 1: Identification

  • Plot the series; assess stationarity

  • If non-stationary, difference

  • Examine ACF and PACF to choose p and q

Model       ACF                PACF
AR(p)       Decays             Cuts off at lag p
MA(q)       Cuts off at lag q  Decays
ARMA(p,q)   Decays             Decays

Step 2: Estimation

  • Estimate by maximum likelihood

  • Standard software handles this

Step 3: Diagnosis

  • Check residuals for white noise (Ljung-Box test)

  • Check for remaining autocorrelation

  • If problems, return to identification

Worked Example: Inflation Forecasting

Data: U.S. monthly CPI inflation, 1990-2020

Identification:

  • Series appears stationary (no differencing needed)

  • ACF decays slowly

  • PACF cuts off after lag 2

  • Suggests AR(2)

Estimation (AR(2)): $\pi_t = 0.002 + 0.45 \pi_{t-1} + 0.25 \pi_{t-2} + \varepsilon_t$

Diagnosis:

  • Ljung-Box test: No remaining autocorrelation (p = 0.35)

  • Residuals approximately white noise

Interpretation:

  • Inflation is persistent (coefficients positive)

  • Current inflation depends on last two months

  • The AR coefficients sum to 0.70, indicating substantial persistence: about 45% of a shock remains after one month and roughly 45% after two, decaying slowly thereafter
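A sketch of the estimate-then-diagnose steps with statsmodels, fitting an AR(2) to a simulated inflation-like series (with real CPI data, the series would replace the simulation):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulate an AR(2) resembling the estimated inflation dynamics
rng = np.random.default_rng(5)
y = np.zeros(360)
for t in range(2, 360):
    y[t] = 0.002 + 0.45 * y[t - 1] + 0.25 * y[t - 2] + rng.normal(scale=0.002)

res = ARIMA(y, order=(2, 0, 0)).fit()   # AR(2): no differencing, no MA terms
print(res.params)                       # const, ar.L1, ar.L2, sigma2
lb = acorr_ljungbox(res.resid, lags=[12])
print(lb)                               # large p-value -> residuals look like white noise
```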

Model Selection

Information criteria:

$$AIC = -2\ell + 2k, \qquad BIC = -2\ell + k\log(T)$$

where $\ell$ is the log-likelihood and $k$ is the number of parameters.

  • Lower is better

  • BIC penalizes complexity more; tends to choose simpler models
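A sketch of information-criterion model selection: loop over small ARMA orders and keep the lowest BIC (here y is any stationary series, e.g. the simulated inflation series from the previous snippet):

```python
import itertools
from statsmodels.tsa.arima.model import ARIMA

best = None
for p, q in itertools.product(range(3), range(3)):   # p, q in {0, 1, 2}
    try:
        fit = ARIMA(y, order=(p, 0, q)).fit()
        if best is None or fit.bic < best[0]:
            best = (fit.bic, p, q)
    except Exception:
        continue                                     # skip orders that fail to converge
print(f"BIC-preferred order: ARMA({best[1]},{best[2]}), BIC = {best[0]:.1f}")
```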

Modeling Time-Varying Volatility: ARCH and GARCH

ARIMA models assume constant variance (homoskedasticity). But financial returns exhibit volatility clustering: large changes tend to follow large changes, and small changes follow small changes. The variance itself evolves over time.

Box: ARCH and GARCH Models

ARCH (Autoregressive Conditional Heteroskedasticity), introduced by Engle (1982), models variance as depending on past squared residuals:

$$y_t = \mu + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1)$$

$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2$$

Large shocks (high $\varepsilon_{t-1}^2$) raise the conditional variance $\sigma_t^2$, which then decays over time.

GARCH (Generalized ARCH), introduced by Bollerslev (1986), adds lagged variance terms for more parsimonious modeling:

$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$

GARCH(1,1) captures volatility persistence: today's variance depends on yesterday's squared shock (weight $\alpha$) and yesterday's variance (weight $\beta$). The sum $\alpha + \beta$ measures persistence and determines how quickly volatility decays.

Why it matters:

  • Finance: Risk management, option pricing, Value-at-Risk

  • Macroeconomics: Uncertainty shocks, time-varying risk premia

  • General econometrics: Correct standard errors when variance is non-constant

Extensions: EGARCH (asymmetric effects—bad news increases volatility more than good news), GJR-GARCH, multivariate GARCH for portfolio modeling.

Implementation: R packages rugarch, rmgarch; Python arch package.
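A minimal GARCH(1,1) sketch with the Python arch package mentioned above; the return series is simulated, and the true parameters are chosen for illustration:

```python
import numpy as np
from arch import arch_model

# Simulate a GARCH(1,1) return series: omega=0.05, alpha=0.10, beta=0.85
rng = np.random.default_rng(9)
T, omega, alpha, beta = 2000, 0.05, 0.10, 0.85
sigma2, r = np.ones(T), np.zeros(T)
for t in range(1, T):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.normal()

am = arch_model(r, vol="GARCH", p=1, q=1, mean="Constant")
res = am.fit(disp="off")
print(res.params)                        # omega, alpha[1], beta[1] near true values
print(res.conditional_volatility[-5:])   # fitted sigma_t path
```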


7.6 Vector Autoregressions (VARs)

From Univariate to Multivariate

ARIMA models treat each series in isolation. But economic variables interact: output affects unemployment, inflation affects interest rates, exchange rates affect trade. Vector Autoregressions (VARs) capture these dynamic interdependencies.

A VAR treats all variables as endogenous and models them as depending on their own lags and lags of other variables in the system.

The VAR(p) Model

For a vector of $n$ variables $Y_t = (y_{1t}, y_{2t}, \ldots, y_{nt})'$:

$$Y_t = c + A_1 Y_{t-1} + A_2 Y_{t-2} + \cdots + A_p Y_{t-p} + \varepsilon_t$$

where:

  • $c$ = $n \times 1$ vector of constants

  • $A_j$ = $n \times n$ matrices of coefficients

  • $\varepsilon_t$ = $n \times 1$ vector of error terms with $E[\varepsilon_t] = 0$, $E[\varepsilon_t \varepsilon_t'] = \Sigma$

Example: A two-variable VAR(1) for output growth ($\Delta y$) and inflation ($\pi$):

$$\begin{pmatrix} \Delta y_t \\ \pi_t \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} \Delta y_{t-1} \\ \pi_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

The coefficient $a_{12}$ captures how last period's inflation affects current output growth; $a_{21}$ captures how last period's output growth affects current inflation.

Estimation

Each equation in a VAR can be estimated by OLS—this is efficient because all equations have the same right-hand-side variables.

Lag selection: Use information criteria (AIC, BIC) or test-based procedures. With monthly data, 12 lags is common; with quarterly data, 4-8 lags.

Granger Causality

Does variable $X$ help predict variable $Y$ beyond $Y$'s own history?

Definition: Granger Causality

$X$ Granger-causes $Y$ if past values of $X$ contain information useful for predicting $Y$ beyond what is contained in past values of $Y$ alone.

This is tested by checking whether the coefficients on lagged $X$ in the $Y$ equation are jointly zero.

Warning: Granger causality is about predictive content, not causality in the treatment effect sense. A better name would be "Granger predictability." See Chapter 16 for genuine causal inference with time series.

Impulse Response Functions (IRFs)

The key tool for interpreting VARs. An impulse response function traces out how a shock to one variable propagates through the system over time.

The identification problem: VAR residuals are contemporaneously correlated. If $\varepsilon_{1t}$ and $\varepsilon_{2t}$ occur together, which caused which?

Cholesky decomposition: The simplest solution orders variables and assumes shocks flow recursively. If output is ordered first:

  • A shock to output affects both output and inflation contemporaneously

  • A shock to inflation affects only inflation contemporaneously (not output)

The ordering matters! This is "recursive" or "triangular" identification.

Interpretation: IRFs show the dynamic path of variables following a one-standard-deviation shock. Error bands (typically bootstrapped) show uncertainty.

Forecast Error Variance Decomposition

Variance decomposition answers: What fraction of the forecast error variance in variable $Y$ is attributable to shocks to variable $X$?

This reveals which shocks are the main drivers of each variable's fluctuations.

Example finding: "At a 2-year horizon, 60% of output forecast error variance is due to output shocks, 25% to monetary shocks, 15% to supply shocks."
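A sketch of the reduced-form VAR workflow in statsmodels: fit, Granger-causality test, orthogonalized impulse responses under a Cholesky ordering, and variance decomposition. Two simulated series stand in for output growth and inflation:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulate a bivariate VAR(1) with spillovers between the two variables
rng = np.random.default_rng(11)
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])
Y = np.zeros((300, 2))
for t in range(1, 300):
    Y[t] = A @ Y[t - 1] + rng.normal(scale=0.5, size=2)
data = pd.DataFrame(Y, columns=["dy", "pi"])

res = VAR(data).fit(maxlags=8, ic="bic")        # lag length chosen by BIC
gc = res.test_causality("dy", ["pi"], kind="f") # does pi Granger-cause dy?
print(gc.summary())
irf = res.irf(12)    # orthogonalized IRFs; Cholesky order = column order
fevd = res.fevd(12)  # forecast error variance decomposition
fevd.summary()       # prints the decomposition tables
# irf.plot(orth=True) would draw the response paths with error bands
```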

Structural VARs

Reduced-form VARs are useful for forecasting but agnostic about causality. Structural VARs (SVARs) impose economic restrictions to identify causal shocks.

Identification strategies include:

  • Short-run restrictions (Cholesky, contemporaneous zeros)

  • Long-run restrictions (e.g., money has no long-run effect on output)

  • Sign restrictions (e.g., demand shocks raise both output and prices)

  • External instruments (narrative shocks—see Chapter 16)

The frontier of SVAR identification is covered in Chapter 16 (Time Series Causal Inference), where we discuss how to make causal—not just predictive—claims from time series data.

Practical Guidance for VARs

Issue                   Guidance
How many variables?     3-7 is typical; more variables means more parameters
Stationarity            Difference non-stationary variables, or use levels if cointegrated (VECM)
Lag length              Use information criteria; check residual autocorrelation
Ordering for Cholesky   Think about economic timing: slower-moving variables first
Interpretation          Focus on IRFs and variance decompositions, not individual coefficients


7.7 State Space Models and the Kalman Filter

The State Space Framework

Many time series models can be written in state space form:

Measurement equation: $Y_t = Z_t \alpha_t + \varepsilon_t$

Transition equation: $\alpha_{t+1} = T_t \alpha_t + R_t \eta_t$

where:

  • $Y_t$ = observed series

  • $\alpha_t$ = unobserved state

  • $\varepsilon_t$, $\eta_t$ = error terms

Examples:

  • ARIMA models

  • Dynamic factor models

  • Unobserved components (trend + cycle)

  • Time-varying parameter models

The Kalman Filter

The Kalman filter recursively estimates the unobserved state:

Prediction: $\alpha_{t|t-1} = T_{t-1}\alpha_{t-1|t-1}$

Updating: $\alpha_{t|t} = \alpha_{t|t-1} + K_t(Y_t - Z_t\alpha_{t|t-1})$

where $K_t$ is the Kalman gain, weighting new information.

What it provides:

  • Filtered estimates (best estimate at time t using data through t)

  • Smoothed estimates (best estimate at time t using all data)

  • Forecast distributions
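A sketch of the filter for the simplest case, a local level model ($Z = T = R = 1$), written directly from the recursions above. The variances q and h are treated as known here; in practice they are estimated by maximum likelihood:

```python
import numpy as np

def local_level_filter(y, q=0.1, h=1.0, a0=0.0, p0=1e6):
    """Kalman filter for y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t."""
    a, p = a0, p0                      # predicted state mean and variance (diffuse start)
    filtered = np.empty(len(y))
    for t, yt in enumerate(y):
        k = p / (p + h)                # Kalman gain
        a = a + k * (yt - a)           # update with the forecast error
        p = (1 - k) * p + q            # updated variance, plus next-period transition noise
        filtered[t] = a                # filtered estimate alpha_{t|t}
    return filtered

rng = np.random.default_rng(13)
level = np.cumsum(rng.normal(scale=0.3, size=200))   # true latent level
y = level + rng.normal(scale=1.0, size=200)          # noisy observations
print(local_level_filter(y)[-5:])                    # tracks the latent level
```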

Applications

Missing data: Kalman filter handles missing observations naturally.

Mixed frequency: Combine monthly and quarterly data.

Unobserved components: Estimate separate trend and cycle.

Real-time estimation: Update estimates as new data arrive.


7.8 Forecasting

The Forecasting Problem

Goal: Predict $Y_{T+h}$ given information through time $T$.

Point forecast: $\hat{Y}_{T+h|T} = E[Y_{T+h} \mid Y_T, Y_{T-1}, \ldots]$

Interval forecast: A range containing $Y_{T+h}$ with specified probability.

Forecast Evaluation

Loss functions:

  • Mean Squared Error: $MSE = \frac{1}{H}\sum_{h=1}^{H}(Y_{T+h} - \hat{Y}_{T+h|T})^2$

  • Mean Absolute Error: $MAE = \frac{1}{H}\sum_{h=1}^{H}|Y_{T+h} - \hat{Y}_{T+h|T}|$

  • Mean Absolute Percentage Error: $MAPE = \frac{100}{H}\sum_{h=1}^{H}\frac{|Y_{T+h} - \hat{Y}_{T+h|T}|}{|Y_{T+h}|}$

Out-of-sample evaluation:

  • Split sample: Estimate on early data, evaluate on later data

  • Rolling window: Re-estimate as new data arrive

  • Expanding window: Add data without dropping old
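A sketch of an expanding-window, one-step-ahead evaluation comparing a random walk forecast with an AR(1); the data are simulated, so the loss numbers are illustrative only:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(17)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.6 * y[t - 1] + rng.normal()     # AR(1) data-generating process

rw_err, ar_err = [], []
for t in range(200, 299):                    # expanding estimation window
    rw_fc = y[t]                             # random walk: tomorrow = today
    ar_fc = ARIMA(y[: t + 1], order=(1, 0, 0)).fit().forecast(1)[0]  # re-estimate each period
    rw_err.append(y[t + 1] - rw_fc)
    ar_err.append(y[t + 1] - ar_fc)

for name, e in [("RW", rw_err), ("AR(1)", ar_err)]:
    e = np.asarray(e)
    print(f"{name}: MSE={np.mean(e**2):.3f}, MAE={np.mean(np.abs(e)):.3f}")
```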

Forecast Comparison

Diebold-Mariano test: Test whether two forecasts have equal accuracy.

$H_0$: $E[L(e_{1t})] = E[L(e_{2t})]$ (equal expected loss)

where $L$ is the loss function and $e_{it}$ are the forecast errors of model $i$.
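A sketch of the DM statistic for one-step-ahead forecasts under squared-error loss, using the forecast errors from the previous snippet; for h-step-ahead forecasts the denominator needs an autocorrelation-robust variance estimator:

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2):
    """DM test for equal predictive accuracy, squared-error loss, h = 1."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2    # loss differential series
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))  # asymptotically N(0,1) under H0
    return dm, 2 * stats.norm.sf(abs(dm))            # two-sided p-value

dm, p = diebold_mariano(rw_err, ar_err)
print(f"DM = {dm:.2f}, p = {p:.3f}")  # positive -> the second forecast is more accurate
```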

Combining Forecasts

Bates and Granger (1969): Combined forecasts often outperform individual forecasts.

Simple average: Often hard to beat

Weighted average: Weights based on past performance

Why combining works: Different models capture different aspects of the data; combination diversifies across model uncertainty.

Forecast Uncertainty

Fan charts: Show distribution of forecasts at each horizon

Density forecasts: Full predictive distribution, not just point forecast

Honest uncertainty: Forecast intervals should cover the true value at the stated rate (calibration).

Figure 7.3: A fan chart displays forecast uncertainty honestly. The point forecast (dashed line) shows mean reversion toward the 2% target. Darker shading indicates higher probability regions. Uncertainty expands with the forecast horizon—a fundamental feature of time series prediction.

Practical Guidance

Workflow for Time Series Analysis

  1. Plot the data; identify obvious features

  2. Test for stationarity (ADF, KPSS)

  3. Transform if needed (difference, log)

  4. Model (ARIMA, state space)

  5. Diagnose (residual tests)

  6. Forecast (point and interval)

  7. Evaluate (out-of-sample)

Common Pitfalls

Pitfall 1: Regressing non-stationary series Regression of I(1) on I(1) series produces spurious results.

How to avoid: Test for stationarity; difference or use cointegration.

Pitfall 2: Over-differencing If a series is already stationary, differencing induces a unit root.

How to avoid: Test before differencing; check both ADF and KPSS.

Pitfall 3: Overfitting Too many lags or parameters improve in-sample fit but hurt forecasts.

How to avoid: Use information criteria; evaluate out-of-sample.

Pitfall 4: Ignoring structural breaks Models estimated on pre-break data don't forecast post-break.

How to avoid: Test for breaks; use rolling windows; acknowledge uncertainty.

Box: Structural Breaks and Their Consequences

A structural break occurs when the data-generating process changes at some point in time. This can affect means, variances, persistence, or relationships between variables.

Why structural breaks matter:

  1. Unit root tests are biased: The ADF test cannot distinguish between a true unit root and a stationary series with a structural break. A break in mean looks like non-stationarity, biasing tests toward failing to reject the unit root null.

  2. Forecasts fail at breaks: A model estimated on pre-break data will systematically mis-forecast after a break. The "Great Moderation" (1984-2007 decline in volatility) led to models that vastly underestimated post-2008 crisis volatility.

  3. Inference is invalid: Standard errors assume parameter stability. Pooling across regimes produces meaningless "average" estimates.

Common sources of breaks:

  • Policy regime changes (Volcker disinflation 1979, inflation targeting adoption)

  • Financial crises (2008, COVID-19)

  • Institutional changes (currency pegs, trade liberalization)

  • Technological shifts (dot-com era, AI adoption)

Detection methods:

  • Chow test: Tests whether coefficients differ before/after a known break date

  • CUSUM/CUSUMSQ: Recursive residual-based tests for parameter instability

  • Bai-Perron (1998, 2003): Estimates multiple unknown break dates

  • Quandt-Andrews: Sup-Wald test over all possible break points

Practical responses:

  • Test for breaks before assuming stationarity

  • If breaks exist: (a) estimate separate models for each regime, (b) use dummy variables for level/trend shifts, or (c) allow time-varying parameters

  • Use rolling or expanding window estimation to detect parameter drift

  • Be humble about forecasts spanning potential break points

Example: U.S. inflation persistence appears to have declined after the early 1980s. An AR model estimated on 1960-1980 data would dramatically overpredict inflation persistence in later decades.

Rules of Thumb

  • Test for unit roots before regression

  • Difference I(1) series (or use cointegration)

  • Evaluate forecasts out-of-sample, not in-sample

  • Start simple: Random walk, AR(1), AR(2) often work well

  • Report uncertainty: Intervals, fan charts, not just point forecasts


Running Example: Business Cycles and Monetary Policy

U.S. Business Cycles

Data: Quarterly real GDP, 1947-2023

Characteristics:

  • Positive trend (economic growth)

  • Irregular cycles around trend (expansions and recessions)

  • Occasional sharp drops (2008, 2020)

HP filter decomposition:

  • Trend grows ~2-3% per year

  • Cycle fluctuates ±2-3% around trend

  • Recessions visible as negative cycle

AR(2) for output gap (HP-filtered cycle): $\text{gap}_t = 0.72\,\text{gap}_{t-1} + 0.14\,\text{gap}_{t-2} + \varepsilon_t$

  • Highly persistent (sum of AR coefficients near 0.86)

  • Cycles last ~16 quarters on average

Forecasting Inflation

Question: Can we forecast inflation? How accurate are forecasts?

Models compared:

  1. Random walk: $\hat{\pi}_{t+h} = \pi_t$

  2. AR(2)

  3. Phillips curve: AR with unemployment gap

  4. Survey expectations

Results (out-of-sample RMSE, 1990-2020):

  • Random walk: 0.45

  • AR(2): 0.42

  • Phillips curve: 0.44

  • Survey: 0.38

Interpretation: Inflation is hard to forecast; simple models do nearly as well as complex ones; survey expectations contain information models miss.

This descriptive analysis sets up Chapter 16's examination of causal questions about monetary policy.


Integration Note

Connections to Other Methods

Method            Relationship                                   See Chapter
Regression        Time series regression has special issues      Ch. 3
Causal Inference  TS causal methods build on these foundations   Ch. 16
Panel Data        Panel methods combine TS and cross-section     Ch. 13, 15
Forecasting       Distinct from causal inference                 This chapter

Triangulation Strategies

Time series descriptions gain credibility when:

  1. Multiple methods agree: ARIMA and state space give similar patterns

  2. Robustness to specification: Results stable to lag length, filter parameter

  3. Economic interpretation: Patterns align with economic events and theory

  4. Out-of-sample validation: Models forecast well on new data

  5. Comparison to survey data: Model-based and survey expectations align


Summary

Key takeaways:

  1. Time series data have special structure: Temporal dependence requires methods that account for the ordering of observations.

  2. Decomposition separates trend, season, and cycle: Understanding components is essential before modeling.

  3. Stationarity is fundamental: Non-stationary series require transformation (differencing, detrending) before analysis.

  4. Spurious regression is real: Two independent random walks appear correlated. Always test for unit roots.

  5. ARIMA models provide a flexible framework for modeling stationary series after differencing.

  6. Forecasting requires out-of-sample evaluation: In-sample fit is misleading; simple models often beat complex ones.

Returning to the opening question: Time ordering creates patterns (trends, cycles, persistence) that cross-sectional data lack—and problems (non-stationarity, spurious correlation) that require specialized tools. Understanding time series foundations is essential for anyone working with temporally ordered data, whether for description, forecasting, or causal inference.


Further Reading

Essential

  • Hamilton (1994), Time Series Analysis - The definitive graduate text

  • Hyndman and Athanasopoulos (2021), Forecasting: Principles and Practice - Excellent applied treatment (free online)

For Deeper Understanding

  • Box, Jenkins, and Reinsel (2015), Time Series Analysis - Classic ARIMA text

  • Harvey (1989), Forecasting, Structural Time Series Models and the Kalman Filter - State space methods

  • Enders (2014), Applied Econometric Time Series - Econometrics focus

Historical/Methodological

  • Granger and Newbold (1974), "Spurious Regressions in Econometrics" - The classic warning

  • Dickey and Fuller (1979), "Distribution of the Estimators..." - Unit root testing

  • Engle and Granger (1987), "Co-Integration and Error Correction" - Nobel Prize work

Applications

  • Stock and Watson (2020), Introduction to Econometrics, Ch. 15-16 - Applied TS econometrics

  • Clark and McCracken (2013), "Advances in Forecast Evaluation" - Forecasting methods

  • Diebold (2007), Elements of Forecasting - Practical guide


Exercises

Conceptual

  1. Explain the difference between a trend-stationary and a difference-stationary process. How would you distinguish them empirically?

  2. Why does regressing one random walk on another produce spurious results? What happens to the standard errors?

  3. What is cointegration? Give an economic example and explain why the series might be cointegrated.

Applied

  1. Using monthly data on an economic variable of your choice:

    • Plot the series and describe what you see

    • Test for stationarity (ADF and KPSS)

    • Estimate an appropriate ARIMA model

    • Produce forecasts and evaluate out-of-sample

  2. Using quarterly GDP data:

    • Apply the HP filter to extract the cycle

    • Estimate an AR model for the output gap

    • Calculate the average duration of business cycles implied by your model

Discussion

  1. A financial analyst says: "My trading model has an R² of 0.85 on historical data, so I'm confident in my forecasts." What questions would you ask?


Technical Appendix

A. Wold Decomposition

Any covariance stationary process can be written (up to a purely deterministic component) as:

$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$$

where $\sum_{j=0}^{\infty} \psi_j^2 < \infty$ and $\varepsilon_t$ is white noise.

This justifies MA representations and is the foundation for impulse response analysis.

B. Stationarity Conditions for AR(p)

The AR(p) process is stationary if and only if all roots of the characteristic polynomial

$$1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$$

lie outside the unit circle.

For AR(1): $|\phi_1| < 1$

For AR(2): $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$, $|\phi_2| < 1$

C. Kalman Filter Equations

Prediction:

  • $\alpha_{t|t-1} = T \alpha_{t-1|t-1}$

  • $P_{t|t-1} = T P_{t-1|t-1} T' + R Q R'$

Update:

  • $K_t = P_{t|t-1} Z'(Z P_{t|t-1} Z' + H)^{-1}$

  • $\alpha_{t|t} = \alpha_{t|t-1} + K_t(Y_t - Z \alpha_{t|t-1})$

  • $P_{t|t} = (I - K_t Z) P_{t|t-1}$

where $Q = Var(\eta_t)$ and $H = Var(\varepsilon_t)$.
