Chapter 7: Dynamics and Time Series Foundations
Opening Question
When data are ordered by time, what new patterns emerge—and what new problems arise?
Chapter Overview
Time series data differ fundamentally from cross-sectional data. Observations are ordered; yesterday influences today influences tomorrow. This temporal dependence creates both opportunities (we can model dynamics, forecast the future) and challenges (standard regression assumptions fail, spurious correlations abound).
This chapter develops the foundations for working with time-indexed data. We cover decomposition (separating trend, season, and cycle), stationarity (the key concept for valid inference), modeling (ARIMA, state space), and forecasting (principles and evaluation). These foundations support the causal time series methods in Chapter 16.
What you will learn:
How time series differ from cross-sectional data
Decomposing time series into trend, seasonal, and cyclical components
What stationarity means and why it matters
ARIMA modeling and its extensions
Principles of forecasting and forecast evaluation
Prerequisites: Chapter 3 (Statistical Foundations), some exposure to regression
Historical Context: The Science of Prediction
Time series analysis has ancient roots in astronomy and navigation, but its modern statistical form emerged in the 20th century.
Yule (1927) showed that simple correlation between trending series could be "nonsense" (spurious)—a fundamental insight that took decades to fully appreciate.
Wold (1938) proved that any stationary process can be represented as a moving average of past shocks—the Wold decomposition theorem that underlies much of time series analysis.
Box and Jenkins (1970) systematized the ARIMA modeling approach, transforming time series from art to science with their identification-estimation-diagnosis cycle.
Granger and Newbold (1974) demonstrated the spurious regression problem empirically, showing that independent random walks appeared correlated.
Dickey and Fuller (1979) developed tests for unit roots, enabling researchers to distinguish stationary from non-stationary series.
Engle and Granger (1987) introduced cointegration—the idea that non-stationary series can share common trends—earning Granger the Nobel Prize.
In economics, time series methods are central to macroeconomics (business cycles, inflation, monetary policy) and finance (asset prices, volatility, risk).
7.1 Working with Time-Indexed Data
Data Structures
Time series: One unit observed over many periods
Example: U.S. quarterly GDP, 1947-2023
Panel (longitudinal): Many units observed over multiple periods
Example: GDP for 50 countries, 1960-2020
Cross-section: Many units observed once
Example: GDP for 150 countries in 2020
Repeated cross-section: Different units sampled at different times
Example: CPS monthly surveys (different people each month)
Frequency and Aggregation
Frequency: How often is the variable measured?
High frequency: Tick data, minute, hour, daily
Medium frequency: Weekly, monthly, quarterly
Low frequency: Annual
Aggregation matters: Relationships may differ at different frequencies. Monthly inflation dynamics differ from annual inflation dynamics.
Temporal aggregation creates:
Smoothing (averaging removes noise)
Timing issues (when did the event occur within the period?)
Potential bias (if aggregation is non-linear in the underlying process)
Time Series Plots
The time series plot is fundamental: variable on y-axis, time on x-axis.

What to look for:
Trend (persistent increase or decrease)
Seasonality (regular periodic patterns)
Cycles (irregular fluctuations)
Structural breaks (sudden changes)
Outliers (unusual values)
7.2 Decomposition
The Components
A time series Yt can be decomposed:
Additive model: Yt=Tt+St+Ct+εt
Multiplicative model: Yt=Tt×St×Ct×εt
where:
Tt = Trend (long-run persistent movement)
St = Seasonal (regular calendar-based patterns)
Ct = Cycle (irregular but persistent fluctuations)
εt = Irregular (noise)
Trend Extraction
Moving average filters: Average observations over a window: $\hat{T}_t = \frac{1}{2k+1} \sum_{j=-k}^{k} Y_{t+j}$
Centered moving average removes short-run fluctuations.
Hodrick-Prescott (HP) filter: Choose the trend to minimize $\sum_{t=1}^{T} (Y_t - T_t)^2 + \lambda \sum_{t=2}^{T-1} \left[ (T_{t+1} - T_t) - (T_t - T_{t-1}) \right]^2$
First term: Cycle should be small
Second term: Trend should be smooth
λ controls tradeoff (typically 1600 for quarterly data)
Baxter-King filter: Band-pass filter isolating specific frequencies.
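As a concrete illustration, here is a minimal sketch of HP-filter trend extraction with statsmodels' hpfilter function. The series is simulated as a stand-in for quarterly log real GDP, and λ = 1600 follows the quarterly-data convention above.

```python
# Minimal sketch: HP-filter trend extraction with statsmodels.
# The series below is simulated as a stand-in for quarterly log real GDP.
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(0)
log_gdp = pd.Series(np.cumsum(0.005 + rng.normal(scale=0.01, size=300)))  # drifting series

cycle, trend = hpfilter(log_gdp, lamb=1600)  # lambda = 1600 for quarterly data
print(trend.tail())   # smooth trend component
print(cycle.tail())   # deviations from trend
```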
Seasonal Adjustment
Why adjust?: Many series have strong seasonal patterns (retail sales in December, unemployment in summer). Seasonally adjusted data remove predictable calendar effects.
Methods:
Classical decomposition: Moving average for trend, then seasonal indices
X-13 ARIMA-SEATS: Census Bureau method, sophisticated model-based adjustment
STL decomposition: Locally weighted regression (loess) for trend and season
Worked Example: Retail Sales Decomposition
Data: Monthly U.S. retail sales, 2015-2023
Decomposition reveals:
Trend: Steady increase, accelerating post-2020
Seasonal: December peak (~15% above trend), January trough
Cycle: COVID shock (March-April 2020), recovery boom
Irregular: Month-to-month noise

Interpretation: The December spike is seasonal (Christmas shopping); the 2020 crash and boom are cyclical (COVID); the upward drift is trend (economic growth and inflation).
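A minimal sketch of an STL decomposition with statsmodels follows. The monthly series is simulated (upward trend plus a December spike) as a stand-in for the retail-sales data described above, so the output only illustrates the mechanics.

```python
# Minimal sketch: STL decomposition (loess-based trend and seasonal) with statsmodels.
# The series is simulated: upward trend plus a December spike, standing in for retail sales.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=108, freq="MS")
december_spike = np.tile(np.r_[np.zeros(11), 15.0], 9)
sales = pd.Series(100 + 0.5 * np.arange(108) + december_spike + rng.normal(scale=2, size=108),
                  index=idx)

result = STL(sales, period=12, robust=True).fit()  # robust=True downweights outliers
result.plot()                                      # panels: observed, trend, seasonal, residual
print(result.seasonal.loc["2019"])                 # seasonal component for one year
```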
7.3 Stationarity
Why Stationarity Matters
Definition 7.1 (Weak Stationarity): A time series {Yt} is weakly stationary if:
E[Yt]=μ (constant mean)
Var(Yt)=σ2<∞ (constant, finite variance)
Cov(Yt,Yt−k)=γk (covariance depends only on lag, not time)
Why it matters:
Standard regression assumes fixed distributions; stationarity provides this for time series
Non-stationary series violate regression assumptions
Two independent random walks appear correlated (spurious regression)
Forecasting requires the future to resemble the past (stationarity makes this sensible)
Stationary vs. Non-Stationary
Stationary examples:
Interest rate spreads (tend to revert to mean)
Inflation (in stable policy regimes)
Detrended GDP
Non-stationary examples:
GDP levels (trending upward)
Stock prices (random walk)
Population (growing)
The Random Walk
The random walk is the canonical non-stationary process: Yt=Yt−1+εt
where εt is white noise.
Properties:
E[Yt]=Y0 (constant mean—but what is the "mean" of a random walk?)
$\mathrm{Var}(Y_t) = t \sigma_\varepsilon^2$ (variance grows without bound)
No mean reversion; shocks permanent
Random walk with drift: Yt=μ+Yt−1+εt
Has a stochastic trend plus a deterministic drift.
Unit Root Tests
Testing whether a series is stationary:
Augmented Dickey-Fuller (ADF) test: ΔYt=α+γYt−1+∑j=1pδjΔYt−j+εt
H0:γ=0 (unit root, non-stationary) H1:γ<0 (stationary)
Critical values are non-standard (not normal distribution); use Dickey-Fuller tables.
Phillips-Perron test: Non-parametric correction for serial correlation.
KPSS test: Null is stationarity (reversed from ADF).
Worked Example: Testing for Unit Root
Series: U.S. quarterly real GDP, 1960-2020
ADF test on log GDP levels:
Test statistic: -1.24
5% critical value: -2.88
Fail to reject unit root
ADF test on log GDP growth (first difference):
Test statistic: -6.35
5% critical value: -2.88
Strongly reject unit root
Conclusion: Log GDP has a unit root (is I(1)); GDP growth is stationary (I(0)).
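A minimal sketch of how such tests can be run with statsmodels follows. The series here is a simulated random walk rather than GDP, so it should fail to reject the ADF null in levels and reject after first differencing.

```python
# Minimal sketch: ADF (H0: unit root) and KPSS (H0: stationary) tests with statsmodels,
# applied to a simulated random walk and its first difference.
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=240))   # random walk: I(1)
dy = np.diff(y)                       # first difference: I(0)

for name, series in [("levels", y), ("first difference", dy)]:
    adf_stat, adf_p, *_ = adfuller(series)
    kpss_stat, kpss_p, *_ = kpss(series, regression="c", nlags="auto")
    print(f"{name:>16}: ADF stat {adf_stat:6.2f} (p={adf_p:.3f}), "
          f"KPSS stat {kpss_stat:5.2f} (p={kpss_p:.3f})")
```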
Spurious Regression
Granger and Newbold (1974): Two independent random walks show "significant" correlation.
Simulation: Generate two independent random walks, each of length 100:
Regress Yt on Xt
t-statistic often exceeds 2 (appears "significant")
R2 can be 0.3-0.5
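A minimal sketch of this simulation: generate many pairs of independent random walks, regress one on the other, and count how often the conventional t-test declares "significance."

```python
# Minimal sketch of the Granger-Newbold spurious regression experiment:
# independent random walks regressed on each other look "significant" far too often.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_reps, T = 1000, 100
reject = 0
for _ in range(n_reps):
    y = np.cumsum(rng.normal(size=T))   # random walk 1
    x = np.cumsum(rng.normal(size=T))   # random walk 2, independent of y
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    if abs(fit.tvalues[1]) > 1.96:      # nominal 5% two-sided test on the slope
        reject += 1
print(f"Rejection rate of H0 (no relationship): {100 * reject / n_reps:.0f}% (nominal 5%)")
```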
The problem: Standard errors assume stationarity; with unit roots, they're wrong.
Solution:
Difference the series if they're I(1)
Use cointegration if series share a common trend
7.4 Cointegration
The Concept
If Yt and Xt are both I(1), but some linear combination Yt−βXt is I(0), they are cointegrated.
Definition 7.2 (Cointegration): Variables are cointegrated if they are individually non-stationary (I(1)) but a linear combination is stationary (I(0)).
Interpretation: Variables share a common stochastic trend; they move together in the long run.
Economic Examples
Consumption and income: Both are I(1), but the consumption-income ratio tends to be stable (cointegrated).
Spot and futures prices: Both are I(1), but the basis (difference) is stationary (arbitrage keeps them together).
Short and long interest rates: Both are I(1), but the yield spread tends to be stationary.
Error Correction Model
If Yt and Xt are cointegrated with cointegrating vector (1,−β), the error correction model is:
ΔYt=α(Yt−1−βXt−1)+γΔXt+εt
where:
(Yt−1−βXt−1) is the error correction term (deviation from long-run equilibrium)
α<0 implies adjustment back toward equilibrium
γ captures short-run dynamics
Testing for Cointegration
Engle-Granger two-step:
Regress Yt on Xt by OLS; save residuals u^t
Test u^t for unit root (ADF test with modified critical values)
Johansen test: System-based test that can identify multiple cointegrating relationships.
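A minimal sketch of the Engle-Granger approach using statsmodels' coint function (which runs the residual-based test with the appropriate critical values); the two series are simulated to share a common stochastic trend, so they should test as cointegrated.

```python
# Minimal sketch: Engle-Granger cointegration test via statsmodels.tsa.stattools.coint.
# The simulated series share a common I(1) trend, so they should test as cointegrated.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(1)
common_trend = np.cumsum(rng.normal(size=300))          # shared stochastic trend
x = common_trend + rng.normal(scale=0.5, size=300)
y = 2.0 * common_trend + rng.normal(scale=0.5, size=300)

t_stat, p_value, crit_values = coint(y, x)              # H0: no cointegration
print(f"Engle-Granger statistic: {t_stat:.2f}, p-value: {p_value:.4f}")
print("5% critical value:", crit_values[1])             # crit_values = [1%, 5%, 10%]
```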
7.5 ARIMA Models
The Building Blocks
AR(p) - Autoregressive: Current value depends on past values Yt=c+ϕ1Yt−1+ϕ2Yt−2+...+ϕpYt−p+εt
MA(q) - Moving Average: Current value depends on past shocks Yt=μ+εt+θ1εt−1+...+θqεt−q
ARMA(p,q) - Combines both: Yt=c+∑i=1pϕiYt−i+εt+∑j=1qθjεt−j
ARIMA(p,d,q) - Integrates (differences) first:
Apply the ARMA(p,q) model to the differenced series $\Delta^d Y_t$
d = 1 for most I(1) series
The Box-Jenkins Methodology
Step 1: Identification
Plot the series; assess stationarity
If non-stationary, difference
Examine ACF and PACF to choose p and q
AR(p): ACF decays; PACF cuts off after lag p
MA(q): ACF cuts off after lag q; PACF decays
ARMA(p,q): ACF decays; PACF decays
Step 2: Estimation
Estimate by maximum likelihood
Standard software handles this
Step 3: Diagnosis
Check residuals for white noise (Ljung-Box test)
Check for remaining autocorrelation
If problems, return to identification
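A minimal sketch of the three-step cycle with statsmodels, run on a simulated AR(2) series (coefficients chosen to mirror the inflation example below):

```python
# Minimal sketch of the Box-Jenkins cycle: identification (ACF/PACF plots),
# estimation (AR(2) via ARIMA), and diagnosis (Ljung-Box test on residuals).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

np.random.seed(0)
# AR polynomial 1 - 0.45L - 0.25L^2 (statsmodels expects the signs flipped)
y = pd.Series(arma_generate_sample(ar=[1, -0.45, -0.25], ma=[1], nsample=360))

plot_acf(y, lags=24)    # Step 1: ACF should decay slowly
plot_pacf(y, lags=24)   #         PACF should cut off after lag 2

res = ARIMA(y, order=(2, 0, 0)).fit()   # Step 2: estimate AR(2) by maximum likelihood
print(res.params)

print(acorr_ljungbox(res.resid, lags=[12]))  # Step 3: H0 = residuals are white noise
```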
Worked Example: Inflation Forecasting
Data: U.S. monthly CPI inflation, 1990-2020
Identification:
Series appears stationary (no differencing needed)
ACF decays slowly
PACF cuts off after lag 2
Suggests AR(2)
Estimation (AR(2)): πt=0.002+0.45πt−1+0.25πt−2+εt
Diagnosis:
Ljung-Box test: No remaining autocorrelation (p = 0.35)
Residuals approximately white noise
Interpretation:
Inflation is persistent (coefficients positive)
Current inflation depends on last two months
The sum of the AR coefficients (0.70) measures persistence; about 45% of a shock remains after one month and roughly 45% after two
Model Selection
Information criteria: $AIC = -2\ell + 2k$ and $BIC = -2\ell + k \log(T)$
where ℓ is log-likelihood and k is number of parameters.
Lower is better
BIC penalizes complexity more; tends to choose simpler models
Modeling Time-Varying Volatility: ARCH and GARCH
ARIMA models assume constant variance (homoskedasticity). But financial returns exhibit volatility clustering: large changes tend to follow large changes, and small changes follow small changes. The variance itself evolves over time.
Box: ARCH and GARCH Models
ARCH (Autoregressive Conditional Heteroskedasticity), introduced by Engle (1982), models variance as depending on past squared residuals:
$y_t = \mu + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1), \quad \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2$
Large shocks (high $\varepsilon_{t-1}^2$) increase the conditional variance $\sigma_t^2$, which then decays over time.
GARCH (Generalized ARCH), introduced by Bollerslev (1986), adds lagged variance terms for more parsimonious modeling:
$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$
GARCH(1,1) captures volatility persistence: today's variance depends on yesterday's shock (α) and yesterday's variance (β). The persistence α+β determines how quickly volatility decays.
Why it matters:
Finance: Risk management, option pricing, Value-at-Risk
Macroeconomics: Uncertainty shocks, time-varying risk premia
General econometrics: Correct standard errors when variance is non-constant
Extensions: EGARCH (asymmetric effects—bad news increases volatility more than good news), GJR-GARCH, multivariate GARCH for portfolio modeling.
Implementation: R packages rugarch and rmgarch; Python's arch package.
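A minimal sketch using the Python arch package mentioned above; the return series is simulated, so the estimates only illustrate the workflow.

```python
# Minimal sketch: fit a GARCH(1,1) with the arch package to a simulated return series.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_t(df=6, size=1000)    # fat-tailed stand-in for daily % returns

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = model.fit(disp="off")
print(res.summary())                         # omega, alpha[1], beta[1]; alpha + beta ~ persistence

fcast = res.forecast(horizon=5)              # conditional variance forecasts
print(fcast.variance.iloc[-1])
```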
7.6 Vector Autoregressions (VARs)
From Univariate to Multivariate
ARIMA models treat each series in isolation. But economic variables interact: output affects unemployment, inflation affects interest rates, exchange rates affect trade. Vector Autoregressions (VARs) capture these dynamic interdependencies.
A VAR treats all variables as endogenous and models them as depending on their own lags and lags of other variables in the system.
The VAR(p) Model
For a vector of n variables Yt=(y1t,y2t,…,ynt)′:
Yt=c+A1Yt−1+A2Yt−2+⋯+ApYt−p+εt
where:
c = n×1 vector of constants
Aj = n×n matrices of coefficients
εt = n×1 vector of error terms with E[εt]=0, E[εtεt′]=Σ
Example: A two-variable VAR(1) for output growth (Δy) and inflation (π):
$\begin{pmatrix} \Delta y_t \\ \pi_t \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} \Delta y_{t-1} \\ \pi_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$
The coefficient a12 captures how last period's inflation affects current output growth; a21 captures how last period's output growth affects current inflation.
Estimation
Each equation in a VAR can be estimated by OLS—this is efficient because all equations have the same right-hand-side variables.
Lag selection: Use information criteria (AIC, BIC) or test-based procedures. With monthly data, 12 lags is common; with quarterly data, 4-8 lags.
Granger Causality
Does variable X help predict variable Y beyond Y's own history?
Definition: Granger Causality
X Granger-causes Y if past values of X contain information useful for predicting Y beyond what is contained in past values of Y alone.
This is tested by checking whether the coefficients on lagged X in the Y equation are jointly zero.
Warning: Granger causality is about predictive content, not causality in the treatment effect sense. A better name would be "Granger predictability." See Chapter 16 for genuine causal inference with time series.
Impulse Response Functions (IRFs)
The key tool for interpreting VARs. An impulse response function traces out how a shock to one variable propagates through the system over time.
The identification problem: VAR residuals are contemporaneously correlated. If both ε1t and ε2t occur together, which caused which?
Cholesky decomposition: The simplest solution orders variables and assumes shocks flow recursively. If output is ordered first:
A shock to output affects both output and inflation contemporaneously
A shock to inflation affects only inflation contemporaneously (not output)
The ordering matters! This is "recursive" or "triangular" identification.
Interpretation: IRFs show the dynamic path of variables following a one-standard-deviation shock. Error bands (typically bootstrapped) show uncertainty.
Forecast Error Variance Decomposition
Variance decomposition answers: What fraction of the forecast error variance in variable Y is attributable to shocks in variable X?
This reveals which shocks are the main drivers of each variable's fluctuations.
Example finding: "At a 2-year horizon, 60% of output forecast error variance is due to output shocks, 25% to monetary shocks, 15% to supply shocks."
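A minimal sketch of the full reduced-form VAR workflow in statsmodels—lag selection, estimation, a Granger causality test, Cholesky-orthogonalized impulse responses, and a forecast error variance decomposition. The two series are simulated stand-ins for output growth and inflation.

```python
# Minimal sketch: two-variable VAR with statsmodels (lag selection, estimation,
# Granger causality, Cholesky IRFs, variance decomposition). Data are simulated.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T = 200
gdp_growth = np.zeros(T)
inflation = np.zeros(T)
for t in range(1, T):
    gdp_growth[t] = 0.5 * gdp_growth[t - 1] + 0.1 * inflation[t - 1] + rng.normal(scale=0.5)
    inflation[t] = 0.2 * gdp_growth[t - 1] + 0.6 * inflation[t - 1] + rng.normal(scale=0.3)
data = pd.DataFrame({"gdp_growth": gdp_growth, "inflation": inflation})

model = VAR(data)
print(model.select_order(maxlags=8).summary())   # AIC/BIC/HQIC by lag length

res = model.fit(2)                               # each equation estimated by OLS
print(res.test_causality("inflation", ["gdp_growth"]).summary())  # Granger causality

irf = res.irf(12)                                # impulse responses over 12 periods
irf.plot(orth=True)                              # Cholesky ordering = column order of `data`
print(res.fevd(12).summary())                    # forecast error variance decomposition
```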
Structural VARs
Reduced-form VARs are useful for forecasting but agnostic about causality. Structural VARs (SVARs) impose economic restrictions to identify causal shocks.
Identification strategies include:
Short-run restrictions (Cholesky, contemporaneous zeros)
Long-run restrictions (e.g., money has no long-run effect on output)
Sign restrictions (e.g., demand shocks raise both output and prices)
External instruments (narrative shocks—see Chapter 16)
The frontier of SVAR identification is covered in Chapter 16 (Time Series Causal Inference), where we discuss how to make causal—not just predictive—claims from time series data.
Practical Guidance for VARs
How many variables? 3-7 is typical; more variables means more parameters.
Stationarity: Difference non-stationary variables, or use levels if cointegrated (VECM).
Lag length: Use information criteria; check residual autocorrelation.
Ordering for Cholesky: Think about economic timing; slower-moving variables first.
Interpretation: Focus on IRFs and variance decompositions, not individual coefficients.
7.7 State Space Models and the Kalman Filter
The State Space Framework
Many time series models can be written in state space form:
Measurement equation: $Y_t = Z_t \alpha_t + \varepsilon_t$
Transition equation: $\alpha_{t+1} = T_t \alpha_t + R_t \eta_t$
where:
Yt = observed series
αt = unobserved state
εt, ηt = error terms
Examples:
ARIMA models
Dynamic factor models
Unobserved components (trend + cycle)
Time-varying parameter models
The Kalman Filter
The Kalman filter recursively estimates the unobserved state:
Prediction: αt∣t−1=Tt−1αt−1∣t−1
Updating: αt∣t=αt∣t−1+Kt(Yt−Ztαt∣t−1)
where Kt is the Kalman gain, weighting new information.
What it provides:
Filtered estimates (best estimate at time t using data through t)
Smoothed estimates (best estimate at time t using all data)
Forecast distributions
Applications
Missing data: Kalman filter handles missing observations naturally.
Mixed frequency: Combine monthly and quarterly data.
Unobserved components: Estimate separate trend and cycle.
Real-time estimation: Update estimates as new data arrive.
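A minimal sketch: a local-level model (random-walk state plus observation noise) estimated with statsmodels' UnobservedComponents, which runs the Kalman filter and smoother and handles the missing observations inserted below automatically.

```python
# Minimal sketch: local-level state space model estimated by the Kalman filter
# via statsmodels' UnobservedComponents; missing observations are handled automatically.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
level = np.cumsum(rng.normal(scale=0.5, size=200))      # unobserved state (random walk)
y = pd.Series(level + rng.normal(scale=1.0, size=200))  # noisy measurements
y.iloc[60:65] = np.nan                                  # a stretch of missing data

mod = sm.tsa.UnobservedComponents(y, level="local level")
res = mod.fit(disp=False)

filtered = res.filtered_state[0]   # estimate of the level using data through time t
smoothed = res.smoothed_state[0]   # estimate of the level using the full sample
print(res.summary())
```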
7.8 Forecasting
The Forecasting Problem
Goal: Predict YT+h given information through time T.
Point forecast: Y^T+h∣T=E[YT+h∣YT,YT−1,...]
Interval forecast: Range containing YT+h with specified probability.
Forecast Evaluation
Loss functions:
Mean Squared Error: $MSE = \frac{1}{H} \sum_{h=1}^{H} (Y_{T+h} - \hat{Y}_{T+h|T})^2$
Mean Absolute Error: $MAE = \frac{1}{H} \sum_{h=1}^{H} |Y_{T+h} - \hat{Y}_{T+h|T}|$
Mean Absolute Percentage Error: $MAPE = \frac{100}{H} \sum_{h=1}^{H} \frac{|Y_{T+h} - \hat{Y}_{T+h|T}|}{|Y_{T+h}|}$
Out-of-sample evaluation:
Split sample: Estimate on early data, evaluate on later data
Rolling window: Re-estimate as new data arrive
Expanding window: Add data without dropping old
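A minimal sketch of expanding-window, one-step-ahead evaluation: an AR(2) against the random-walk (no-change) benchmark, compared by RMSE. The series is simulated, so the resulting ranking only illustrates the mechanics.

```python
# Minimal sketch: expanding-window out-of-sample evaluation of an AR(2) forecast
# against a no-change (random walk) benchmark, using one-step-ahead RMSE.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(0)
y = pd.Series(arma_generate_sample(ar=[1, -0.45, -0.25], ma=[1], nsample=300))

split = 270
ar_errors, rw_errors = [], []
for t in range(split, len(y) - 1):
    train = y[: t + 1]                                        # expanding window
    ar_fc = ARIMA(train, order=(2, 0, 0)).fit().forecast(steps=1).iloc[0]
    rw_fc = train.iloc[-1]                                    # no-change benchmark
    ar_errors.append(y.iloc[t + 1] - ar_fc)
    rw_errors.append(y.iloc[t + 1] - rw_fc)

def rmse(errors):
    return float(np.sqrt(np.mean(np.square(errors))))

print(f"AR(2) RMSE: {rmse(ar_errors):.3f}   Random walk RMSE: {rmse(rw_errors):.3f}")
```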
Forecast Comparison
Diebold-Mariano test: Test whether two forecasts have equal accuracy.
H0: E[L(e1t)]=E[L(e2t)] (equal loss)
where L is the loss function and eit are forecast errors.
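A minimal sketch of one common way to compute the Diebold-Mariano statistic: regress the loss differential on a constant with Newey-West (HAC) standard errors; the t-statistic on the constant is the DM statistic. The forecast errors below are artificial placeholders.

```python
# Minimal sketch of a Diebold-Mariano test under squared-error loss: the DM statistic
# is the HAC t-statistic on the mean of the loss differential d_t = e1_t^2 - e2_t^2.
import numpy as np
import statsmodels.api as sm

def diebold_mariano(e1, e2, maxlags=1):
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2
    ols = sm.OLS(d, np.ones_like(d)).fit(cov_type="HAC", cov_kwds={"maxlags": maxlags})
    return ols.tvalues[0], ols.pvalues[0]

# Artificial forecast errors: model 2 is slightly more accurate on average
rng = np.random.default_rng(0)
e1 = rng.normal(scale=1.1, size=80)
e2 = rng.normal(scale=1.0, size=80)
dm_stat, p_value = diebold_mariano(e1, e2)
print(f"DM statistic: {dm_stat:.2f}, p-value: {p_value:.3f}")
```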
Combining Forecasts
Bates and Granger (1969): Combined forecasts often outperform individual forecasts.
Simple average: Often hard to beat
Weighted average: Weights based on past performance
Why combining works: Different models capture different aspects of the data; combination diversifies across model uncertainty.
Forecast Uncertainty
Fan charts: Show distribution of forecasts at each horizon
Density forecasts: Full predictive distribution, not just point forecast
Honest uncertainty: Forecast intervals should cover the true value at the stated rate (calibration).

Practical Guidance
Workflow for Time Series Analysis
Plot the data; identify obvious features
Test for stationarity (ADF, KPSS)
Transform if needed (difference, log)
Model (ARIMA, state space)
Diagnose (residual tests)
Forecast (point and interval)
Evaluate (out-of-sample)
Common Pitfalls
Pitfall 1: Regressing non-stationary series. Regressing one I(1) series on another produces spurious results.
How to avoid: Test for stationarity; difference or use cointegration.
Pitfall 2: Over-differencing. If a series is already stationary, differencing introduces a non-invertible moving-average component (an MA unit root) and inflates the forecast error variance.
How to avoid: Test before differencing; check both ADF and KPSS.
Pitfall 3: Overfitting. Too many lags or parameters improve in-sample fit but hurt forecasts.
How to avoid: Use information criteria; evaluate out-of-sample.
Pitfall 4: Ignoring structural breaks. Models estimated on pre-break data forecast poorly after the break.
How to avoid: Test for breaks; use rolling windows; acknowledge uncertainty.
Box: Structural Breaks and Their Consequences
A structural break occurs when the data-generating process changes at some point in time. This can affect means, variances, persistence, or relationships between variables.
Why structural breaks matter:
Unit root tests are biased: The ADF test cannot distinguish between a true unit root and a stationary series with a structural break. A break in mean looks like non-stationarity, biasing tests toward failing to reject the unit root null.
Forecasts fail at breaks: A model estimated on pre-break data will systematically mis-forecast after a break. The "Great Moderation" (1984-2007 decline in volatility) led to models that vastly underestimated post-2008 crisis volatility.
Inference is invalid: Standard errors assume parameter stability. Pooling across regimes produces meaningless "average" estimates.
Common sources of breaks:
Policy regime changes (Volcker disinflation 1979, inflation targeting adoption)
Financial crises (2008, COVID-19)
Institutional changes (currency pegs, trade liberalization)
Technological shifts (dot-com era, AI adoption)
Detection methods:
Chow test: Tests whether coefficients differ before/after a known break date
CUSUM/CUSUMSQ: Recursive residual-based tests for parameter instability
Bai-Perron (1998, 2003): Estimates multiple unknown break dates
Quandt-Andrews: Sup-Wald test over all possible break points
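A minimal sketch of the Chow test listed above, for a known break date: compare the pooled regression's residual sum of squares with the sum from the two subsample regressions. The data are simulated with a coefficient shift at the break.

```python
# Minimal sketch: Chow test for a known break date. Under H0 (no break), the F statistic
# comparing pooled vs. split-sample residual sums of squares has an F(k, T - 2k) distribution.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
T, break_point = 200, 120
x = rng.normal(size=T)
beta = np.where(np.arange(T) < break_point, 1.0, 0.3)   # slope shifts at the break
y = beta * x + rng.normal(scale=0.5, size=T)

X = sm.add_constant(x)
ssr_pooled = sm.OLS(y, X).fit().ssr
ssr_pre = sm.OLS(y[:break_point], X[:break_point]).fit().ssr
ssr_post = sm.OLS(y[break_point:], X[break_point:]).fit().ssr

k = X.shape[1]
F = ((ssr_pooled - ssr_pre - ssr_post) / k) / ((ssr_pre + ssr_post) / (T - 2 * k))
p_value = 1 - stats.f.cdf(F, k, T - 2 * k)
print(f"Chow F statistic: {F:.2f}, p-value: {p_value:.4f}")
```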
Practical responses:
Test for breaks before assuming stationarity
If breaks exist: (a) estimate separate models for each regime, (b) use dummy variables for level/trend shifts, or (c) allow time-varying parameters
Use rolling or expanding window estimation to detect parameter drift
Be humble about forecasts spanning potential break points
Example: U.S. inflation persistence appears to have declined after the early 1980s. An AR model estimated on 1960-1980 data would dramatically overpredict inflation persistence in later decades.
Rules of Thumb
Test for unit roots before regression
Difference I(1) series (or use cointegration)
Evaluate forecasts out-of-sample, not in-sample
Start simple: Random walk, AR(1), AR(2) often work well
Report uncertainty: Intervals, fan charts, not just point forecasts
Running Example: Business Cycles and Monetary Policy
U.S. Business Cycles
Data: Quarterly real GDP, 1947-2023
Characteristics:
Positive trend (economic growth)
Irregular cycles around trend (expansions and recessions)
Occasional sharp drops (2008, 2020)
HP filter decomposition:
Trend grows ~2-3% per year
Cycle fluctuates ±2-3% around trend
Recessions visible as negative cycle
AR(2) for output gap (HP-filtered cycle): gapt=0.72⋅gapt−1+0.14⋅gapt−2+εt
Highly persistent (sum of AR coefficients near 0.86)
Cycles last ~16 quarters on average
Forecasting Inflation
Question: Can we forecast inflation? How accurate are forecasts?
Models compared:
Random walk: π^t+h=πt
AR(2)
Phillips curve: AR with unemployment gap
Survey expectations
Results (out-of-sample RMSE, 1990-2020):
Random walk: 0.45
AR(2): 0.42
Phillips curve: 0.44
Survey: 0.38
Interpretation: Inflation is hard to forecast; simple models do nearly as well as complex ones; survey expectations contain information models miss.
This descriptive analysis sets up Chapter 16's examination of causal questions about monetary policy.
Integration Note
Connections to Other Methods
Regression (Ch. 3): Time series regression has special issues.
Causal inference (Ch. 16): Time series causal methods build on these foundations.
Panel data (Ch. 13, 15): Panel methods combine time series and cross-section.
Forecasting (this chapter): Distinct from causal inference.
Triangulation Strategies
Time series descriptions gain credibility when:
Multiple methods agree: ARIMA and state space give similar patterns
Robustness to specification: Results stable to lag length, filter parameter
Economic interpretation: Patterns align with economic events and theory
Out-of-sample validation: Models forecast well on new data
Comparison to survey data: Model-based and survey expectations align
Summary
Key takeaways:
Time series data have special structure: Temporal dependence requires methods that account for the ordering of observations.
Decomposition separates trend, season, and cycle: Understanding components is essential before modeling.
Stationarity is fundamental: Non-stationary series require transformation (differencing, detrending) before analysis.
Spurious regression is real: Two independent random walks appear correlated. Always test for unit roots.
ARIMA models provide a flexible framework for modeling stationary series after differencing.
Forecasting requires out-of-sample evaluation: In-sample fit is misleading; simple models often beat complex ones.
Returning to the opening question: Time ordering creates patterns (trends, cycles, persistence) that cross-sectional data lack—and problems (non-stationarity, spurious correlation) that require specialized tools. Understanding time series foundations is essential for anyone working with temporally ordered data, whether for description, forecasting, or causal inference.
Further Reading
Essential
Hamilton (1994), Time Series Analysis - The definitive graduate text
Hyndman and Athanasopoulos (2021), Forecasting: Principles and Practice - Excellent applied treatment (free online)
For Deeper Understanding
Box, Jenkins, and Reinsel (2015), Time Series Analysis - Classic ARIMA text
Harvey (1989), Forecasting, Structural Time Series, and the Kalman Filter - State space methods
Enders (2014), Applied Econometric Time Series - Econometrics focus
Historical/Methodological
Granger and Newbold (1974), "Spurious Regressions in Econometrics" - The classic warning
Dickey and Fuller (1979), "Distribution of the Estimators..." - Unit root testing
Engle and Granger (1987), "Co-Integration and Error Correction" - Nobel Prize work
Applications
Stock and Watson (2020), Introduction to Econometrics, Ch. 15-16 - Applied TS econometrics
Clark and McCracken (2013), "Advances in Forecast Evaluation" - Forecasting methods
Diebold (2007), Elements of Forecasting - Practical guide
Exercises
Conceptual
Explain the difference between a trend-stationary and a difference-stationary process. How would you distinguish them empirically?
Why does regressing one random walk on another produce spurious results? What happens to the standard errors?
What is cointegration? Give an economic example and explain why the series might be cointegrated.
Applied
Using monthly data on an economic variable of your choice:
Plot the series and describe what you see
Test for stationarity (ADF and KPSS)
Estimate an appropriate ARIMA model
Produce forecasts and evaluate out-of-sample
Using quarterly GDP data:
Apply the HP filter to extract the cycle
Estimate an AR model for the output gap
Calculate the average duration of business cycles implied by your model
Discussion
A financial analyst says: "My trading model has an R² of 0.85 on historical data, so I'm confident in my forecasts." What questions would you ask?
Technical Appendix
A. Wold Decomposition
Any covariance-stationary process can be written as $Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$,
where $\sum_{j=0}^{\infty} \psi_j^2 < \infty$ and $\varepsilon_t$ is white noise.
This justifies MA representations and is the foundation for impulse response analysis.
B. Stationarity Conditions for AR(p)
The AR(p) process is stationary if and only if all roots of the characteristic polynomial: 1−ϕ1z−ϕ2z2−...−ϕpzp=0
lie outside the unit circle.
For AR(1): $|\phi_1| < 1$. For AR(2): $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$, and $|\phi_2| < 1$.
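A minimal numerical check of these conditions, using the roots of the characteristic polynomial for the inflation AR(2) estimated earlier ($\phi_1 = 0.45$, $\phi_2 = 0.25$):

```python
# Minimal sketch: verify AR(2) stationarity by checking that all roots of
# 1 - phi1*z - phi2*z^2 = 0 lie outside the unit circle.
import numpy as np

phi1, phi2 = 0.45, 0.25                      # coefficients from the inflation AR(2) example
roots = np.roots([-phi2, -phi1, 1.0])        # numpy wants the highest-degree coefficient first
print("root moduli:", np.abs(roots))         # both should exceed 1
print("stationary:", bool(np.all(np.abs(roots) > 1)))
```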
C. Kalman Filter Equations
Prediction:
αt∣t−1=Tαt−1∣t−1
Pt∣t−1=TPt−1∣t−1T′+RQR′
Update:
Kt=Pt∣t−1Z′(ZPt∣t−1Z′+H)−1
αt∣t=αt∣t−1+Kt(Yt−Zαt∣t−1)
Pt∣t=(I−KtZ)Pt∣t−1
where Q=Var(ηt) and H=Var(εt).