PastPaperHero | Time-series analysis - Unit roots cointegration and error-correction models

Learning Outcomes

This article explains how to handle nonstationary time-series data in an exam setting, covering unit roots, cointegration, and error-correction models, with emphasis on the distinction between covariance-stationary, trend-stationary, and unit-root processes. It clarifies when standard regressions in levels are likely to be spurious and how appropriate transformations or cointegration methods restore valid inference. It shows how to identify unit roots, specify and interpret the Dickey–Fuller test, and decide whether a series should be modeled in levels or first differences. The article also covers practical choices around first differencing and highlights common pitfalls such as misusing standard critical values for unit-root and cointegration tests. In addition, it discusses cointegration, the Engle–Granger procedure for testing long-run relationships, and the interpretation of stationary residuals from a cointegrating regression. Finally, it presents the structure and intuition of error-correction models, focusing on how the error-correction term and speed-of-adjustment coefficient capture short-run adjustments toward long-run equilibrium in applied and exam-style questions.

CFA Level 2 Syllabus

For the CFA Level 2 exam, you are required to understand time-series analysis with nonstationary data, with a focus on the following syllabus points:

Explaining the concept of a unit root and its impact on time-series analysis
Describing and applying the Dickey–Fuller test to detect nonstationarity
Demonstrating transformation of series with unit roots using first differencing
Explaining the process to test for and establish cointegration
Interpreting error-correction models (ECMs) and their use in modeling relationships between cointegrated series
Discussing implications for regression analysis when time-series variables are nonstationary and/or cointegrated

Test Your Knowledge

Attempt these questions before reading this article. If you find some difficult or cannot remember the answers, remember to look more closely at that area during your revision.

A quantitative analyst is examining quarterly data on a stock index level ( $P_t$ ), aggregate earnings per share ( $EPS_t$ ), and the three‑month Treasury-bill rate ( $R_t$ ) from 2000–2024. She estimates AR(1) models and runs Dickey–Fuller tests on each level series, then applies Engle–Granger cointegration tests.

Summary of her results:

For $P_t$ and $EPS_t$ , the DF test statistics on the level series are less negative than the 5% critical values; on the first differences, DF statistics are more negative than the critical values.
For $R_t$ , the DF test on the level series yields a statistic more negative than the 5% critical value.
A regression of $\ln(EPS_t)$ on $\ln(P_t)$ produces highly significant coefficients and $R^2 = 0.96$ . A DF test applied to the residuals rejects the null of a unit root at 5%.
A regression of $\Delta \ln(P_t)$ on $\Delta R_t$ yields $R^2 = 0.05$ and no significant coefficients.

Use this information to answer Questions 1–4.

Based on the unit-root tests, which classification of processes is most appropriate?
1. $P_t$ and $EPS_t$ are I(0); $R_t$ is I(1).
2. $P_t$ and $EPS_t$ are I(1); $R_t$ is I(0).
3. All three series are I(1).
4. All three series are I(0).
The regression of $\ln(EPS_t)$ on $\ln(P_t)$ is best interpreted as:
1. A spurious regression because both variables are nonstationary in levels.
2. A valid long-run cointegrating relationship between earnings and prices.
3. A short-run relationship that should be estimated in first differences.
4. An AR(1) model because the dependent variable is lagged once.
Given the test outcomes, which modeling approach is most appropriate for the relationship between $P_t$ and $EPS_t$?
1. Regress $\Delta \ln(P_t)$ on $\Delta \ln(EPS_t)$ only.
2. Regress $\ln(P_t)$ on $\ln(EPS_t)$ in levels with usual t‑tests.
3. Estimate an error-correction model with $\Delta \ln(P_t)$ as the dependent variable.
4. Use a pure random-walk model for $\ln(P_t)$ with no role for $EPS_t$.
Regarding the regression of $\Delta \ln(P_t)$ on $\Delta R_t$, which statement is most accurate?
1. The low $R^2$ proves the relationship is spurious due to nonstationarity.
2. The regression is likely valid because both variables are stationary and show no relationship.
3. The regression should be re-estimated in levels to increase explanatory power.
4. The presence of cointegration between $P_t$ and $EPS_t$ implies that $R_t$ must also be cointegrated.

Introduction

Analysis of time-series data in finance frequently uncovers patterns such as persistence, trends, or seasonal fluctuations. However, not all time-series data are suitable for standard regression analysis; the property of stationarity is critical for reliable statistical inference. Economic and financial data often exhibit nonstationarity because of trends, structural changes, or random walk behavior. This article examines unit roots (a primary cause of nonstationarity), details statistical methods for testing and correcting for nonstationarity, and introduces cointegration and error-correction models as solutions for valid inference between related nonstationary variables.

Key Term: Unit Root
A characteristic of a time series where the coefficient on the lagged dependent variable equals one (in an AR(1) model), making the series nonstationary and causing its variance to increase without bound over time.

Key Term: Covariance Stationary
A time series whose mean, variance, and autocovariances are constant over time. Such series are stable around a fixed mean and are suitable for standard statistical analysis and AR modeling.

Covariance Stationarity and AR(1) Models

Many time-series models used in the curriculum are autoregressive (AR). The simplest is the AR(1):

x_t = b_0 + b_1 x_{t-1} + \varepsilon_t

where $\varepsilon_t$ is white noise. For this process to be covariance stationary:

The expected value $E(x_t)$ must be constant and finite.
The variance $\operatorname{Var}(x_t)$ must be constant and finite.
The covariance $\operatorname{Cov}(x_t, x_{t-k})$ for any lag $k$ must depend only on $k$ , not on $t$ .

If $\lvert b_1 \rvert < 1$ , the series is mean-reverting and covariance stationary. The long-run mean (often called the mean-reverting level) is:

\mu = \frac{b_0}{1 - b_1}

Key Term: Mean-Reverting Level
The long-run equilibrium value to which a stationary AR(1) process tends, calculated as $b_0 / (1 - b_1)$ when $\lvert b_1 \rvert < 1$ .

If $b_1 = 1$ , the mean-reverting level is undefined (division by zero), and the process becomes a random walk with a unit root, which is not covariance stationary.
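The mean-reverting level formula can be checked with a few lines of Python. This is a minimal sketch: the function name `mean_reverting_level` and the coefficients 1.2 and 0.7 are hypothetical, chosen only for illustration.

```python
def mean_reverting_level(b0: float, b1: float) -> float:
    """Return b0 / (1 - b1), the long-run mean of a stationary AR(1)."""
    if abs(b1) >= 1:
        # With |b1| >= 1 (for example a unit root) no finite level exists.
        raise ValueError("No finite mean-reverting level when |b1| >= 1")
    return b0 / (1 - b1)

# Hypothetical fitted coefficients, for illustration only.
b0, b1 = 1.2, 0.7
level = mean_reverting_level(b0, b1)

# Iterating the forecast x_{t+1} = b0 + b1 * x_t pulls any starting
# value toward the mean-reverting level when |b1| < 1.
forecast = 10.0
for _ in range(50):
    forecast = b0 + b1 * forecast

print(round(level, 4), round(forecast, 4))
```

With $b_1 = 1$ the function raises an error instead of dividing by zero, which mirrors the point that a unit-root process has no mean-reverting level.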

Random Walks and Unit Roots

A commonly observed random walk process can be written as:

x_t = x_{t-1} + \varepsilon_t

where $\varepsilon_t$ is white noise. Here, the coefficient on $x_{t-1}$ is 1, indicating a unit root.

Key Term: Random Walk
A process where the best forecast of the next value is the current value plus a purely random shock: $x_t = x_{t-1} + \varepsilon_t$ . It has a unit root, is nonstationary, and does not mean-revert.

Sometimes there is a deterministic drift:

x_t = b_0 + x_{t-1} + \varepsilon_t

Key Term: Random Walk with Drift
A random walk process with a nonzero intercept $b_0$ , so the expected change each period is $b_0$ . The series trends upward or downward on average but remains nonstationary.

In both cases (with or without drift), $b_1 = 1$ implies a unit root, and the variance of $x_t$ grows with $t$ . Least squares regression on such series in levels can be misleading unless we transform the data or exploit cointegration.
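For a random walk that starts from a fixed $x_0$ with shock variance $\sigma^2$, $\operatorname{Var}(x_t) = t\sigma^2$. A small simulation can illustrate that growth. The sketch below assumes standard normal shocks, $x_0 = 0$, and an arbitrary seed and path count; none of these settings come from the article.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
n_paths, horizon = 2000, 100      # illustrative simulation settings
checkpoints = {10: [], 100: []}   # dates at which we record x_t

for _ in range(n_paths):
    level = 0.0                   # x_0 = 0
    for t in range(1, horizon + 1):
        level += rng.gauss(0, 1)  # x_t = x_{t-1} + eps_t with sigma = 1
        if t in checkpoints:
            checkpoints[t].append(level)

# Cross-sectional variance across simulated paths at each date.
var_10 = statistics.pvariance(checkpoints[10])
var_100 = statistics.pvariance(checkpoints[100])
print(f"Var(x_10) ~ {var_10:.1f}, Var(x_100) ~ {var_100:.1f}")
```

The two estimates should sit near 10 and 100 respectively, in line with a variance that is proportional to $t$ rather than constant.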

Key Term: Spurious Regression
Incorrect statistical inference where regression of two unrelated nonstationary time series indicates a significant relationship—high $R^2$ and t‑statistics—even though the variables are unrelated in economic terms. This arises because both series trend or wander over time.

Why Stationarity Matters

A covariance stationary process has a stable mean and variance, so its statistical properties do not change over time. Most estimation techniques—including hypothesis tests—implicitly assume stationarity. If that assumption is violated:

Estimated coefficients may appear statistically significant when the variables are not truly related.
Standard errors and test statistics follow different distributions than assumed.
Forecasts may be systematically biased or unstable.

When a time series has a unit root, its statistical properties change over time, resulting in spurious relationships and unreliable regression results. For example, regressing the level of a stock index on the level of an unrelated macro variable (both following random walks) can yield a high $R^2$ even though there is no economic link.

From an exam standpoint, whenever you see trending or random-walk-like series in levels, you should immediately think about unit-root testing and the risk of spurious regression.
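The spurious regression problem can be illustrated by simulation. The sketch below regresses one independent random walk on another many times and counts how often the slope looks significant using the rule of thumb $|t| > 2$, then repeats the exercise on first differences. The helper `slope_t_stat`, the seed, the sample size, and the number of trials are all hypothetical choices for this illustration.

```python
import math
import random

def slope_t_stat(x, y):
    """OLS of y on a constant and x; returns (slope, t-statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, slope / math.sqrt(sse / (n - 2) / sxx)

def random_walk(n, rng):
    """Driftless random walk with standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(42)
trials, n = 200, 100
hits_levels = hits_diffs = 0
for _ in range(trials):
    x, y = random_walk(n, rng), random_walk(n, rng)  # unrelated by construction
    if abs(slope_t_stat(x, y)[1]) > 2:
        hits_levels += 1
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    if abs(slope_t_stat(dx, dy)[1]) > 2:
        hits_diffs += 1

share_levels = hits_levels / trials
share_diffs = hits_diffs / trials
print(f"'significant' in levels: {share_levels:.0%}, in differences: {share_diffs:.0%}")
```

In levels, the share of "significant" slopes is typically far above the nominal 5%, while in first differences it stays close to it. That gap is the practical meaning of a spurious regression.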

Recognizing and Testing for Unit Roots

Consider the AR(1) model:

x_t = b_0 + b_1 x_{t-1} + \varepsilon_t

If $\lvert b_1 \rvert < 1$ , the process is stationary and mean-reverting.
If $b_1 = 1$ , the process is a random walk with a unit root.
If $b_1$ is close to 1, the series is highly persistent, and unit-root behavior is plausible.

In practice, we do not know $b_1$ , so we estimate it and test for a unit root.

Testing for a Unit Root: Dickey–Fuller Test

The Dickey–Fuller (DF) test is the standard approach for diagnosing a unit root in an AR(1) setting. Start from:

x_t = b_0 + b_1 x_{t-1} + \varepsilon_t

Subtract $x_{t-1}$ from both sides:

\Delta x_t = x_t - x_{t-1} = b_0 + (b_1 - 1)x_{t-1} + \varepsilon_t

Define $\alpha = b_0$ and $\beta = b_1 - 1$ :

\Delta x_t = \alpha + \beta x_{t-1} + \varepsilon_t

The null and alternative hypotheses are:

$H_0$ : $\beta = 0$ (equivalently $b_1 = 1$ ) → series has a unit root (nonstationary).
$H_1$ : $\beta < 0$ (equivalently $b_1 < 1$ ) → series is stationary.

The testing steps are:

Estimate the regression: $\Delta x_t = \alpha + \beta x_{t-1} + \varepsilon_t$
Compute the t‑statistic on $\hat{\beta}$ .
Compare the t‑statistic to Dickey–Fuller critical values (not the usual t‑distribution critical values).

If the test statistic is more negative than the DF critical value, reject $H_0$ and conclude the series is covariance stationary. If you cannot reject $H_0$ , treat the series as having a unit root (nonstationary).
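The three testing steps can be sketched in plain Python. The helper names `slope_t_stat` and `df_t_stat`, the simulated series, and the seed are hypothetical. The value -2.86 is an approximate large-sample 5% Dickey–Fuller critical value for a regression with a constant, used here as an assumption; in an exam, use the critical value supplied in the question.

```python
import math
import random

def slope_t_stat(x, y):
    """OLS of y on a constant and x; returns (slope, t-statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, slope / math.sqrt(sse / (n - 2) / sxx)

def df_t_stat(series):
    """t-statistic on beta in: delta x_t = alpha + beta * x_{t-1} + eps_t."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    return slope_t_stat(lagged, diffs)[1]

rng = random.Random(7)

# Stationary AR(1) with b1 = 0.5 (illustrative parameters).
stationary = [0.0]
for _ in range(500):
    stationary.append(1.0 + 0.5 * stationary[-1] + rng.gauss(0, 1))

# Random walk (b1 = 1), so the unit-root null is true by construction.
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0, 1))

DF_CRIT_5PCT = -2.86  # assumed approximate critical value, not a t-table value
for name, series in [("AR(1)", stationary), ("random walk", walk)]:
    stat = df_t_stat(series)
    verdict = "reject unit root" if stat < DF_CRIT_5PCT else "cannot reject unit root"
    print(f"{name}: DF statistic = {stat:.2f} -> {verdict}")

t_stationary = df_t_stat(stationary)
```

The comparison is one-sided: only a statistic more negative than the critical value leads to rejection of the unit-root null.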

Key Term: Dickey–Fuller Test
A unit-root test based on regressing the first difference of a variable on its lagged level and testing whether the coefficient on the lagged level equals zero (implying a unit root). It uses special critical values because, under the null of a unit root, the t‑statistic does not follow the standard t‑distribution.

In practice, analysts often use the Augmented Dickey–Fuller (ADF) test, which adds lagged differences of $x_t$ to the right-hand side to absorb serial correlation in $\varepsilon_t$ . The intuition and exam logic are the same: test whether the series has a unit root.

Key Term: Order of Integration
The number of differences required to transform a series into a stationary process. A series that becomes stationary after first differencing is integrated of order one, denoted I(1). A stationary series in levels is I(0).

Worked Example 1.1

A macroeconomist analyzes quarterly real GDP in levels using the Dickey–Fuller regression and finds that the DF test statistic is less negative than the 5% critical value. When she applies the DF test to the first differences of GDP, she strongly rejects the null of a unit root.

Answer:
The non-significant DF statistic in levels means the null of a unit root cannot be rejected; GDP in levels behaves as an I(1) series. The significant DF statistic for $\Delta GDP$ implies the differenced series is stationary (I(0)). She should model GDP using first differences (growth rates) in AR or regression models, not the raw level series, unless she is explicitly working with cointegration.
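The decision logic in this example can be written as a small function. This is a sketch only: the function name, the test statistics, and the critical value of -2.86 are hypothetical inputs, since the example does not report numbers.

```python
def order_of_integration(df_level: float, df_diff: float, crit: float) -> str:
    """Classify a series from DF statistics on its level and first difference.

    A statistic more negative than `crit` rejects the unit-root null.
    """
    if df_level < crit:
        return "I(0)"   # level series is already stationary
    if df_diff < crit:
        return "I(1)"   # stationary only after first differencing
    return "not I(0) or I(1) on this evidence"

# Hypothetical statistics in the spirit of the GDP example.
gdp_class = order_of_integration(df_level=-1.5, df_diff=-6.2, crit=-2.86)
print(gdp_class)  # I(1)
```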

First Differencing to Achieve Stationarity

First differencing transforms a nonstationary series with a unit root into a stationary series by computing period-to-period changes:

\Delta x_t = x_t - x_{t-1}

If $x_t$ follows a random walk:

x_t = x_{t-1} + \varepsilon_t

then:

\Delta x_t = \varepsilon_t

which is white noise and covariance stationary.
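A quick numerical check shows that differencing a random walk recovers the shocks that built it. The shock values and the starting level of 10 are made up for illustration.

```python
# Hypothetical shocks (epsilon_t) and starting value x_0.
shocks = [0.5, -1.0, 0.25, 2.0, -0.75]
x = [10.0]

# Build the random walk: x_t = x_{t-1} + eps_t.
for eps in shocks:
    x.append(x[-1] + eps)

# First difference: delta x_t = x_t - x_{t-1}.
diffs = [curr - prev for prev, curr in zip(x, x[1:])]

print(x)      # the level series wanders
print(diffs)  # the differences are just the shocks
```

Note that differencing shortens the sample by one observation, because the first level has no predecessor.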

Key Term: First Difference
The change between consecutive observations in a time series, $\Delta x_t = x_t - x_{t-1}$ , often used to remove unit roots and induce stationarity so that standard AR models and regressions can be applied.

In the exam, once you conclude a series has a unit root:

Transform it by taking first differences (for I(1) series).
Model the differenced series using AR, MA, or regression techniques.
Interpret coefficients as effects on changes, not on levels. For example, a regression of $\Delta y_t$ on $\Delta x_t$ measures the short-run relationship between changes in $x$ and changes in $y$ .

Be careful not to overdifference. Differencing an already stationary series can remove meaningful long-run information and make interpretation harder. That is why unit-root testing is an essential first step.

Cointegration: Long-Run Relationships Between Nonstationary Series

Sometimes, two or more economic variables are individually nonstationary but move together in such a way that they do not drift arbitrarily far apart. For example, stock prices and dividends, consumption and income, or spot and futures prices may share a stable long-run equilibrium relationship.

Key Term: Cointegration
A property where two or more I(1) time series are linked by a linear combination that is I(0) (stationary). This indicates a stable long-run equilibrium relationship among the series.

Formally, suppose $y_t$ and $x_t$ are both I(1), but there exists a coefficient vector $(1, -\beta)$ such that:

e_t = y_t - \alpha - \beta x_t

is I(0). Then $y_t$ and $x_t$ are cointegrated, $\beta$ is part of the cointegrating vector, and $e_t$ captures deviations from long-run equilibrium.

Key Term: Cointegrating Vector
The set of coefficients that defines the stationary linear combination of cointegrated series. In a simple two-variable case, $y_t - \alpha - \beta x_t$ is stationary, and $(1, -\beta)$ is the cointegrating vector.

If cointegration is present:

Regressions between the level variables can be meaningful and represent long-run relationships.
You should not simply difference both variables and ignore the long-run equilibrium; instead, use an error-correction model to capture both short-run dynamics and adjustment back to equilibrium.

If the variables are nonstationary but not cointegrated, any regression in levels is spurious, and you should model relationships in first differences.

Testing for Cointegration: Engle–Granger Procedure

For the Level 2 exam, the key cointegration test is the Engle–Granger two-step procedure.

Key Term: Engle–Granger Test
A two-step method for testing cointegration: estimate a long-run regression in levels among nonstationary variables, then apply a unit-root test (with special critical values) to the residuals. Stationary residuals imply cointegration.

The steps are:

Test each series (e.g., $y_t$ and $x_t$ ) for unit roots using the DF/ADF test. Proceed only if both are I(1).
Estimate the long-run (cointegrating) regression in levels: $y_t = \alpha + \beta x_t + e_t$
Save the residuals $\hat{e}_t$ and test them for a unit root using a DF-type test. Importantly, the critical values are different from standard DF critical values and are typically provided in exam tables.

Interpretation:

If the residuals are stationary (reject unit root), $y_t$ and $x_t$ are cointegrated; the regression in levels is not spurious and can be used to define the long-run relationship.
If the residuals have a unit root (cannot reject), the variables are not cointegrated; regressions in levels are spurious, and you should work with differences.
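The residual-based steps of the procedure can be sketched on simulated data. Everything below is illustrative: the helper `ols_line`, the seed, and the data-generating process ($x_t$ a random walk, $y_t = 1 + 2x_t$ plus stationary noise) are assumptions, and -3.34 is an approximate 5% Engle–Granger critical value for two variables with a constant. Step 1 (unit-root tests on each series) is skipped because the series are I(1) by construction.

```python
import math
import random

def ols_line(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, slope t-stat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, slope / math.sqrt(sse / (n - 2) / sxx)

rng = random.Random(3)

# Simulated cointegrated pair: x is a random walk, y tracks 1 + 2x.
x = [0.0]
for _ in range(299):
    x.append(x[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Step 2: long-run regression in levels.
alpha, beta, _ = ols_line(x, y)

# Step 3: DF-type regression on the residuals.
resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
d_resid = [b - a for a, b in zip(resid, resid[1:])]
_, _, eg_stat = ols_line(resid[:-1], d_resid)

EG_CRIT_5PCT = -3.34  # assumed approximate value; use the exam table in practice
cointegrated = eg_stat < EG_CRIT_5PCT
print(f"beta = {beta:.2f}, residual DF statistic = {eg_stat:.2f}, cointegrated: {cointegrated}")
```

The residual test is compared with a more negative critical value than an ordinary DF test because the residuals come from an estimated regression, not from observed data.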

Worked Example 1.2

You regress log consumption on log income using quarterly macro data:

\ln C_t = \alpha + \beta \ln Y_t + e_t

DF tests suggest both $\ln C_t$ and $\ln Y_t$ are I(1). The DF test on the residuals $e_t$ rejects the null of a unit root at the 5% level using Engle–Granger critical values.

Answer:
The residuals’ stationarity indicates that $\ln C_t$ and $\ln Y_t$ are cointegrated. Their regression in levels reflects a valid long-term economic relationship between consumption and income. You should interpret $\beta$ as the long-run elasticity of consumption with respect to income, and model short-term dynamics using an error-correction model that combines $\Delta \ln C_t$ , $\Delta \ln Y_t$ , and the lagged residual $e_{t-1}$ .

Error-Correction Models (ECMs): Linking Short- and Long-Term Behavior

Cointegrated series may deviate from equilibrium in the short term because of shocks, frictions, or adjustment delays. Error-correction models (ECMs) explicitly model:

Short-run changes in the variables (typically first differences), and
The gradual correction of deviations from long-run equilibrium.

Key Term: Error-Correction Model (ECM)
A regression model for cointegrated series that includes both short-run changes (e.g., first differences) and an error-correction term (lagged deviation from the cointegrating relationship), allowing the system to adjust gradually back to long-run equilibrium.

Starting from the cointegrating relationship:

y_t = \alpha + \beta x_t + e_t

the error-correction form is:

\Delta y_t = \gamma_0 + \gamma_1 \Delta x_t + \lambda e_{t-1} + \varepsilon_t

where:

$\Delta y_t$ and $\Delta x_t$ capture short-run changes.
$e_{t-1}$ is the lagged residual from the long-run cointegrating equation.
$\lambda$ is the error-correction (speed-of-adjustment) coefficient.
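To see how the pieces fit together, the sketch below simulates a cointegrated pair with a known speed of adjustment, estimates the long-run regression, and then estimates the error-correction form by least squares. The helper `ols`, the seed, and all parameter values (long-run relation $y = 1 + 2x$, short-run coefficient 0.5, $\lambda = -0.3$) are hypothetical choices made for this illustration.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations, via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(1)
true_lambda = -0.3                       # assumed speed of adjustment
x, y = [0.0], [1.0]                      # starts on the long-run line y = 1 + 2x
for _ in range(2000):
    dx = rng.gauss(0, 1)                 # x follows a random walk
    gap = y[-1] - 1.0 - 2.0 * x[-1]      # last period's disequilibrium
    dy = 0.5 * dx + true_lambda * gap + rng.gauss(0, 1)
    x.append(x[-1] + dx)
    y.append(y[-1] + dy)

# Long-run regression in levels, then residuals e_t.
alpha, beta = ols([[1.0, xi] for xi in x], y)
e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

# ECM: delta y_t on a constant, delta x_t, and e_{t-1}.
rows = [[1.0, x[t] - x[t - 1], e[t - 1]] for t in range(1, len(y))]
dy_all = [y[t] - y[t - 1] for t in range(1, len(y))]
gamma0, gamma1, lam_hat = ols(rows, dy_all)
print(f"beta = {beta:.2f}, gamma1 = {gamma1:.2f}, lambda = {lam_hat:.2f}")
```

The estimated $\lambda$ should land near the assumed -0.3, with a negative sign indicating adjustment back toward the long-run relationship.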

Key Term: Error-Correction Term
The lagged residual from the cointegrating regression (e.g., $e_{t-1}$ ). It measures last period’s deviation from long-run equilibrium and drives the adjustment in the dependent variable back toward equilibrium.

Key Term: Speed of Adjustment
The coefficient on the error-correction term (e.g., $\lambda$ ). It indicates the fraction of last period’s disequilibrium that is corrected in the current period; typically negative and between −1 and 0 in a stable ECM.

Interpretation of $\lambda$ :

If $\lambda = 0$ , there is no adjustment toward the long-run relationship; only short-run dynamics matter.
If $-1 < \lambda < 0$ , deviations from equilibrium are partially corrected each period. For example, $\lambda = -0.3$ means about 30% of the gap is closed per period.
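If $\lambda > 0$ , deviations widen rather than shrink, and if $\lambda < -1$ , the adjustment overshoots equilibrium; either would be unusual in well-behaved economic applications. ( $\lambda = -1$ corresponds to full correction within one period.)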

Worked Example 1.3

An analyst models the relationship between money supply ( $M_t$ ) and inflation ( $\pi_t$ ) using an error-correction model (after establishing cointegration between the level series). The ECM for inflation is:

\Delta \pi_t = 0.2 + 0.4 \Delta M_t - 0.25 e_{t-1} + \varepsilon_t

where $e_{t-1}$ is last period’s residual from the long-run cointegrating regression of $\pi_t$ on $M_t$ . The coefficient on $e_{t-1}$ is statistically significant and negative.

Answer:
The significant negative coefficient on $e_{t-1}$ means that when inflation is above the long-run level implied by the money supply (positive $e_{t-1}$ ), the change in inflation $\Delta \pi_t$ tends to be lower in the next period, pulling inflation back down toward equilibrium. Specifically, with $\lambda = -0.25$ , about 25% of any deviation from the long-run relationship is corrected in each period. The 0.4 coefficient on $\Delta M_t$ captures the short-run impact of changes in money supply on changes in inflation.
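The 25% adjustment can be made concrete by tracking how an initial gap shrinks. The sketch assumes no new shocks, no change in $M_t$, and ignores the intercept, so each period's gap is $(1 + \lambda)$ times the previous one. The starting gap of 1.0 is an arbitrary unit.

```python
import math

lam = -0.25   # speed-of-adjustment coefficient from the example
gap = 1.0     # hypothetical initial deviation from equilibrium (1 unit)

remaining = []
for _ in range(4):
    gap *= (1 + lam)          # 25% of the gap is closed each period
    remaining.append(gap)

# Periods needed to close half of the original gap under these assumptions.
half_life = math.log(0.5) / math.log(1 + lam)

print(remaining)              # [0.75, 0.5625, 0.421875, 0.31640625]
print(round(half_life, 2))    # 2.41
```

So a coefficient of -0.25 does not mean the gap disappears after four periods; about 32% of it still remains, because each correction applies to the gap left over from the period before.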

Exam Warning on ECM Interpretation

In exam questions, pay close attention to:

The sign of the error-correction term’s coefficient (should be negative for a stable adjustment of $y_t$ back toward equilibrium when $e_{t-1}$ is positive).
Whether the ECM is written for $y_t$ or $x_t$ ; the adjustment speed can differ depending on which variable is the dependent one.
The distinction between short-run coefficients on differenced variables and the long-run cointegration coefficient $\beta$ .

Exam Warning: Unit-Root and Cointegration Tests

Do not use standard t‑ or normal critical values when applying Dickey–Fuller or Engle–Granger tests. Use the special critical values provided for unit-root and cointegration tests. Failing to do so can lead to incorrect conclusions about stationarity and cointegration.

Further, avoid regressing nonstationary series in levels without first:

Testing each series for unit roots, and
Either transforming them (e.g., first differencing) or verifying cointegration.

Otherwise, the regression may be spurious even if the $R^2$ is high and t‑statistics appear significant.

Nonstationarity, Cointegration, and Regression Analysis

When using time series in regressions, think in terms of the order of integration and cointegration status:

Case 1: All variables are I(0) (stationary in levels).
- Regressions in levels are valid.
- Standard inference (t‑ and F‑tests) is appropriate.
- No differencing or cointegration considerations are needed.
Case 2: Variables are I(1) but not cointegrated.
- Regressions in levels are spurious.
- Regressions should use first differences (or other transformations that yield stationarity), e.g., $\Delta y_t$ on $\Delta x_t$ .
- No error-correction term is used, because there is no long-run equilibrium linking the levels.
Case 3: Variables are I(1) and cointegrated.
- The regression in levels represents a valid long-run equilibrium relationship.
- Short-run dynamics should be modeled using an ECM, combining differenced variables and the error-correction term.
- Ignoring cointegration and working only with differences discards information about the long-run relationship.

In exam scenarios, once you identify that two nonstationary series are cointegrated, the preferred modeling approach is an ECM rather than a pure differenced regression.
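The three cases above can be summarized as a simple decision rule. The function name and return strings below are hypothetical, and the final branch (mixed orders of integration) is outside the three cases described, so it is flagged instead of resolved.

```python
def modeling_approach(orders, cointegrated=False):
    """Map integration orders and cointegration status to a modeling choice.

    `orders` lists the order of integration of each variable (0 or 1).
    """
    if all(order == 0 for order in orders):
        return "Case 1: regress in levels with standard inference"
    if all(order == 1 for order in orders):
        if cointegrated:
            return "Case 3: levels give the long-run relation; use an ECM"
        return "Case 2: levels regression is spurious; use first differences"
    return "Mixed orders: not covered by the three cases"

print(modeling_approach([0, 0]))
print(modeling_approach([1, 1], cointegrated=False))
print(modeling_approach([1, 1], cointegrated=True))
```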

Key Point Checklist

This article has covered the following key knowledge points:

The distinction between covariance-stationary (I(0)) and unit-root (I(1)) time-series processes
Why random walks (with or without drift) are nonstationary and produce spurious regression results
The structure of an AR(1) model and the role of the lag coefficient in determining mean reversion
The concept of a unit root and its implications for time-series modeling and forecasting
The Dickey–Fuller unit-root test: regression specification, hypotheses, and use of special critical values
The rationale for and application of first differencing to transform I(1) series into stationary series
The concept of order of integration and the notation I(0), I(1)
The definition of cointegration and its interpretation as a long-run equilibrium relationship between nonstationary series
The Engle–Granger two-step cointegration test using residuals from a long-run regression
The structure of an error-correction model, including short-run differenced terms and the error-correction term
Interpretation of the speed-of-adjustment coefficient in ECMs and its implications for stability
The decision rules for when regressions in levels are valid, when they are spurious, and when ECMs are appropriate
Common exam pitfalls: ignoring unit roots, misusing standard critical values, and misinterpreting ECM coefficients

Key Terms and Concepts

Unit Root
Covariance Stationary
Mean-Reverting Level
Random Walk
Random Walk with Drift
Spurious Regression
Dickey–Fuller Test
Order of Integration
First Difference
Cointegration
Cointegrating Vector
Engle–Granger Test
Error-Correction Model (ECM)
Error-Correction Term
Speed of Adjustment
