Time-series analysis - Unit roots cointegration and error-co...

Learning Outcomes

This article explains how to handle nonstationary time-series data in an exam setting. It distinguishes covariance-stationary, trend-stationary, and unit-root processes, and it introduces cointegration and error-correction models. It clarifies when standard regressions in levels are likely to be spurious and how appropriate transformations or cointegration methods restore valid inference. It shows how to identify unit roots, specify and interpret the Dickey–Fuller test, and decide whether a series should be modeled in levels or first differences. The article also covers practical choices around first differencing and highlights common pitfalls such as misusing standard critical values for unit-root and cointegration tests. In addition, it discusses cointegration, the Engle–Granger procedure for testing long-run relationships, and the interpretation of stationary residuals from a cointegrating regression. Finally, it presents the structure and intuition of error-correction models, focusing on how the error-correction term and speed-of-adjustment coefficient capture short-run adjustments toward long-run equilibrium in applied and exam-style questions.

CFA Level 2 Syllabus

For the CFA Level 2 exam, you are required to understand time-series analysis with nonstationary data, with a focus on the following syllabus points:

  • Explaining the concept of a unit root and its impact on time-series analysis
  • Describing and applying the Dickey–Fuller test to detect nonstationarity
  • Demonstrating transformation of series with unit roots using first differencing
  • Explaining the process to test for and establish cointegration
  • Interpreting error-correction models (ECMs) and their use in modeling relationships between cointegrated series
  • Discussing implications for regression analysis when time-series variables are nonstationary and/or cointegrated

Test Your Knowledge

Attempt these questions before reading this article. If you find some difficult or cannot remember the answers, remember to look more closely at that area during your revision.

A quantitative analyst is examining quarterly data on a stock index level ($P_t$), aggregate earnings per share ($EPS_t$), and the three-month Treasury-bill rate ($R_t$) from 2000–2024. She estimates AR(1) models and runs Dickey–Fuller tests on each level series, then applies Engle–Granger cointegration tests.

Summary of her results:

  • For $P_t$ and $EPS_t$, the DF test statistics on the level series are less negative than the 5% critical values; on the first differences, DF statistics are more negative than the critical values.
  • For $R_t$, the DF test on the level series yields a statistic more negative than the 5% critical value.
  • A regression of $\ln(EPS_t)$ on $\ln(P_t)$ produces highly significant coefficients and $R^2 = 0.96$. A DF test applied to the residuals rejects the null of a unit root at 5%.
  • A regression of $\Delta \ln(P_t)$ on $\Delta R_t$ yields $R^2 = 0.05$ and no significant coefficients.

Use this information to answer Questions 1–4.

  1. Based on the unit-root tests, which classification of processes is most appropriate?
    1. $P_t$ and $EPS_t$ are I(0); $R_t$ is I(1).
    2. $P_t$ and $EPS_t$ are I(1); $R_t$ is I(0).
    3. All three series are I(1).
    4. All three series are I(0).
  2. The regression of $\ln(EPS_t)$ on $\ln(P_t)$ is best interpreted as:
    1. A spurious regression because both variables are nonstationary in levels.
    2. A valid long-run cointegrating relationship between earnings and prices.
    3. A short-run relationship that should be estimated in first differences.
    4. An AR(1) model because the dependent variable is lagged once.
  3. Given the test outcomes, which modeling approach is most appropriate for the relationship between $P_t$ and $EPS_t$?
    1. Regress $\Delta \ln(P_t)$ on $\Delta \ln(EPS_t)$ only.
    2. Regress $\ln(P_t)$ on $\ln(EPS_t)$ in levels with usual t-tests.
    3. Estimate an error-correction model with $\Delta \ln(P_t)$ as the dependent variable.
    4. Use a pure random-walk model for $\ln(P_t)$ with no role for $EPS_t$.
  4. Regarding the regression of $\Delta \ln(P_t)$ on $\Delta R_t$, which statement is most accurate?
    1. The low $R^2$ proves the relationship is spurious due to nonstationarity.
    2. The regression is likely valid because both variables are stationary and show no relationship.
    3. The regression should be re-estimated in levels to increase explanatory power.
    4. The presence of cointegration between $P_t$ and $EPS_t$ implies that $R_t$ must also be cointegrated.

Introduction

Analysis of time-series data in finance frequently uncovers patterns such as persistence, trends, or seasonal fluctuations. However, not all time-series data are suitable for standard regression analysis; the property of stationarity is critical for reliable statistical inference. Economic and financial data often exhibit nonstationarity because of trends, structural changes, or random walk behavior. This article examines unit roots (a primary cause of nonstationarity), details statistical methods for testing and correcting for nonstationarity, and introduces cointegration and error-correction models as solutions for valid inference between related nonstationary variables.

Key Term: Unit Root
A characteristic of a time series where the coefficient on the lagged dependent variable equals one (in an AR(1) model), making the series nonstationary and causing its variance to increase without bound over time.

Key Term: Covariance Stationary
A time series whose mean, variance, and autocovariances are constant over time. Such series are stable around a fixed mean and are suitable for standard statistical analysis and AR modeling.

Covariance Stationarity and AR(1) Models

Many time-series models used in the curriculum are autoregressive (AR). The simplest is the AR(1):

$$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ is white noise. For this process to be covariance stationary:

  • The expected value $E(x_t)$ must be constant and finite.
  • The variance $\operatorname{Var}(x_t)$ must be constant and finite.
  • The covariance $\operatorname{Cov}(x_t, x_{t-k})$ for any lag $k$ must depend only on $k$, not on $t$.

If $\lvert b_1 \rvert < 1$, the series is mean-reverting and covariance stationary. The long-run mean (often called the mean-reverting level) is:

$$\mu = \frac{b_0}{1 - b_1}$$

Key Term: Mean-Reverting Level
The long-run equilibrium value to which a stationary AR(1) process tends, calculated as $b_0 / (1 - b_1)$ when $\lvert b_1 \rvert < 1$.

If $b_1 = 1$, the mean-reverting level is undefined (division by zero), and the process becomes a random walk with a unit root, which is not covariance stationary.
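
As a minimal sketch of the mean-reverting level, the snippet below computes $b_0 / (1 - b_1)$ and chains one-step AR(1) forecasts to show them settling at that level. The function names and the coefficients $b_0 = 1.0$, $b_1 = 0.5$ are hypothetical, chosen only for illustration.

```python
def mean_reverting_level(b0: float, b1: float) -> float:
    """Long-run mean b0 / (1 - b1) of a covariance-stationary AR(1)."""
    if abs(b1) >= 1:
        raise ValueError("no finite mean-reverting level when |b1| >= 1")
    return b0 / (1 - b1)


def iterate_forecast(x0: float, b0: float, b1: float, steps: int) -> float:
    """Chain the one-step forecast x_hat = b0 + b1 * x for `steps` periods."""
    x = x0
    for _ in range(steps):
        x = b0 + b1 * x
    return x


# Hypothetical coefficients, chosen only for illustration.
level = mean_reverting_level(b0=1.0, b1=0.5)  # 1.0 / (1 - 0.5) = 2.0
far_forecast = iterate_forecast(x0=10.0, b0=1.0, b1=0.5, steps=50)
print(level, far_forecast)
```

Starting well above the level (at 10.0), the chained forecast ends up very close to 2.0, which is the sense in which a stationary AR(1) is mean-reverting. With $b_1 = 1$ the function raises an error, mirroring the division-by-zero problem described above.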

Random Walks and Unit Roots

A commonly observed random walk process can be written as:

$$x_t = x_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ is white noise. Here, the coefficient on $x_{t-1}$ is 1, indicating a unit root.

Key Term: Random Walk
A process where each value equals the previous value plus a purely random shock, $x_t = x_{t-1} + \varepsilon_t$, so the best forecast of the next value is the current value. It has a unit root, is nonstationary, and does not mean-revert.

Sometimes there is a deterministic drift:

$$x_t = b_0 + x_{t-1} + \varepsilon_t$$

Key Term: Random Walk with Drift
A random walk process with a nonzero intercept $b_0$, so the expected change each period is $b_0$. The series trends upward or downward on average but remains nonstationary.

In both cases (with or without drift), $b_1 = 1$ implies a unit root, and the variance of $x_t$ grows with $t$. Least squares regression on such series in levels can be misleading unless we transform the data or exploit cointegration.
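
The growth in variance can be seen by substituting backward to a starting value. The short derivation below is a sketch that assumes a fixed starting value $x_0$ and uncorrelated shocks with constant variance $\sigma^2$ (both symbols are introduced here for illustration); setting $b_0 = 0$ gives the case without drift.

```latex
\begin{aligned}
x_t &= x_0 + b_0 t + \sum_{i=1}^{t} \varepsilon_i \\
E(x_t) &= x_0 + b_0 t \\
\operatorname{Var}(x_t) &= \sum_{i=1}^{t} \operatorname{Var}(\varepsilon_i) = t\,\sigma^2
\end{aligned}
```

Because the variance depends on $t$ (and, with drift, so does the mean), the conditions for covariance stationarity fail.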

Key Term: Spurious Regression
Incorrect statistical inference where regression of two unrelated nonstationary time series indicates a significant relationship (high $R^2$ and t-statistics) even though the variables are unrelated in economic terms. This arises because both series trend or wander over time.

Why Stationarity Matters

A covariance stationary process has a stable mean and variance, so its statistical properties do not change over time. Most estimation techniques—including hypothesis tests—implicitly assume stationarity. If that assumption is violated:

  • Estimated coefficients may appear statistically significant when the variables are not truly related.
  • Standard errors and test statistics follow different distributions than assumed.
  • Forecasts may be systematically biased or unstable.

When a time series has a unit root, its statistical properties change over time, resulting in spurious relationships and unreliable regression results. For example, regressing the level of a stock index on the level of an unrelated macro variable (both following random walks) can yield a high $R^2$ even though there is no economic link.

From an exam standpoint, whenever you see trending or random-walk-like series in levels, you should immediately think about unit-root testing and the risk of spurious regression.
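
The risk of spurious regression can be illustrated with a small simulation. The sketch below is not from the curriculum: it generates pairs of independent random walks, regresses one on the other in levels and in first differences, and counts how often the slope looks "significant" against the usual 1.96 cutoff. The helper names, sample sizes, and seed are all assumptions.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of standard normal shocks."""
    x, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        x.append(level)
    return x


def slope_t_stat(y, x):
    """t-statistic on the slope in an OLS regression of y on x with an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(sse / (n - 2) / sxx)


def diff(x):
    """First differences of a list."""
    return [curr - prev for prev, curr in zip(x, x[1:])]


def rejection_rates(n_sims=200, n_obs=200, seed=0):
    """Share of simulations with |t| > 1.96, in levels and in first differences."""
    rng = random.Random(seed)
    levels = diffs = 0
    for _ in range(n_sims):
        y, x = random_walk(n_obs, rng), random_walk(n_obs, rng)
        levels += abs(slope_t_stat(y, x)) > 1.96
        diffs += abs(slope_t_stat(diff(y), diff(x))) > 1.96
    return levels / n_sims, diffs / n_sims


levels_rate, diffs_rate = rejection_rates()
print(f"levels: {levels_rate:.2f}, differences: {diffs_rate:.2f}")
```

The two series are unrelated by construction, so a well-behaved 5% test should flag a relationship only occasionally. The levels regressions should do so far more often than the differenced regressions, which is the spurious-regression problem in miniature.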

Recognizing and Testing for Unit Roots

Consider the AR(1) model:

$$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$$

  • If $\lvert b_1 \rvert < 1$, the process is stationary and mean-reverting.
  • If $b_1 = 1$, the process is a random walk with a unit root.
  • If $b_1$ is close to 1, the series is highly persistent, and unit-root behavior is plausible.

In practice, we do not know $b_1$, so we estimate it and test for a unit root.

Testing for a Unit Root: Dickey–Fuller Test

The Dickey–Fuller (DF) test is the standard approach for diagnosing a unit root in an AR(1) setting. Start from:

$$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$$

Subtract $x_{t-1}$ from both sides:

$$\Delta x_t = x_t - x_{t-1} = b_0 + (b_1 - 1)x_{t-1} + \varepsilon_t$$

Define $\alpha = b_0$ and $\beta = b_1 - 1$:

$$\Delta x_t = \alpha + \beta x_{t-1} + \varepsilon_t$$

The null and alternative hypotheses are:

  • $H_0$: $\beta = 0$ (equivalently $b_1 = 1$) → series has a unit root (nonstationary).
  • $H_1$: $\beta < 0$ (equivalently $b_1 < 1$) → series is stationary.

The testing steps are:

  • Estimate the regression: $\Delta x_t = \alpha + \beta x_{t-1} + \varepsilon_t$
  • Compute the t-statistic on $\hat{\beta}$.
  • Compare the t-statistic to Dickey–Fuller critical values (not the usual t-distribution critical values).

If the test statistic is more negative than the DF critical value, reject $H_0$ and conclude the series is covariance stationary. If you cannot reject $H_0$, treat the series as having a unit root (nonstationary).
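
The testing steps above can be sketched in a few lines of plain Python. This is an illustration rather than a production test: `df_statistic` and `simulate_ar1` are hypothetical helpers, the data are simulated, and the cutoff of -2.86 is an approximate large-sample 5% Dickey–Fuller value for a regression with an intercept and no trend. In an exam, use the critical value supplied in the question.

```python
import math
import random


def df_statistic(x):
    """t-statistic on beta in: delta_x_t = alpha + beta * x_{t-1} + e_t."""
    lagged = x[:-1]
    delta = [curr - prev for prev, curr in zip(x, x[1:])]
    n = len(delta)
    m_lag, m_delta = sum(lagged) / n, sum(delta) / n
    s_ll = sum((l - m_lag) ** 2 for l in lagged)
    s_ld = sum((l - m_lag) * (d - m_delta) for l, d in zip(lagged, delta))
    beta = s_ld / s_ll
    alpha = m_delta - beta * m_lag
    sse = sum((d - alpha - beta * l) ** 2 for l, d in zip(lagged, delta))
    se_beta = math.sqrt(sse / (n - 2) / s_ll)
    return beta / se_beta


def simulate_ar1(b0, b1, n, seed):
    """Simulate x_t = b0 + b1 * x_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(b0 + b1 * x[-1] + rng.gauss(0.0, 1.0))
    return x


# Assumed approximate 5% critical value (intercept, no trend, large sample).
DF_CRITICAL_5PCT = -2.86

stationary = simulate_ar1(b0=1.0, b1=0.5, n=500, seed=1)
random_walk = simulate_ar1(b0=0.0, b1=1.0, n=500, seed=1)

for name, series in [("AR(1), b1 = 0.5", stationary), ("random walk", random_walk)]:
    stat = df_statistic(series)
    verdict = "reject unit root" if stat < DF_CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: DF statistic = {stat:.2f} -> {verdict}")
```

For the stationary series the statistic should be strongly negative, well beyond the cutoff. For the simulated random walk the test will usually, though not always, fail to reject, since a 5% test still rejects a true null about one time in twenty.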

Key Term: Dickey–Fuller Test
A unit-root test based on regressing the first difference of a variable on its lagged level and testing whether the coefficient on the lagged level equals zero (implying a unit root). It uses special critical values because, under the unit-root null, the t-statistic does not follow the standard t-distribution.

In practice, analysts often use the Augmented Dickey–Fuller (ADF) test, which adds lagged differences of $x_t$ to the right-hand side to absorb serial correlation in $\varepsilon_t$. The intuition and exam logic are the same: test whether the series has a unit root.

Key Term: Order of Integration
The number of differences required to transform a series into a stationary process. A series that becomes stationary after first differencing is integrated of order one, denoted I(1). A stationary series in levels is I(0).

Worked Example 1.1

A macroeconomist analyzes quarterly real GDP in levels using the Dickey–Fuller regression and finds that the DF test statistic is less negative than the 5% critical value. When she applies the DF test to the first differences of GDP, she strongly rejects the null of a unit root.

Answer:
The non-significant DF statistic in levels means the null of a unit root cannot be rejected; GDP in levels behaves as an I(1) series. The significant DF statistic for $\Delta GDP$ implies the differenced series is stationary (I(0)). She should model GDP using first differences (growth rates) in AR or regression models, not the raw level series, unless she is explicitly working with cointegration.

First Differencing to Achieve Stationarity

First differencing transforms a nonstationary series with a unit root into a stationary series by computing period-to-period changes:

$$\Delta x_t = x_t - x_{t-1}$$

If $x_t$ follows a random walk:

$$x_t = x_{t-1} + \varepsilon_t$$

then:

$$\Delta x_t = \varepsilon_t$$

which is white noise and covariance stationary.
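
A tiny numerical check makes the algebra concrete. In this sketch the shock values and the starting level of 10.0 are made up for illustration; the point is only that differencing a random walk hands back the shocks that built it.

```python
def first_difference(x):
    """Period-to-period changes: x_t - x_{t-1}."""
    return [curr - prev for prev, curr in zip(x, x[1:])]


# Hypothetical shocks; the random walk is their running sum from a start of 10.0.
shocks = [0.5, -1.0, 2.0, 0.25]
levels = [10.0]
for e in shocks:
    levels.append(levels[-1] + e)

recovered = first_difference(levels)
print(levels)     # [10.0, 10.5, 9.5, 11.5, 11.75]
print(recovered)  # [0.5, -1.0, 2.0, 0.25]
```

Note that differencing costs one observation: five levels yield four changes.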

Key Term: First Difference
The change between consecutive observations in a time series, $\Delta x_t = x_t - x_{t-1}$, often used to remove unit roots and induce stationarity so that standard AR models and regressions can be applied.

In the exam, once you conclude a series has a unit root:

  • Transform it by taking first differences (for I(1) series).
  • Model the differenced series using AR, MA, or regression techniques.
  • Interpret coefficients as effects on changes, not on levels. For example, a regression of $\Delta y_t$ on $\Delta x_t$ measures the short-run relationship between changes in $x$ and changes in $y$.

Be careful not to overdifference. Differencing an already stationary series can remove meaningful long-run information and make interpretation harder. That is why unit-root testing is an essential first step.
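
One way to see the cost of overdifferencing is to difference a series that is already white noise. The derivation below is a sketch assuming uncorrelated shocks $\varepsilon_t$ with constant variance $\sigma^2$ (a symbol introduced here for illustration).

```latex
\begin{aligned}
\Delta \varepsilon_t &= \varepsilon_t - \varepsilon_{t-1} \\
\operatorname{Var}(\Delta \varepsilon_t) &= 2\sigma^2 \\
\operatorname{Cov}(\Delta \varepsilon_t, \Delta \varepsilon_{t-1}) &= -\sigma^2 \\
\operatorname{Corr}(\Delta \varepsilon_t, \Delta \varepsilon_{t-1}) &= -\tfrac{1}{2}
\end{aligned}
```

The differenced series is noisier than the original and has negative first-order autocorrelation that was not present before, so differencing has added structure rather than removed it.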

Cointegration: Long-Run Relationships Between Nonstationary Series

Sometimes, two or more economic variables are individually nonstationary but move together in such a way that they do not drift arbitrarily far apart. For example, stock prices and dividends, consumption and income, or spot and futures prices may share a stable long-run equilibrium relationship.

Key Term: Cointegration
A property where two or more I(1) time series are linked by a linear combination that is I(0) (stationary). This indicates a stable long-run equilibrium relationship among the series.

Formally, suppose $y_t$ and $x_t$ are both I(1), but there exists a coefficient vector $(1, -\beta)$ such that:

$$e_t = y_t - \alpha - \beta x_t$$

is I(0). Then $y_t$ and $x_t$ are cointegrated, $\beta$ is part of the cointegrating vector, and $e_t$ captures deviations from long-run equilibrium.

Key Term: Cointegrating Vector
The set of coefficients that defines the stationary linear combination of cointegrated series. In a simple two-variable case, $y_t - \alpha - \beta x_t$ is stationary, and $(1, -\beta)$ is the cointegrating vector.

If cointegration is present:

  • Regressions between the level variables can be meaningful and represent long-run relationships.
  • You should not simply difference both variables and ignore the long-run equilibrium; instead, use an error-correction model to capture both short-run dynamics and adjustment back to equilibrium.

If the variables are nonstationary but not cointegrated, any regression in levels is spurious, and you should model relationships in first differences.

Testing for Cointegration: Engle–Granger Procedure

For the Level 2 exam, the key cointegration test is the Engle–Granger two-step procedure.

Key Term: Engle–Granger Test
A two-step method for testing cointegration: estimate a long-run regression in levels among nonstationary variables, then apply a unit-root test (with special critical values) to the residuals. Stationary residuals imply cointegration.

The steps are:

  • Test each series (e.g., $y_t$ and $x_t$) for unit roots using the DF/ADF test. Proceed only if both are I(1).
  • Estimate the long-run (cointegrating) regression in levels: $y_t = \alpha + \beta x_t + e_t$
  • Save the residuals $\hat{e}_t$ and test them for a unit root using a DF-type test. Importantly, the critical values are different from standard DF critical values and are typically provided in exam tables.

Interpretation:

  • If the residuals are stationary (reject unit root), $y_t$ and $x_t$ are cointegrated; the regression in levels is not spurious and can be used to define the long-run relationship.
  • If the residuals have a unit root (cannot reject), the variables are not cointegrated; regressions in levels are spurious, and you should work with differences.
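
The two-step logic can be sketched on simulated data. Everything in the block below is an assumption made for illustration: the helper names, the data-generating process ($y_t = 1 + 2 x_t$ plus stationary noise, with $x_t$ a random walk), and the cutoff of -3.34, which is an approximate large-sample 5% Engle–Granger value for two variables with an intercept. Exam questions supply their own critical values.

```python
import math
import random


def ols(y, x):
    """Intercept, slope, slope standard error, and residuals for y = a + b*x + e."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    se_b = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return a, b, se_b, resid


def df_statistic(series):
    """DF t-statistic: regress the first difference on the lagged level."""
    lagged = series[:-1]
    delta = [curr - prev for prev, curr in zip(series, series[1:])]
    _, beta, se_beta, _ = ols(delta, lagged)
    return beta / se_beta


# Simulated cointegrated pair (hypothetical data-generating process).
rng = random.Random(42)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))           # x_t is a random walk
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # y_t tracks x_t

# Step 1: unit-root tests on each level series.
print(f"DF on x: {df_statistic(x):.2f}, DF on y: {df_statistic(y):.2f}")

# Step 2: long-run regression in levels, then a DF-type test on the residuals.
alpha_hat, beta_hat, _, residuals = ols(y, x)
resid_stat = df_statistic(residuals)

EG_CRITICAL_5PCT = -3.34  # assumed approximate value; use the exam table
cointegrated = resid_stat < EG_CRITICAL_5PCT
print(f"beta_hat = {beta_hat:.3f}, residual DF = {resid_stat:.2f}, cointegrated: {cointegrated}")
```

Because the simulated deviations from $1 + 2 x_t$ are white noise, the residual statistic should be far more negative than the cutoff, and the estimated slope should land close to the true value of 2. The saved residuals are what an error-correction model would later use, lagged one period.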

Worked Example 1.2

You regress log consumption on log income using quarterly macro data:

$$\ln C_t = \alpha + \beta \ln Y_t + e_t$$

DF tests suggest both $\ln C_t$ and $\ln Y_t$ are I(1). The DF test on the residuals $e_t$ rejects the null of a unit root at the 5% level using Engle–Granger critical values.

Answer:
The residuals' stationarity indicates that $\ln C_t$ and $\ln Y_t$ are cointegrated. Their regression in levels reflects a valid long-term economic relationship between consumption and income. You should interpret $\beta$ as the long-run elasticity of consumption with respect to income, and model short-term dynamics using an error-correction model that combines $\Delta \ln C_t$, $\Delta \ln Y_t$, and the lagged residual $e_{t-1}$.

Error-Correction Models (ECMs): Linking Short- and Long-Term Behavior

Cointegrated series may deviate from equilibrium in the short term because of shocks, frictions, or adjustment delays. Error-correction models (ECMs) explicitly model:

  • Short-run changes in the variables (typically first differences), and
  • The gradual correction of deviations from long-run equilibrium.

Key Term: Error-Correction Model (ECM)
A regression model for cointegrated series that includes both short-run changes (e.g., first differences) and an error-correction term (lagged deviation from the cointegrating relationship), allowing the system to adjust gradually back to long-run equilibrium.

Starting from the cointegrating relationship:

$$y_t = \alpha + \beta x_t + e_t$$

the error-correction form is:

$$\Delta y_t = \gamma_0 + \gamma_1 \Delta x_t + \lambda e_{t-1} + \varepsilon_t$$

where:

  • $\Delta y_t$ and $\Delta x_t$ capture short-run changes.
  • $e_{t-1}$ is the lagged residual from the long-run cointegrating equation.
  • $\lambda$ is the error-correction (speed-of-adjustment) coefficient.

Key Term: Error-Correction Term
The lagged residual from the cointegrating regression (e.g., $e_{t-1}$). It measures last period's deviation from long-run equilibrium and drives the adjustment in the dependent variable back toward equilibrium.

Key Term: Speed of Adjustment
The coefficient on the error-correction term (e.g., $\lambda$). It indicates the fraction of last period's disequilibrium that is corrected in the current period; typically negative and between $-1$ and $0$ in a stable ECM.

Interpretation of $\lambda$:

  • If $\lambda = 0$, there is no adjustment toward the long-run relationship; only short-run dynamics matter.
  • If $-1 < \lambda < 0$, deviations from equilibrium are partially corrected each period. For example, $\lambda = -0.3$ means about 30% of the gap is closed per period.
  • If $\lambda \leq -1$ or $\lambda > 0$, the system may be unstable or overshooting, which would be unusual in well-behaved economic applications.
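
To put numbers on the partial-adjustment case, the sketch below tracks how much of an initial gap remains over time. It rests on simplifying assumptions that are not part of the ECM itself: no new shocks arrive, $x_t$ stays put, and the intercept $\gamma_0$ is zero, so the gap simply shrinks by the factor $(1 + \lambda)$ each period. The function names are hypothetical.

```python
import math


def remaining_gap(lam: float, periods: int) -> float:
    """Fraction of the initial disequilibrium left after `periods` periods."""
    return (1 + lam) ** periods


def half_life(lam: float) -> float:
    """Periods needed for half the gap to close, for -1 < lam < 0."""
    if not -1 < lam < 0:
        raise ValueError("half-life is defined here only for -1 < lam < 0")
    return math.log(0.5) / math.log(1 + lam)


lam = -0.3  # the value used in the text above
for k in range(1, 5):
    print(f"after {k} period(s): {remaining_gap(lam, k):.3f} of the gap remains")
print(f"half-life: {half_life(lam):.2f} periods")
```

With $\lambda = -0.3$, 70% of the gap remains after one period and about 49% after two, so half of the disequilibrium is gone in just under two periods. A value of $\lambda$ closer to $-1$ shortens the half-life; a value closer to $0$ lengthens it.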

Worked Example 1.3

An analyst models the relationship between money supply ($M_t$) and inflation ($\pi_t$) using an error-correction model (after establishing cointegration between the level series). The ECM for inflation is:

$$\Delta \pi_t = 0.2 + 0.4 \Delta M_t - 0.25 e_{t-1} + \varepsilon_t$$

where $e_{t-1}$ is last period's residual from the long-run cointegrating regression of $\pi_t$ on $M_t$. The coefficient on $e_{t-1}$ is statistically significant and negative.

Answer:
The significant negative coefficient on $e_{t-1}$ means that when inflation is above the long-run level implied by the money supply (positive $e_{t-1}$), the change in inflation $\Delta \pi_t$ tends to be lower in the next period, pulling inflation back down toward equilibrium. Specifically, with $\lambda = -0.25$, about 25% of any deviation from the long-run relationship is corrected in each period. The 0.4 coefficient on $\Delta M_t$ captures the short-run impact of changes in money supply on changes in inflation.
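
Plugging numbers into the fitted ECM shows the error-correction term at work. The coefficients below come from the worked example; the inputs ($\Delta M_t = 0.5$ and a lagged deviation of $+1.0$ or $-1.0$) are hypothetical, and the shock $\varepsilon_t$ is set to zero so the result is the expected change.

```python
def expected_change_in_inflation(delta_m: float, lagged_error: float) -> float:
    """Expected change in inflation from the worked-example ECM (shock set to zero)."""
    return 0.2 + 0.4 * delta_m - 0.25 * lagged_error


# Hypothetical inputs: same money-supply change, opposite starting deviations.
above_equilibrium = expected_change_in_inflation(delta_m=0.5, lagged_error=1.0)
below_equilibrium = expected_change_in_inflation(delta_m=0.5, lagged_error=-1.0)
print(round(above_equilibrium, 2), round(below_equilibrium, 2))  # 0.15 0.65
```

With the same change in money supply, inflation is expected to rise by less when it starts above its long-run level (0.15) than when it starts below (0.65). The difference of 0.50 is the error-correction term pulling in opposite directions in the two cases.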

Exam Warning on ECM Interpretation

In exam questions, pay close attention to:

  • The sign of the error-correction term's coefficient (should be negative for a stable adjustment of $y_t$ back toward equilibrium when $e_{t-1}$ is positive).
  • Whether the ECM is written for $y_t$ or $x_t$; the adjustment speed can differ depending on which variable is the dependent one.
  • The distinction between short-run coefficients on differenced variables and the long-run cointegration coefficient $\beta$.

Exam Warning: Unit-Root and Cointegration Tests

Do not use standard t‑ or normal critical values when applying Dickey–Fuller or Engle–Granger tests. Use the special critical values provided for unit-root and cointegration tests. Failing to do so can lead to incorrect conclusions about stationarity and cointegration.

Further, avoid regressing nonstationary series in levels without first:

  • Testing each series for unit roots, and
  • Either transforming them (e.g., first differencing) or verifying cointegration.

Otherwise, the regression may be spurious even if the $R^2$ is high and t-statistics appear significant.

Nonstationarity, Cointegration, and Regression Analysis

When using time series in regressions, think in terms of the order of integration and cointegration status:

  • Case 1: All variables are I(0) (stationary in levels).

    • Regressions in levels are valid.
    • Standard inference (t‑ and F‑tests) is appropriate.
    • No differencing or cointegration considerations are needed.
  • Case 2: Variables are I(1) but not cointegrated.

    • Regressions in levels are spurious.
    • Regressions should use first differences (or other transformations that yield stationarity), e.g., $\Delta y_t$ on $\Delta x_t$.
    • No error-correction term is used, because there is no long-run equilibrium linking the levels.
  • Case 3: Variables are I(1) and cointegrated.

    • The regression in levels represents a valid long-run equilibrium relationship.
    • Short-run dynamics should be modeled using an ECM, combining differenced variables and the error-correction term.
    • Ignoring cointegration and working only with differences discards information about the long-run relationship.

In exam scenarios, once you identify that two nonstationary series are cointegrated, the preferred modeling approach is an ECM rather than a pure differenced regression.
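
The three cases can be condensed into a small decision helper. This is only a study aid under the article's own simplifications: the function name and return strings are hypothetical, and combinations the article does not discuss (such as mixing I(0) and I(1) variables) are flagged rather than resolved.

```python
def modeling_approach(orders, cointegrated=False):
    """Map unit-root and cointegration test outcomes to the three cases above.

    orders: integration order (0 or 1) of each variable in the regression.
    cointegrated: outcome of the Engle-Granger test; relevant only if all are I(1).
    """
    if all(d == 0 for d in orders):
        return "Case 1: regress in levels with standard inference"
    if all(d == 1 for d in orders):
        if cointegrated:
            return "Case 3: levels regression for the long run plus an ECM"
        return "Case 2: regress in first differences"
    return "outside the three cases covered here"


# Applied to the pre-reading scenario: prices and earnings are I(1) and cointegrated.
print(modeling_approach([1, 1], cointegrated=True))
# Two I(1) series whose residuals still have a unit root.
print(modeling_approach([1, 1], cointegrated=False))
```

The helper takes test outcomes as inputs; it does not run the tests. The order of the checks mirrors the exam workflow: establish each variable's order of integration first, and look at cointegration only when every variable is I(1).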

Key Point Checklist

This article has covered the following key knowledge points:

  • The distinction between covariance-stationary (I(0)) and unit-root (I(1)) time-series processes
  • Why random walks (with or without drift) are nonstationary and produce spurious regression results
  • The structure of an AR(1) model and the role of the lag coefficient in determining mean reversion
  • The concept of a unit root and its implications for time-series modeling and forecasting
  • The Dickey–Fuller unit-root test: regression specification, hypotheses, and use of special critical values
  • The rationale for and application of first differencing to transform I(1) series into stationary series
  • The concept of order of integration and the notation I(0), I(1)
  • The definition of cointegration and its interpretation as a long-run equilibrium relationship between nonstationary series
  • The Engle–Granger two-step cointegration test using residuals from a long-run regression
  • The structure of an error-correction model, including short-run differenced terms and the error-correction term
  • Interpretation of the speed-of-adjustment coefficient in ECMs and its implications for stability
  • The decision rules for when regressions in levels are valid, when they are spurious, and when ECMs are appropriate
  • Common exam pitfalls: ignoring unit roots, misusing standard critical values, and misinterpreting ECM coefficients

Key Terms and Concepts

  • Unit Root
  • Covariance Stationary
  • Mean-Reverting Level
  • Random Walk
  • Random Walk with Drift
  • Spurious Regression
  • Dickey–Fuller Test
  • Order of Integration
  • First Difference
  • Cointegration
  • Cointegrating Vector
  • Engle–Granger Test
  • Error-Correction Model (ECM)
  • Error-Correction Term
  • Speed of Adjustment
