Principal Component Analysis for Nonstationary Series
James D. Hamilton, Jin Xi
Abstract
This paper develops a procedure for uncovering the common cyclical factors that drive a mix of stationary and nonstationary variables. The method does not require knowing which variables are nonstationary or the nature of the nonstationarity. Applications to the term structure of interest rates and to the FRED-MD macroeconomic dataset demonstrate that the approach offers similar benefits to those of traditional principal component analysis with some added advantages.
Method for conducting principal component analysis (PCA) on nonstationary data.
Avoid standard, static PCA when working with nonstationary data: the factors that emerge tend to be spurious. They won’t represent the true factor structure if there is one, and they’ll tend to invent factors where none exist, much as OLS on trending series finds in-sample correlations that don’t exist in the population. This happens for the same reasons as spurious regression with time series data:
- Trending data isn’t stationary.
- Nonstationary data of this kind doesn’t have a well-defined population mean or variance.
- This violates the assumptions of standard estimators like OLS and removes any guarantee that the results are correct.
- Like OLS, PCA is a least-squares estimator and thus inherits similar strengths and weaknesses.
For a nonstationary variable, the population mean is undefined and the sample standard deviation diverges to infinity as the number of time-series observations gets large.
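To make the spurious-factor problem concrete, here is a minimal simulation sketch. It compares the share of variance captured by the first principal component for independent white-noise series and for independent random walks. Neither panel has any common factor by construction. The sample sizes, the seed, and the helper name `first_pc_share` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def first_pc_share(X):
    """Fraction of total variance explained by the first principal component.

    X has shape (T, N): T observations on N series. Each column is
    standardized first, as in conventional PCA on the correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Squared singular values of Z are proportional to the eigenvalues
    # of the sample correlation matrix.
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

rng = np.random.default_rng(0)
T, N = 500, 20  # illustrative sizes (assumption)

noise = rng.standard_normal((T, N))  # independent stationary series
walks = np.cumsum(noise, axis=0)     # independent random walks

share_noise = first_pc_share(noise)
share_walks = first_pc_share(walks)
print(f"first-PC share, white noise:  {share_noise:.2f}")
print(f"first-PC share, random walks: {share_walks:.2f}")
```

In runs like this the random-walk panel should show a markedly larger first-component share than the white-noise panel, even though the series are independent in both cases. That apparent "factor" is the kind of artifact the notes above warn about.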
Focusing on the cyclical, non-trending portion of a series mitigates these issues. This requires a filter that can extract that component. There’s no shortage of filters, but an easy one, proposed by Hamilton (one of the authors of this method), is the Hamilton filter (from "Why You Should Never Use the Hodrick-Prescott Filter"):
- The Hamilton filter extracts a stationary cyclical component from a diverse array of time series data-generating processes (DGPs)
- It’s simple to use: regress each series by OLS on a constant and its own lagged values, with the most recent lag set at least some delay back, and keep the residuals from this regression, which represent the cyclical component
- These now-stationary residuals form the input to PCA, which yields factors that are themselves stationary and non-spurious
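The recipe in the list above (regress each series on its own delayed lags, keep the residuals, run PCA on them) might be sketched as follows. This is a rough illustration, not the authors' code: the function names `hamilton_cycle` and `cyclical_pca`, the simulated data, and the values `h = 24`, `p = 12` (a two-year delay and one year of lags for monthly data) are all assumptions chosen for the example.

```python
import numpy as np

def hamilton_cycle(y, h, p):
    """Residuals from an OLS regression of y[t] on a constant and
    y[t-h], y[t-h-1], ..., y[t-h-p+1]."""
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    start = h + p - 1  # first t for which all p lags are available
    # Column j holds y[t-h-j] for t = start, ..., T-1.
    lags = np.column_stack([y[start - h - j : T - h - j] for j in range(p)])
    X = np.column_stack([np.ones(T - start), lags])
    target = y[start:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

def cyclical_pca(Y, h, p, k):
    """Filter each column of Y (shape T x N), standardize the residuals,
    and return the first k principal components and their loadings."""
    C = np.column_stack(
        [hamilton_cycle(Y[:, i], h, p) for i in range(Y.shape[1])]
    )
    Z = (C - C.mean(axis=0)) / C.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :k] * s[:k]  # shape (T - h - p + 1, k)
    loadings = Vt[:k].T         # shape (N, k)
    return factors, loadings

# Simulated panel (assumption): one stationary AR(1) common factor,
# with independent random-walk trends added to half of the series.
rng = np.random.default_rng(1)
T, N = 600, 10
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.9 * f[t - 1] + rng.standard_normal()
lam = rng.uniform(0.5, 1.5, size=N)
Y = np.outer(f, lam) + rng.standard_normal((T, N))
Y[:, : N // 2] += np.cumsum(rng.standard_normal((T, N // 2)), axis=0)

h, p, k = 24, 12, 1  # illustrative monthly settings (assumption)
cycle0 = hamilton_cycle(Y[:, 0], h, p)
factors, loadings = cyclical_pca(Y, h, p, k)
print(factors.shape, loadings.shape)
```

Note that the same regression is applied to every column, so the sketch never needs to know which series are nonstationary. Each series loses its first `h + p - 1` observations to the lag structure.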
Some interesting notes:
- Use log transformation () for most variables, e.g. ones you would normally first difference. If a variable has negative values, use instead
- Use enough lags to cover a year’s worth of observations (e.g. 12 for monthly data, 4 for quarterly) to account for seasonality