ARIMA Order Selection

Automatic ARIMA(p,d,q) order selection chooses the differencing order $d$ and the AR/MA orders $p, q$ by separate criteria: AIC or BIC selects $p, q$, while the KPSS stationarity test selects $d$. This page explains why $d$ is not chosen by AIC/BIC and describes the procedure MIDAS uses. For how to run it, see the ARIMA section of the Agent API.

Why AIC/BIC cannot select the differencing order

The log-likelihood of an ARIMA(p,d,q) model is computed on the $d$-times differenced series $\Delta^d y$. The response variable changes with $d$ (it is $y$, $\Delta y$, or $\Delta^2 y$), and the number of observations entering the likelihood changes with it, from $N$ to $N-1$ to $N-2$.

AIC is

\text{AIC} = -2\,\ell(\hat\theta) + 2k

where $\ell(\hat\theta)$ is the maximized log-likelihood and $k$ is the number of estimated parameters. It estimates, up to an additive constant, the expected Kullback-Leibler divergence between the fitted model and the data-generating process for a fixed response and a fixed sample. When $d$ changes, both the response and the sample size change, so AIC differences across $d$ are not comparable. Differencing is the lag-polynomial transform $(1-B)^d$, whose Jacobian determinant is 1, so no correction term recovers the comparison; evaluating conditional likelihoods on different numbers of points is itself what breaks it. BIC fails to compare across $d$ for the same reason. The differencing order must be set by a criterion other than the in-sample likelihood.
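The changing response and sample size can be seen in a small sketch. It is illustrative only and not part of MIDAS: it assumes NumPy, uses a simulated random walk, and fits the simplest possible model (zero-mean Gaussian white noise, i.e. ARIMA(0,d,0) without a constant) to $\Delta^d y$ for $d = 0, 1, 2$. The helper name `white_noise_aic` is hypothetical.

```python
import numpy as np

def white_noise_aic(x):
    """AIC of a zero-mean Gaussian white-noise model fitted to x (k = 1: the variance)."""
    n = x.size
    sigma2 = np.mean(x ** 2)                               # Gaussian MLE of the variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)  # maximized log-likelihood
    return -2.0 * loglik + 2.0, n

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))                        # random walk, so the true d is 1

results = {}
for d in range(3):
    x = np.diff(y, n=d)                                    # response is y, then Δy, then Δ²y
    aic, n_obs = white_noise_aic(x)
    results[d] = (aic, n_obs)
    print(f"d={d}: likelihood uses {n_obs} points, AIC={aic:.1f}")
```

Each AIC printed here is a valid score for its own response, but the three numbers describe three different data sets of three different lengths, so ranking them against each other is not meaningful.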

The exact-likelihood alternative

The only way to compare an information criterion across $d$ on equal footing is to compute the exact likelihood of the original, undifferenced series $y$ for every candidate, so that the response and the sample size are the same. Treating ARIMA(p,d,q) as an ARMA($p+d$, $q$) with $d$ roots fixed on the unit circle puts every candidate on the same $y$.
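Written out, under the usual convention (an assumption here, since this page does not define the polynomials) that $\phi(B)$ and $\theta(B)$ are the AR and MA lag polynomials of degrees $p$ and $q$ and $\varepsilon_t$ is white noise, the representation is

\phi(B)\,(1-B)^d\,y_t = \theta(B)\,\varepsilon_t, \qquad \phi^*(B) = \phi(B)\,(1-B)^d

so $\phi^*(B)$ is an AR polynomial of degree $p+d$ in which $d$ roots are pinned at $B = 1$ and only the remaining $p$ coefficients are estimated.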

A model with a unit root is non-stationary, though: its marginal variance diverges, so the unconditional likelihood is undefined. Defining the likelihood requires a diffuse-prior Kalman filter, and even then the asymptotic justification for AIC is weaker for integrated processes than for stationary ones. This is why almost every ARIMA implementation differences first and never compares AIC across $d$; implementing the exact likelihood would not change the conclusion.

d is a question about the number of unit roots

How many times to difference is the same question as how many stochastic trends, or unit roots, the series carries. That differs from the question AIC answers, which is which ARMA structure fits a given stationary series. Deciding $d$ with a diagnostic for unit roots and choosing $p, q$ with AIC under a fixed $d$ assigns each criterion to the question it can actually answer.

Selecting d with the KPSS stationarity test

MIDAS selects $d$ with the KPSS test. Kwiatkowski, Phillips, Schmidt, and Shin proposed it in 1992 with stationarity as the null hypothesis and a unit root as the alternative, the reverse of the ADF test, whose null is a unit root. The direction matters. ADF has low power near the unit root, so it often fails to reject its null and ends up differencing stationary series whose largest autoregressive root is close to one. KPSS requires positive evidence against stationarity and is therefore conservative about differencing.

The statistic is computed as follows. From the mean-centered residuals $e_t = y_t - \bar y$ , form

\hat\eta = \frac{1}{n^2}\sum_{t=1}^{n} S_t^2, \qquad S_t = \sum_{i=1}^{t} e_i

and estimate the long-run variance with a Bartlett kernel:

\hat s^2 = \frac{1}{n}\sum_{t=1}^{n} e_t^2 + \frac{2}{n}\sum_{i=1}^{\ell}\left(1 - \frac{i}{\ell+1}\right)\sum_{t=i+1}^{n} e_t\,e_{t-i}, \qquad \ell = \left\lfloor 4\,(n/100)^{1/4} \right\rfloor

The statistic is $\hat\eta / \hat s^2$. The differencing order starts at $d=0$; the series is differenced while the statistic exceeds a decision threshold, and the procedure stops once it no longer does or once $d$ reaches the limit $\text{maxD}$. MIDAS uses the 5% critical value $0.463$ of the statistic's asymptotic null distribution as that threshold. It is not an absolute standard but a fixed constant for deciding whether to difference. A series whose variance is essentially zero after differencing is treated as stationary, and the procedure stops.
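The statistic and the sequential rule can be sketched in a few lines of NumPy. This is a minimal illustration of the formulas on this page, not MIDAS source code: the function names, the `max_d` parameter (standing in for $\text{maxD}$), and the variance tolerance `var_tol` are all assumptions made for the example.

```python
import numpy as np

KPSS_5PCT_LEVEL = 0.463  # 5% asymptotic critical value quoted in the text

def kpss_level_stat(y):
    """Level-stationarity KPSS statistic with a Bartlett long-run variance."""
    y = np.asarray(y, dtype=float)
    n = y.size
    e = y - y.mean()                           # mean-centered residuals e_t
    s = np.cumsum(e)                           # partial sums S_t
    eta = np.sum(s ** 2) / n ** 2
    lags = int(np.floor(4.0 * (n / 100.0) ** 0.25))
    lrv = np.sum(e ** 2) / n                   # lag-0 term of the long-run variance
    for i in range(1, lags + 1):
        weight = 1.0 - i / (lags + 1.0)        # Bartlett weight
        lrv += 2.0 / n * weight * np.sum(e[i:] * e[:-i])
    return eta / lrv

def select_d(y, max_d=2, threshold=KPSS_5PCT_LEVEL, var_tol=1e-12):
    """Difference while the statistic exceeds the threshold, up to max_d times."""
    x = np.asarray(y, dtype=float)
    d = 0
    while d < max_d:
        if np.var(x) < var_tol:                # essentially constant: treat as stationary
            break
        if kpss_level_stat(x) <= threshold:    # no evidence against stationarity: stop
            break
        x = np.diff(x)
        d += 1
    return d

rng = np.random.default_rng(42)
noise = rng.normal(size=500)                   # stationary by construction
walk = np.cumsum(noise)                        # one unit root by construction
print("white noise:", select_d(noise), "random walk:", select_d(walk))
```

On simulated data like this, white noise is usually left undifferenced and a random walk is usually differenced once, although any single draw can land on the other side of the threshold.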

Why the sequential procedure works, and the role of the threshold

Suppose the true process carries $d_0$ unit roots. For $d < d_0$, the differenced series still contains a stochastic trend and the KPSS statistic diverges with the sample size, so in large samples differencing continues with probability approaching one. At $d = d_0$ the series is stationary and KPSS has the correct asymptotic size $\alpha$. A series differenced past $d_0$ is still stationary, so the null holds and the procedure stops on its own.

Asymptotically, then, the procedure stops at the true $d_0$ with probability $1-\alpha$ and over-differences by one with probability about $\alpha$ . The $\alpha$ attached to the threshold directly controls the asymptotic probability of over-differencing.

There is no theoretically optimal $\alpha$. Under-differencing leaves a stochastic trend in the series being modeled, which pushes the fitted model toward non-stationarity and miscalibrates its forecast intervals. Over-differencing introduces a non-invertible MA component with a unit root, along with an extra parameter, but its cost is considered smaller. This asymmetry of losses is the basis for the conventional 5% ($\alpha = 0.05$). Across the practical range $0.01$ to $0.10$, the asymptotic over-differencing rate stays between 1% and 10%, unlike the systematic over-differencing that results from choosing $d$ by AIC.

Three caveats. First, the KPSS test MIDAS uses takes level (constant-only) stationarity as its null. For a trend-stationary series, one with only a deterministic linear trend and no unit root, the trend remains in the mean-centered residuals and the statistic diverges with the sample size, so the series is differenced even though its true differencing order is 0. The statement above that the procedure over-differences with probability about $\alpha$ concerns processes with a stochastic trend and does not apply to a deterministic trend; for a series suspected of being trend-stationary, model the trend or detrend it beforehand rather than differencing. Second, KPSS has finite-sample size distortion and, depending on the long-run variance estimate, can over-reject under strong short-run autocorrelation; the $\text{maxD}$ limit bounds how far the resulting over-differencing can go. Third, the selected $d$ is a data-dependent random variable, so the confidence intervals for the coefficients are conditional on the chosen order. This is not specific to KPSS; it is common to any automatic order selection.
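The first caveat is easy to reproduce. The sketch below, again assuming NumPy and using hypothetical names, applies the level-stationarity statistic to a simulated series that has a deterministic linear trend plus white noise and no unit root, at two sample sizes. It also applies the same statistic after removing an ordinary least-squares trend line, which is one simple way to detrend beforehand.

```python
import numpy as np

def kpss_level_stat(y):
    """Level-stationarity KPSS statistic (Bartlett kernel, lag rule from the text)."""
    e = np.asarray(y, dtype=float) - np.mean(y)
    n = e.size
    lags = int(np.floor(4.0 * (n / 100.0) ** 0.25))
    lrv = np.sum(e ** 2) / n
    for i in range(1, lags + 1):
        lrv += 2.0 / n * (1.0 - i / (lags + 1.0)) * np.sum(e[i:] * e[:-i])
    return np.sum(np.cumsum(e) ** 2) / n ** 2 / lrv

rng = np.random.default_rng(1)
raw_stats, detrended_stats = {}, {}
for n in (200, 2000):
    t = np.arange(n)
    y = 0.05 * t + rng.normal(size=n)                 # trend-stationary: no unit root
    resid = y - np.polyval(np.polyfit(t, y, 1), t)    # subtract the fitted OLS trend line
    raw_stats[n] = kpss_level_stat(y)
    detrended_stats[n] = kpss_level_stat(resid)
    print(f"n={n}: raw={raw_stats[n]:.2f}, detrended={detrended_stats[n]:.3f}")
```

The raw statistic sits far above $0.463$ and grows with $n$, so the sequential rule would difference this series, while the detrended residuals give no such signal.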

How MIDAS applies this

In automatic selection, MIDAS fixes $d$ with KPSS and then chooses $p, q$ under that $d$ by AIC or BIC. When the order is specified manually, the given $d$ is used directly.
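The second stage can be illustrated with a deliberately simplified stand-in. The sketch below is not how MIDAS fits models: it considers AR-only candidates (no MA terms), fits them by conditional least squares with NumPy rather than by full maximum likelihood, and uses hypothetical names throughout. What it does preserve is the key point of this page: $d$ is fixed first, and every candidate is then scored by AIC on the same differenced response and the same set of observations.

```python
import numpy as np

def ar_aic(x, p, max_p):
    """AIC of an AR(p) with intercept, fit by least squares on t = max_p .. n-1."""
    target = x[max_p:]                                   # same response for every p
    cols = [np.ones(target.size)]
    cols += [x[max_p - j : x.size - j] for j in range(1, p + 1)]  # lag-j regressors
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    m = target.size
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1.0)
    k = p + 2                                            # AR coefficients, intercept, variance
    return -2.0 * loglik + 2.0 * k

def select_p_given_d(y, d, max_p=5):
    """Difference d times, then pick the AR order with the smallest AIC."""
    x = np.diff(np.asarray(y, dtype=float), n=d)
    aics = {p: ar_aic(x, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics

# Simulated ARIMA(1,1,0) series: AR(1) increments with coefficient 0.7, then cumulated.
rng = np.random.default_rng(7)
eps = rng.normal(size=600)
dx = np.zeros(600)
for t in range(1, 600):
    dx[t] = 0.7 * dx[t - 1] + eps[t]
y = np.cumsum(dx)

best_p, aics = select_p_given_d(y, d=1)
print("selected p:", best_p)
```

Dropping the first `max_p` observations for every candidate keeps the sample fixed across orders, which is what makes the AIC values comparable within a given $d$.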

MIDAS reports only the selected $d$ and the operational fact of whether the statistic stayed below or rose above the decision threshold. It does not display p-values or significant/not-significant verdicts.

References

Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1-22. https://www.jstatsoft.org/article/view/v027i03
