ARIMA Order Selection

Automatic ARIMA(p,d,q) order selection chooses the differencing order $d$ and the AR/MA orders $p, q$ by separate criteria. AIC or BIC selects $p, q$, while the KPSS stationarity test selects $d$. There is a reason $d$ is not chosen by AIC/BIC. This page explains that reason and the procedure MIDAS uses. For how to run it, see the ARIMA section of the Agent API.

Why AIC/BIC cannot select the differencing order

The log-likelihood of an ARIMA(p,d,q) model is computed on the $d$-times differenced series $\Delta^d y$. The response variable changes with $d$ (it is $y$, $\Delta y$, or $\Delta^2 y$), and the number of observations entering the likelihood changes with it, from $N$ to $N-1$ to $N-2$.

AIC is

$$\text{AIC} = -2\,\ell(\hat\theta) + 2k,$$

where $\ell(\hat\theta)$ is the maximized log-likelihood and $k$ is the number of estimated parameters. It estimates (up to an additive constant) the Kullback-Leibler divergence between the fitted model and the data-generating process for a fixed response and a fixed sample. When $d$ changes, both the response and the sample size change, so AIC differences across $d$ are not comparable. Differencing is the lag-polynomial transform $(1-B)^d$, whose Jacobian determinant is 1, so no correction term recovers the comparison; evaluating conditional likelihoods on different numbers of points is itself what breaks it. BIC fails to compare across $d$ for the same reason. The differencing order must be set by a criterion other than the in-sample likelihood.
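To make the mismatch concrete, the sketch below differences a toy series zero, one, and two times and evaluates an AIC on each result. It is only an illustration: the i.i.d. Gaussian model, the helper names `difference` and `gaussian_aic`, and the toy series are assumptions introduced here, not part of MIDAS.

```python
import math

def difference(y, d):
    """Apply the first-difference operator (1 - B) d times."""
    z = list(y)
    for _ in range(d):
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z

def gaussian_aic(z):
    """AIC of an i.i.d. Gaussian model (mean and variance, so k = 2) fitted to z.

    This stand-in model is an assumption for illustration; a real ARIMA fit
    would use the ARMA likelihood of z instead.
    """
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return -2.0 * loglik + 2 * 2

# Toy series: a linear trend plus a bounded oscillation (illustrative only).
y = [0.3 * t + math.sin(t) for t in range(50)]

for d in range(3):
    z = difference(y, d)
    # Each AIC is computed on a different response with a different length.
    print(d, len(z), round(gaussian_aic(z), 2))
```

Each printed AIC belongs to a different response (the series itself, its first difference, its second difference) and a different number of observations, so ranking the three values against each other answers no well-posed question.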

The exact-likelihood alternative

The only way to compare an information criterion across $d$ on equal footing is to compute the exact likelihood of the original, undifferenced series $y$ (same response, same sample size) for every candidate. Treating ARIMA(p,d,q) as an ARMA($p+d$, $q$) with $d$ roots fixed on the unit circle puts every candidate on the same $y$.

A model with a unit root is non-stationary, though: its marginal variance diverges, so the unconditional likelihood is undefined. Defining the likelihood requires a diffuse-prior Kalman filter, and even then the asymptotics of AIC for integrated processes hold only weakly. This is why almost every ARIMA implementation differences first and never compares AIC across $d$; implementing the exact likelihood would not change the conclusion.

d is a question about the number of unit roots

How many times to difference is the same question as how many stochastic trends, or unit roots, the series carries. That differs from the question AIC answers, which is which ARMA structure fits a given stationary series. Deciding $d$ with a diagnostic for unit roots and choosing $p, q$ with AIC under a fixed $d$ assigns each criterion to the question it can actually answer.

Selecting d with the KPSS stationarity test

MIDAS selects $d$ with the KPSS test. Kwiatkowski, Phillips, Schmidt, and Shin proposed it in 1992 with stationarity as the null hypothesis and a unit root as the alternative, the reverse of the ADF test, whose null is a unit root. The direction matters. ADF has low power near the unit root, so a procedure based on it tends to difference stationary series whose dynamics sit close to a unit root, whereas KPSS requires positive evidence against stationarity and is therefore conservative about differencing.

The statistic is computed as follows. From the mean-centered residuals $e_t = y_t - \bar y$, form

$$\eta = \frac{1}{n^2}\sum_{t=1}^{n} S_t^2, \qquad S_t = \sum_{i=1}^{t} e_i$$

and estimate the long-run variance with a Bartlett kernel:

$$\hat s^2 = \frac{1}{n}\sum_{t=1}^{n} e_t^2 + \frac{2}{n}\sum_{i=1}^{\ell}\left(1 - \frac{i}{\ell+1}\right)\sum_{t=i+1}^{n} e_t\,e_{t-i}, \qquad \ell = \left\lfloor 4\,(n/100)^{1/4} \right\rfloor$$

The statistic is $\eta / \hat s^2$. The differencing order starts at $d=0$; the series is differenced while the statistic exceeds a decision threshold, and the procedure stops once it no longer does or once $d$ reaches the limit $\text{maxD}$. MIDAS uses the 5% critical value $0.463$ of the statistic's asymptotic null distribution as that threshold. It is not an absolute standard but a fixed constant for deciding whether to difference. A series whose variance is essentially zero after differencing is treated as stationary, and the procedure stops.
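The computation and the stopping rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the MIDAS source: the function names, the default `max_d=2`, and the `1e-12` zero-variance tolerance are assumptions chosen for the example.

```python
import math

def kpss_level_statistic(x):
    """Level-stationarity KPSS statistic eta / s2 with a Bartlett kernel.

    Returns None when the series has numerically zero variance.
    """
    n = len(x)
    mean = sum(x) / n
    e = [v - mean for v in x]                 # mean-centered residuals e_t
    gamma0 = sum(v * v for v in e) / n
    if gamma0 < 1e-12:                        # assumed tolerance, not from MIDAS
        return None

    # eta = n^-2 * sum over t of the squared partial sums S_t
    partial, sum_sq = 0.0, 0.0
    for v in e:
        partial += v
        sum_sq += partial * partial
    eta = sum_sq / n**2

    # Long-run variance with Bartlett weights 1 - i / (lag + 1)
    lag = math.floor(4 * (n / 100) ** 0.25)
    s2 = gamma0
    for i in range(1, lag + 1):
        cov = sum(e[t] * e[t - i] for t in range(i, n))
        s2 += (2.0 / n) * (1.0 - i / (lag + 1)) * cov
    return eta / s2

def select_d(y, max_d=2, threshold=0.463):
    """Difference while the statistic exceeds the threshold, up to max_d times."""
    z = list(y)
    d = 0
    while d < max_d:
        stat = kpss_level_statistic(z)
        if stat is None or stat <= threshold:  # zero variance counts as stationary
            break
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
        d += 1
    return d

# A pure linear ramp: differenced once, it becomes constant and the loop stops.
print(select_d([float(t) for t in range(100)]))
```

The absolute variance tolerance is scale-dependent; a production implementation would more likely compare against the scale of the input series.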

Why the sequential procedure works, and the role of the threshold

Suppose the true process carries $d_0$ unit roots. For $d < d_0$, the differenced series still contains a stochastic trend and the KPSS statistic diverges with the sample size, so differencing continues with probability approaching one. At $d = d_0$ the series is stationary and KPSS has the correct asymptotic size $\alpha$. A series differenced past $d_0$ is still stationary, so the null holds and the procedure stops on its own.

Asymptotically, then, the procedure stops at the true $d_0$ with probability $1-\alpha$ and over-differences by one with probability about $\alpha$. The $\alpha$ attached to the threshold directly controls the asymptotic probability of over-differencing.
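The size claim can be checked empirically with a small simulation. The sketch below draws Gaussian white noise (a series whose true differencing order is 0) and counts how often the statistic exceeds 0.463, which is how often a sequential procedure would over-difference it. The sample length, replication count, seed, and function names are assumptions for illustration; the exact rate printed depends on them.

```python
import math
import random

def kpss_level_statistic(x):
    """Level-stationarity KPSS statistic with a Bartlett long-run variance."""
    n = len(x)
    mean = sum(x) / n
    e = [v - mean for v in x]
    partial, sum_sq = 0.0, 0.0
    for v in e:
        partial += v
        sum_sq += partial * partial
    eta = sum_sq / n**2
    lag = math.floor(4 * (n / 100) ** 0.25)
    s2 = sum(v * v for v in e) / n
    for i in range(1, lag + 1):
        cov = sum(e[t] * e[t - i] for t in range(i, n))
        s2 += (2.0 / n) * (1.0 - i / (lag + 1)) * cov
    return eta / s2

def rejection_rate(n=200, reps=400, threshold=0.463, seed=0):
    """Share of white-noise series whose statistic exceeds the threshold."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if kpss_level_statistic(x) > threshold:
            rejections += 1
    return rejections / reps

rate = rejection_rate()
print(f"empirical over-differencing rate on white noise: {rate:.3f}")
```

With a finite sample and a finite number of replications the estimate will only approximate the nominal level, and strongly autocorrelated inputs can move it further from that level.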

There is no theoretically optimal $\alpha$. Under-differencing leaves a stochastic trend in the model, making the fitted model non-stationary and miscalibrating its forecast intervals. Over-differencing introduces a non-invertible MA component with a unit root, along with an extra parameter, but its cost is considered smaller. This asymmetry of losses is the basis for the conventional 5% ($\alpha = 0.05$). Across the practical range $0.01$ to $0.10$, the asymptotic over-differencing rate stays between 1% and 10%, unlike the systematic over-differencing that results from choosing $d$ by AIC.

Three caveats. First, the KPSS test MIDAS uses takes level (constant-only) stationarity as its null. For a trend-stationary series, one with only a deterministic linear trend and no unit root, the trend remains in the mean-centered residuals and the statistic diverges with the sample size, so the series is differenced even though its true differencing order is 0. The "over-differences with probability about $\alpha$" above concerns processes with a stochastic trend and does not apply to a deterministic trend; for a series suspected of being trend-stationary, model the trend or detrend it beforehand rather than differencing. Second, KPSS has finite-sample size distortion and can over-reject under strong short-run autocorrelation, depending on the long-run variance estimate; the $\text{maxD}$ limit bounds the resulting over-differencing. Third, the selected $d$ is a data-dependent random variable, so the confidence intervals for the coefficients are conditional on the chosen order. This is not specific to KPSS; it is common to any automatic order selection.
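For the first caveat, one way to detrend beforehand is to subtract an ordinary-least-squares line before the series reaches the order selection. The sketch below is a hypothetical preprocessing helper, not a MIDAS function; the name `detrend_linear` and the choice of a straight-line trend are assumptions.

```python
def detrend_linear(y):
    """Remove an OLS-fitted line a + b*t (t = 0, 1, ..., n-1) and return residuals."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y[t] - y_mean) for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y[t] - (intercept + slope * t) for t in range(n)]

# A series that is exactly a line leaves residuals that are numerically zero.
residuals = detrend_linear([2.0 + 0.5 * t for t in range(50)])
print(max(abs(r) for r in residuals))
```

The residuals, rather than the raw series, would then be passed to the level-stationarity check, and the fitted trend is added back when forecasting.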

How MIDAS applies this

In automatic selection, MIDAS fixes $d$ with KPSS and then chooses $p, q$ under that $d$ by AIC or BIC. When the order is specified manually, the given $d$ is used directly.
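The second stage can be pictured as a grid search over p and q on one fixed, already-differenced series, so every candidate shares the same response and sample. The sketch below is an assumption-laden illustration: `select_pq`, the grid bounds, the parameter count `p + q + 1`, and the `fit_loglik` callback (which stands in for whatever ARMA estimator is available) are all hypothetical and do not describe the MIDAS internals.

```python
def aic(loglik, k):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2 * k

def select_pq(z, fit_loglik, max_p=3, max_q=3):
    """Return the (p, q) with the smallest AIC on the fixed series z.

    fit_loglik(z, p, q) must return the maximized log-likelihood of an
    ARMA(p, q) model on z; it is a caller-supplied placeholder here.
    """
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            loglik = fit_loglik(z, p, q)
            score = aic(loglik, p + q + 1)  # +1 for the innovation variance (assumed)
            if best is None or score < best[0]:
                best = (score, p, q)
    return best[1], best[2]

# Stub log-likelihood for demonstration only: one AR term helps, nothing else does.
def stub_loglik(z, p, q):
    return -50.0 + (10.0 if p >= 1 else 0.0)

print(select_pq([0.0] * 10, stub_loglik))
```

Because `z` never changes inside the loop, the AIC values being ranked are comparable, which is exactly what fails when the differencing order is allowed to vary.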

MIDAS reports only the selected $d$ and the operational fact of whether the statistic stayed below or rose above the decision threshold. It does not display p-values or significant/not-significant verdicts.

See also

References