Numerical Computing Fundamentals

Precision can be lost during a computation when predictors are highly correlated, when fitting a high-degree polynomial, or when the data has a large mean and small variance. This page explains how that happens and what MIDAS does about it. It concerns precision lost during the computation, not the precision of the measured values themselves. It also covers the concepts behind the Numerical Accuracy page.

Floating-Point Numbers and Significant Digits

Computers represent real numbers as finite-precision approximations. All browser applications, including MIDAS, use IEEE 754 double-precision floating-point numbers.

A double-precision number represents a real number as (1)s×(1+f)×2e(-1)^s \times (1 + f) \times 2^e. ss is a 1-bit sign, ee is an 11-bit exponent encoding the scale, and ff is the 52-bit significand representing the fractional part where 0f<10 \le f < 1. For normal numbers1, the integer part of (1+f)(1 + f) is always 1, so this 1 is not stored but implied. The 52 stored bits plus the implicit 1 bit give 53 significant bits, corresponding to about 15.9 decimal significant digits:

log10(253)15.95\log_{10}(2^{53}) \approx 15.95

A single double-precision number can therefore represent at most about 15.9 significant digits. On the Numerical Accuracy page, LRE (Log Relative Error: a measure of how many significant digits MIDAS's computed value shares with the known correct value) close to this value means the computation is accurate to the limit of double precision. That page covers how to judge LRE values.

When converting a real number to a floating-point number, any fraction that cannot be represented in a finite number of bits is rounded. This rounding error is tiny for a single operation, but can accumulate over successive computations. The extent of accumulation depends on the algorithm and data; catastrophic cancellation and condition numbers, discussed below, are the primary factors.

Catastrophic Cancellation

Catastrophic cancellation occurs when subtracting two nearly equal floating-point numbers, causing a large loss of significant digits.

Consider a=1.23456789012345a = 1.234567890123\mathbf{45} and b=1.23456789012300b = 1.234567890123\mathbf{00}. Both have 15 significant digits, but their difference ab=0.00000000000045a - b = 0.00000000000045 has only 2. The leading zeros merely indicate position and are not significant, so only the trailing 45 carries meaningful information. Once these decimals are stored in double precision their trailing digits already carry rounding error, so the 13 digits shared between aa and bb cancel out, leaving only those rounding-error-laden low digits.

In statistics, variance computation is a classic example. The definitional form (xixˉ)2/(n1)\sum(x_i - \bar{x})^2 / (n-1) can suffer cancellation in each deviation xixˉx_i - \bar{x} when the mean is large relative to the standard deviation. However, the squared deviations are all non-negative, so their sum involves no subtraction. The catastrophic single-subtraction loss that occurs in the algebraic form does not arise. That said, significant digits lost in each individual deviation xixˉx_i - \bar{x} are not recovered by squaring, so precision does degrade—though not as severely as with the algebraic form. The algebraically equivalent form (xi2nxˉ2)/(n1)(\sum x_i^2 - n\bar{x}^2) / (n-1) concentrates the cancellation into a single subtraction between xi2\sum x_i^2 and nxˉ2n\bar{x}^2, which can cause a catastrophic loss of precision. MIDAS uses Welford's online algorithm2 and additionally shifts by the first value before the incremental update. Updating the mean and sum of squared deviations for each data point avoids the subtraction of large intermediate values seen in the algebraic form, and the shift curbs the precision loss that a large offset would otherwise cause in the running mean, achieving precision comparable to the two-pass definitional form in a single pass. Even so, when the mean is extremely large and the variance extremely small, some cancellation remains and the standard deviation can lose precision; the Numerical Accuracy page shows this level with NumAcc3 and NumAcc4.

Condition Number

The condition number measures how much small perturbations in the input are amplified in the output. For a nonsingular matrix AA, the condition number is defined as:

κ(A)=AA1\kappa(A) = \|A\| \cdot \|A^{-1}\|

A\|A\| is the matrix 2-norm (largest singular value). Since AA1AA1=I=1\|A\| \cdot \|A^{-1}\| \ge \|AA^{-1}\| = \|I\| = 1, the condition number has a lower bound of 1. A matrix with a condition number close to 1 is called well-conditioned; one with a large condition number is called ill-conditioned. There is no sharp threshold; the distinction depends on the precision required.

When solving Ax=bAx = b in floating-point arithmetic, a relative perturbation of δ\delta in bb can produce a relative error up to κ(A)δ\kappa(A) \cdot \delta in xx. A condition number of κ\kappa causes, in the worst case, a loss on the order of log10(κ)\log_{10}(\kappa) significant digits. Subtracting from double precision's roughly 15.9 digits gives an approximate estimate:

significant digits15.9log10(κ)\text{significant digits} \approx 15.9 - \log_{10}(\kappa)

This is a worst-case order-of-magnitude estimate, not a strict guarantee. Actual precision depends on the data and algorithm details, and can substantially exceed the estimate. When computing β^\hat\beta in linear regression, the relevant condition number depends on the solver: QR decomposition works directly with XX, so κ(X)\kappa(X) applies; the normal equations solve (XX)1XY(X'X)^{-1}X'Y, so κ(XX)=κ(X)2\kappa(X'X) = \kappa(X)^2 applies. MIDAS uses QR decomposition for coefficient estimation, so the dominant factor in precision is κ(X)\kappa(X).

When fitting a model via QR decomposition, MIDAS computes an estimate of the condition number and shows a warning with the results when the estimate exceeds 101010^{10}. This estimate is the Frobenius condition number RFR1F\|R\|_F \cdot \|R^{-1}\|_F of the upper triangular factor RR, which satisfies κ(X)RFR1Fpκ(X)\kappa(X) \le \|R\|_F \cdot \|R^{-1}\|_F \le p\,\kappa(X), where pp is the number of columns of the design matrix including the intercept. Because it never underestimates the condition number, an ill-conditioned design matrix is not missed. It overestimates by at most a factor of pp, a difference of log10p\log_{10} p digits in the significant-digit estimate.

Even when the warning appears, the coefficients are still computed and shown; interpret the estimates in light of the lost significant digits. The main cause of ill-conditioning is high correlation among predictors (see Design Matrix); for polynomial regression, switching to an orthogonal polynomial basis helps (see Polynomial Regression).

Design Matrix

In the linear regression model Y=Xβ+εY = X\beta + \varepsilon, XX is the design matrix. YY is the response vector, β\beta is the coefficient vector, and ε\varepsilon is the error term. The design matrix has nn rows and pp columns, with each row corresponding to one observation. When an intercept is included, one of the pp columns is a constant column of ones, and the remaining columns correspond to the predictors (pp is the total column count including the intercept).

The OLS (ordinary least squares) estimator minimizes the residual sum of squares; when XX has full column rank, it can be written as β^=(XX)1XY\hat\beta = (X'X)^{-1}X'Y (see derivation), where XX' denotes the transpose of XX. A large condition number in the design matrix causes rounding errors to be amplified, reducing the accuracy of computing β^\hat\beta. When predictors are highly correlated, the design matrix tends to be ill-conditioned. Before fitting, you can check the correlations among predictors with the correlation matrix in the Statistics tab; after fitting, the VIF in the Linear Regression coefficient table measures multicollinearity.

See OLS Fundamentals for the full mathematical treatment of the design matrix and OLS estimator properties.

Polynomial Regression

In polynomial regression, the columns of the design matrix are 1,x,x2,,xd1, x, x^2, \ldots, x^d. As the degree dd increases, the power columns become strongly correlated and the design matrix becomes ill-conditioned.

The NIST StRD datasets on the Numerical Accuracy page show this effect. The simple regression dataset Norris achieves a coefficient LRE of 12.3, but the 10th degree polynomial Filip drops to 7.3.

The significant-digit estimate from the condition number is a worst-case order of magnitude, and the actual precision often exceeds it. Filip's design matrix has κ(X)2×1015\kappa(X) \approx 2 \times 10^{15}, and the κ(XX)\kappa(X'X) in the normal equations is its square, about 103010^{30}. Plugging κ(X)\kappa(X) into the estimation formula predicts less than one significant digit, yet the observed LRE of 7.3 far exceeds it.

The estimate is pessimistic because the rounding errors of QR decomposition enter relative to the scale of each column of the design matrix. Scaling each column to unit 2\ell_2 norm before computing the condition number gives about 5×1095 \times 10^9 for Filip, so the estimate becomes about 6 digits, close to the observed value.

To reduce the condition number, replace the monomial basis 1,x,x2,1, x, x^2, \ldots with an orthogonal polynomial basis. When the design matrix consists only of the intercept and orthogonal polynomial columns, the columns are mutually orthogonal with equal norms, so the condition number is theoretically 1 (in practice it deviates slightly from 1 due to floating-point rounding). The Orthogonal Polynomials tab in MIDAS applies this transformation. Note that with an orthogonal polynomial basis the value and interpretation of each coefficient differ from the monomial basis (the fitted values, R2R^2, RMSE, and prediction intervals are the same).

See also

Footnotes

  1. Normalization adjusts the exponent so that the integer part is 1, making the representation unique. For example, 0.50.5 is represented as 1.0×211.0 \times 2^{-1}. However, the exponent has a minimum value. For numbers close enough to zero that this minimum is reached, the integer part can no longer be kept at 1, producing subnormal numbers where the implicit leading 1 does not apply. Subnormal numbers have reduced precision, but values in typical statistical input data are not normally in this range.

  2. Welford, B. P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419–420.