Numerical Computing Fundamentals

This page covers the concepts behind the Numerical Accuracy page.

Floating-Point Numbers and Significant Digits

Computers represent real numbers as finite-precision approximations. JavaScript's Number type is an IEEE 754 double-precision floating-point number, so browser applications, including MIDAS, compute with doubles.

A double-precision number represents a real number as $(-1)^s \times (1 + f) \times 2^e$, where $s$ is a 1-bit sign, $e$ is an 11-bit exponent encoding the scale, and $f$ is the 52-bit fraction of the significand, with $0 \le f < 1$. For normal numbers¹, the integer part of $(1 + f)$ is always 1, so this 1 is not stored but implied. The 52 stored bits plus the implicit 1 bit give 53 significant bits, corresponding to about 15.9 decimal significant digits:

$\log_{10}(2^{53}) \approx 15.95$

A single double-precision number can therefore represent at most about 15.9 significant digits. On the Numerical Accuracy page, an LRE (Log Relative Error, a measure of how many significant digits of a result match the certified value) close to this value means the computation is accurate to the limit of double precision.
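
As a concrete illustration, an LRE can be computed as in the TypeScript sketch below; the function name `lre` and the cap at 15.95 digits are our conventions here, not necessarily those of the Numerical Accuracy page:

```ts
// Log Relative Error: roughly how many leading significant digits of
// `computed` agree with the certified reference value.
function lre(computed: number, certified: number): number {
  if (computed === certified) return 15.95; // cap at the double-precision limit
  if (certified === 0) return -Math.log10(Math.abs(computed));
  const relErr = Math.abs(computed - certified) / Math.abs(certified);
  return Math.min(-Math.log10(relErr), 15.95);
}

console.log(lre(0.1 + 0.2, 0.3)); // ≈ 15.7: off only in the last bits
console.log(lre(1.5, 1.0));       // ≈ 0.3: essentially no digits agree
```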

When converting a real number to a floating-point number, any fraction that cannot be represented in a finite number of bits is rounded. This rounding error is tiny for a single operation, but can accumulate over successive computations. The extent of accumulation depends on the algorithm and data; catastrophic cancellation and condition numbers, discussed below, are the primary factors.
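
Both effects are easy to observe in a browser console, which runs the same IEEE 754 double arithmetic as MIDAS:

```ts
// 0.1 has no finite binary representation, so it is rounded on entry,
// and every addition rounds again; the error drifts as terms accumulate.
let sum = 0;
for (let i = 0; i < 1_000_000; i++) sum += 0.1;
console.log(sum);       // ≈ 100000.00000133288, not exactly 100000
console.log(0.1 + 0.2); // 0.30000000000000004: a single rounded operation
```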

Catastrophic Cancellation

Catastrophic cancellation occurs when subtracting two nearly equal floating-point numbers, causing a large loss of significant digits.

Consider $a = 1.234567890123\mathbf{45}$ and $b = 1.234567890123\mathbf{00}$. Both have 15 significant digits, but their difference $a - b = 0.00000000000045$ has only 2. The leading zeros merely indicate position and are not significant, so only the trailing 45 carries meaningful information. The 13 digits that were shared between $a$ and $b$ cancel out, leaving only the least reliable bits.
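
The same loss is visible in code. The subtraction below is itself exact (the operands are close enough in magnitude), but each operand was already rounded when parsed, and that entry error dominates the tiny difference:

```ts
const a = 1.23456789012345;
const b = 1.23456789012300;

// The true difference is 4.5e-13. The stored values of a and b each
// carry a representation error of up to ~1.1e-16: negligible relative
// to a and b themselves, but enormous relative to their difference.
console.log(a - b); // ≈ 4.5e-13, with only the first few digits reliable
```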

In statistics, variance computation is a classic example. The definitional form $\sum(x_i - \bar{x})^2 / (n-1)$ can suffer cancellation in each deviation $x_i - \bar{x}$ when the mean is large relative to the standard deviation. However, the squared deviations are all non-negative, so their sum involves no subtraction, and the catastrophic single-subtraction loss that occurs in the algebraic form does not arise. That said, significant digits lost in each individual deviation $x_i - \bar{x}$ are not recovered by squaring, so precision does degrade, though not as severely as with the algebraic form. The algebraically equivalent form $(\sum x_i^2 - n\bar{x}^2) / (n-1)$ concentrates the cancellation into a single subtraction between $\sum x_i^2$ and $n\bar{x}^2$, which can cause a catastrophic loss of precision. MIDAS uses Welford's online algorithm², which incrementally updates the mean and sum of squared deviations for each data point, avoiding the subtraction of large intermediate values that occurs in the algebraic form and achieving precision comparable to the two-pass definitional form in a single pass.
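
A minimal sketch of Welford's update in TypeScript (the recurrences from the 1962 paper²; the function name and demo data are ours, not MIDAS's exact code):

```ts
// Welford's online algorithm: one pass, no large intermediate sums.
// `mean` is the running mean; `m2` accumulates the sum of squared
// deviations from the current mean.
function variance(data: number[]): number {
  let count = 0;
  let mean = 0;
  let m2 = 0;
  for (const x of data) {
    count += 1;
    const delta = x - mean;   // deviation from the old mean
    mean += delta / count;    // update the running mean
    m2 += delta * (x - mean); // old-mean deviation times new-mean deviation
  }
  return m2 / (count - 1);    // sample variance (assumes count >= 2)
}

// Large mean relative to the spread: the algebraic one-pass formula
// loses most digits on data like this, while Welford stays exact here.
console.log(variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16])); // 30
```

The key property is that `delta` and `m2` stay on the scale of the deviations rather than of the raw data, so no large, nearly equal quantities are ever subtracted.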

Condition Number

The condition number measures how much small perturbations in the input are amplified in the output. For a nonsingular matrix $A$, the condition number is defined as:

$\kappa(A) = \|A\| \cdot \|A^{-1}\|$

$\|A\|$ is the matrix 2-norm (the largest singular value). Since $\|A\| \cdot \|A^{-1}\| \ge \|A A^{-1}\| = \|I\| = 1$, the condition number has a lower bound of 1. A matrix with a condition number close to 1 is called well-conditioned; one with a large condition number is called ill-conditioned. There is no sharp threshold; the distinction depends on the precision required.

When solving $Ax = b$ in floating-point arithmetic, a relative perturbation of $\delta$ in $b$ can produce a relative error up to $\kappa(A) \cdot \delta$ in $x$. A condition number of $\kappa$ causes a loss on the order of $\log_{10}(\kappa)$ significant digits. Subtracting from double precision's roughly 15.9 digits gives an approximate estimate:

$\text{significant digits} \approx 15.9 - \log_{10}(\kappa)$

This is an order-of-magnitude estimate, not a strict lower bound. Actual precision also depends on the problem dimension and algorithm details. When computing $\hat\beta$ in linear regression, the relevant condition number depends on the solver: QR decomposition works directly with $X$, so $\kappa(X)$ applies; the normal equations solve $(X'X)^{-1}X'Y$, so $\kappa(X'X) = \kappa(X)^2$ applies. MIDAS uses QR decomposition for coefficient estimation, so the dominant factor in precision is $\kappa(X)$.
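
The amplification is easy to demonstrate with a nearly singular 2×2 system; this illustration is unrelated to MIDAS internals. For a symmetric positive-definite matrix, the 2-norm condition number is the ratio of the extreme eigenvalues, which in the 2×2 case has a closed form:

```ts
// Nearly singular symmetric matrix: the rows are almost identical.
const A = [[1, 1], [1, 1.0001]];

// kappa_2 = lambda_max / lambda_min for a symmetric positive-definite matrix.
const tr = A[0][0] + A[1][1];
const det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
const disc = Math.sqrt(tr * tr - 4 * det);
console.log((tr + disc) / (tr - disc)); // ≈ 4.0e4: expect to lose ~4.6 digits

// Solve A x = b by Cramer's rule.
function solve(b: [number, number]): [number, number] {
  return [
    (b[0] * A[1][1] - A[0][1] * b[1]) / det,
    (A[0][0] * b[1] - b[0] * A[1][0]) / det,
  ];
}

console.log(solve([2, 2.0001])); // ≈ [1, 1]
console.log(solve([2, 2.0002])); // ≈ [0, 2]: a ~3.5e-5 relative change in b
                                 // changed x by 100%, consistent with kappa
```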

Design Matrix

In the linear regression model $Y = X\beta + \varepsilon$, $X$ is the design matrix, $Y$ is the response vector, $\beta$ is the coefficient vector, and $\varepsilon$ is the error term. The design matrix has $n$ rows and $p$ columns, with each row corresponding to one observation. When an intercept is included, one of the $p$ columns is a constant column of ones, and the remaining columns correspond to the predictors ($p$ is the total column count including the intercept).
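
Concretely, a design matrix with an intercept can be assembled as below; this is a sketch, and `designMatrix` is a hypothetical helper rather than a MIDAS API:

```ts
// Build an n x p design matrix from predictor columns, prepending an
// intercept column of ones (so p = number of predictors + 1).
function designMatrix(predictors: number[][]): number[][] {
  const n = predictors[0].length;
  return Array.from({ length: n }, (_, i) => [
    1,                                  // intercept column
    ...predictors.map((col) => col[i]), // one value per predictor
  ]);
}

// Two predictors observed on n = 3 rows gives a 3 x 3 matrix (p = 3).
console.log(designMatrix([[2, 4, 6], [1, 3, 5]]));
// [[1, 2, 1], [1, 4, 3], [1, 6, 5]]
```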

The OLS (ordinary least squares) estimator minimizes the residual sum of squares; when $X$ has full column rank, it can be written as $\hat\beta = (X'X)^{-1}X'Y$ (see derivation), where $X'$ denotes the transpose of $X$. A large condition number in the design matrix causes rounding errors to be amplified, reducing the accuracy of the computed $\hat\beta$. When predictors are highly correlated, the design matrix tends to be ill-conditioned.
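
For reference, the derivation is the standard one (the linked page gives the full treatment): setting the gradient of the residual sum of squares to zero yields the normal equations.

$$
\text{RSS}(\beta) = (Y - X\beta)'(Y - X\beta), \qquad
\frac{\partial\, \text{RSS}}{\partial \beta} = -2X'(Y - X\beta) = 0
\;\Longrightarrow\; X'X\hat\beta = X'Y
\;\Longrightarrow\; \hat\beta = (X'X)^{-1}X'Y
$$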

See OLS Fundamentals for the full mathematical treatment of the design matrix and OLS estimator properties.

Polynomial Regression

In polynomial regression, the columns of the design matrix are $1, x, x^2, \ldots, x^d$. As the degree $d$ increases, the power columns become strongly correlated and the design matrix becomes ill-conditioned.
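
The growing correlation between neighboring power columns is easy to check numerically (an illustrative grid, not the Filip data):

```ts
// Pearson correlation between two columns.
function corr(u: number[], v: number[]): number {
  const n = u.length;
  const mu = u.reduce((s, x) => s + x, 0) / n;
  const mv = v.reduce((s, x) => s + x, 0) / n;
  let suv = 0, suu = 0, svv = 0;
  for (let i = 0; i < n; i++) {
    suv += (u[i] - mu) * (v[i] - mv);
    suu += (u[i] - mu) ** 2;
    svv += (v[i] - mv) ** 2;
  }
  return suv / Math.sqrt(suu * svv);
}

// 100 points on [1, 2]: high powers of x are nearly collinear columns.
const x = Array.from({ length: 100 }, (_, i) => 1 + i / 99);
const pow = (d: number) => x.map((v) => v ** d);
console.log(corr(pow(9), pow(10))); // > 0.999: nearly linearly dependent
```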

The NIST StRD datasets on the Numerical Accuracy page show this effect. The simple regression dataset Norris achieves a coefficient LRE of 12.3, but the 10th-degree polynomial Filip drops to 7.3. Filip has $\kappa(X'X) \approx 10^{14}$ ($\kappa(X) \approx 10^7$). Since MIDAS uses QR decomposition for coefficient estimation, $\kappa(X)$ is the relevant quantity. The estimate predicts $15.9 - 7 \approx 8.9$ digits of precision, but the observed LRE is 7.3. The gap reflects factors not captured by the simple formula, such as error accumulation during back-substitution across the problem dimension $p$ (Filip has $p = 11$).

To reduce the condition number, replace the monomial basis $1, x, x^2, \ldots$ with an orthogonal polynomial basis. When the design matrix consists only of the intercept and orthogonal polynomial columns, the columns are mutually orthogonal with equal norms, so the condition number is theoretically 1 (in practice it deviates slightly from 1 due to floating-point rounding). The Orthogonal Polynomials tab in MIDAS applies this transformation.
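
One way to build such a basis is modified Gram-Schmidt applied to the intercept and power columns. This is a sketch of the idea; MIDAS's actual construction may differ (orthogonal polynomial bases are often generated by a three-term recurrence instead):

```ts
// Orthonormalize columns with modified Gram-Schmidt. Fed the ones
// column plus the power columns, it returns an orthogonal polynomial basis.
function orthonormalize(cols: number[][]): number[][] {
  const q: number[][] = [];
  for (const c of cols) {
    let v = [...c];
    for (const u of q) {
      const proj = v.reduce((s, vi, i) => s + vi * u[i], 0);
      v = v.map((vi, i) => vi - proj * u[i]); // remove the component along u
    }
    const norm = Math.hypot(...v);
    q.push(v.map((vi) => vi / norm)); // unit length, hence equal column norms
  }
  return q;
}

const xs = Array.from({ length: 50 }, (_, i) => i / 49);
const col = (d: number) => xs.map((v) => v ** d);
const Q = orthonormalize([xs.map(() => 1), col(1), col(2), col(3)]);

const dot = (u: number[], v: number[]) => u.reduce((s, ui, i) => s + ui * v[i], 0);
console.log(dot(Q[1], Q[2])); // ≈ 0 (rounding level): kappa of Q is ≈ 1
```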


Footnotes

  1. Normalization adjusts the exponent so that the integer part is 1, making the representation unique. For example, $0.5$ is represented as $1.0 \times 2^{-1}$. However, the exponent has a minimum value. For numbers close enough to zero that this minimum is reached, the integer part can no longer be kept at 1, producing subnormal numbers where the implicit leading 1 does not apply. Subnormal numbers have reduced precision, but values in typical statistical input data are not normally in this range.

  2. Welford, B. P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419–420.