Numerical Accuracy

This page explains how to verify the accuracy of MIDAS statistical computations yourself.

NIST Statistical Reference Datasets

NIST Statistical Reference Datasets (StRD) are benchmark datasets published by the National Institute of Standards and Technology for evaluating the numerical accuracy of statistical software. Each dataset comes with certified values computed to 15 significant digits.

NIST StRD has five categories: Univariate, Linear Regression, Nonlinear Regression, ANOVA, and MCMC. This page covers Univariate, Linear Regression, and ANOVA, which correspond to features available in MIDAS. MIDAS does not have features corresponding to Nonlinear Regression or MCMC.

You are not limited to NIST datasets. Any dataset where you know the expected results from R, Python, or another tool can be used for verification.

Verification

Verify with the UI

  1. Download a CSV file from the table below
  2. Open MIDAS and load the CSV
  3. Follow the steps for the category you want to verify
    • Univariate: Compare the mean and standard deviation shown in the Data Table panel
    • Linear Regression: Open a Linear Regression tab, set the response variable y and explanatory variables x, then compare coefficients and R-squared
    • ANOVA: Open an ANOVA tab, set the group column as the factor and the value column as the response, then compare the sums of squares (SS) and the F statistic
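Before comparing in step 3, it can help to compute reference values independently. The sketch below uses Python's standard-library `csv` and `statistics` modules. The single-column CSV layout with a header named `y` is a hypothetical stand-in for the downloaded file, and the three values are assumed to be NumAcc1's data.

```python
import csv
import io
import statistics

# Hypothetical CSV layout: a header row "y", then one value per row.
# For a real file, replace io.StringIO(csv_text) with open(path, newline="").
csv_text = "y\n10000001\n10000003\n10000002\n"

with io.StringIO(csv_text) as f:
    values = [float(row["y"]) for row in csv.DictReader(f)]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD: divides by n - 1
print(mean, sd)  # 10000002.0 1.0
```

`statistics.stdev` divides by n - 1, which matches the sample standard deviation used in the Univariate table below.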

How to read the results

The numbers in the tables are Log Relative Error (LRE) values. LRE is the negative base-10 logarithm of the relative error between the MIDAS result and the NIST certified value; it roughly equals the number of leading significant digits on which the two agree.

$$\text{LRE} = -\log_{10} \frac{|\text{computed} - \text{certified}|}{|\text{certified}|}$$

When the certified value is 0 or the statistic is undefined, this formula cannot be applied. "exact" in the tables indicates that both the computed and certified values are 0, or that the statistic is undefined and both values are missing.
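A minimal Python sketch of the LRE calculation with the special cases described above. Three choices are assumptions not stated on this page: a missing or undefined value is represented as `None`, identical nonzero values return infinity, and when the certified value is 0 but the computed one is not, the function falls back to the log absolute error, $-\log_{10}|\text{computed}|$.

```python
import math

def lre(computed, certified):
    """Log relative error between a computed value and a certified value.

    Conventions (None stands for a missing or undefined statistic):
      - both None, or both 0          -> "exact"
      - identical nonzero floats      -> math.inf
      - certified 0, computed nonzero -> log absolute error (assumption)
      - otherwise                     -> -log10(|computed - certified| / |certified|)
    """
    if computed is None and certified is None:
        return "exact"
    if computed is None or certified is None:
        raise ValueError("only one of the two values is missing")
    if certified == 0:
        if computed == 0:
            return "exact"
        return -math.log10(abs(computed))  # assumed fallback: log absolute error
    if computed == certified:
        return math.inf
    return -math.log10(abs(computed - certified) / abs(certified))

print(lre(None, None))                           # exact
print(round(lre(3.1416, 3.141592653589793), 1))  # 5.6
```

Rounding to one decimal place, as in the last line, matches how the tables on this page present LRE.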

MIDAS, like most browser applications, computes with IEEE 754 double-precision floating-point numbers. The significand stores 52 bits, and an implicit leading bit gives 53 significant bits, so two distinct doubles differ by at least about $2^{-53}$ in relative terms. The theoretical upper limit for a finite LRE is therefore $53 \log_{10} 2 \approx 15.95$. LRE values in the tables are rounded to one decimal place, so values of 15.95 or above appear as 16.0.
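The ceiling near 16 can be checked by measuring the LRE between a value and its neighboring double. This Python sketch needs Python 3.9 or later for `math.nextafter`.

```python
import math
import sys

def lre(computed, certified):
    return -math.log10(abs(computed - certified) / abs(certified))

print(sys.float_info.mant_dig)  # 53 significant bits

# The double just below 2.0 differs from 2.0 by 2**-53 in relative terms
below_two = math.nextafter(2.0, 0.0)
print(lre(below_two, 2.0))            # about 15.95 (= 53 * log10(2))
print(round(lre(below_two, 2.0), 1))  # 16.0 after rounding to one decimal

# Just above 1.0 the relative gap is twice as large
above_one = math.nextafter(1.0, 2.0)
print(round(lre(above_one, 1.0), 1))  # 15.7 (= 52 * log10(2))
```

Depending on where a value falls within its binade, the largest finite LRE therefore lies between about 15.65 and 15.95.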

The LRE values in these tables are computed from the MIDAS calculation engine and continuously verified against NIST certified values through automated tests.

Univariate Summary Statistics

The mean and standard deviation displayed in the MIDAS Data Table panel were compared against NIST certified values. The standard deviation divides by $n - 1$ (sample standard deviation).

Dataset    n      LRE(Mean)  LRE(SD)
PiDigits   5000   15         14.9
Lottery    218    15.2       15.7
Lew        200    15         15.2
Mavro      50     15         13.1
Michelso   100    15         13.9
NumAcc1    3      15         15
NumAcc2    1001   15         14.2
NumAcc3    1001   15.9       9.5
NumAcc4    1001   15.7       8.3

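NumAcc3 and NumAcc4 have means of about $10^6$ and $10^7$, respectively, with a standard deviation of 0.1. Near $10^6$ and $10^7$, adjacent doubles are about $1.2 \times 10^{-10}$ and $1.9 \times 10^{-9}$ apart, so each input value already carries a rounding error of up to half that spacing. When the mean is subtracted from each value, the operands are nearly equal and their leading digits cancel (catastrophic cancellation). What remains is a deviation of about 0.1 whose trailing digits are dominated by those rounding errors, which reduces the accuracy of the standard deviation.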
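The effect can be seen directly by building three datasets laid out like NumAcc2, NumAcc3, and NumAcc4. The layout is an assumption here: a center value ending in .2, followed by 500 pairs ending in .1 and .3, which makes the exact SD 0.1. The sketch computes the sample SD with a straightforward two-pass algorithm. It illustrates the cancellation and does not reproduce the MIDAS code path.

```python
import math

def lre(computed, certified):
    if computed == certified:
        return math.inf
    return -math.log10(abs(computed - certified) / abs(certified))

def two_pass_sd(values):
    """Sample SD: compute the mean first, then sum squared deviations (n - 1)."""
    total = 0.0
    for v in values:
        total += v
    mean = total / len(values)
    ss = 0.0
    for v in values:
        d = v - mean  # nearly equal operands when values share many leading digits
        ss += d * d
    return math.sqrt(ss / (len(values) - 1))

def numacc_like(leading):
    """Assumed layout: leading.2, then 500 pairs of leading.1 and leading.3."""
    return [float(leading + ".2")] + [float(leading + ".1"), float(leading + ".3")] * 500

# The exact SD of each of these datasets is 0.1
results = {
    name: lre(two_pass_sd(numacc_like(lead)), 0.1)
    for name, lead in [("NumAcc2-like", "1"),
                       ("NumAcc3-like", "1000000"),
                       ("NumAcc4-like", "10000000")]
}
for name, value in results.items():
    print(name, round(value, 1))
```

The large-offset versions lose digits roughly in step with their extra leading digits, while the offset-1 version stays close to full precision.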

Dataset details are published at NIST StRD Univariate.

Linear Regression Datasets

Each dataset was run through MIDAS, and the resulting coefficients, standard errors, R-squared, residual SD, and F statistic were compared against the NIST certified values. The numbers in the table are LRE values. For datasets with several regression coefficients, the table shows the minimum LRE across all coefficients, and LRE(SE) is reported the same way, so each entry reflects the least accurate estimate. R², residual SD, and the F statistic are single values for the whole model and are shown as-is.

Dataset    n    LRE(Coef.)  LRE(SE)  LRE(R²)  LRE(Resid. SD)  LRE(F)
Norris     36   12.3        13.8     15.5     13.9            11.5
Pontius    40   11.9        13       16       13              9.5
NoInt1     11   14.7        15.4     15.7     15.3            13.9
NoInt2     3    15.3        15.8     16       15.5            14.2
Filip      82   7.3         7.5      10.4     8.2             7.9
Longley    16   13          12.3     14.3     12.3            12
Wampler1   21   9.5         exact    15       exact           exact
Wampler2   21   12.6        exact    15       exact           exact
Wampler3   21   9.5         13.6     16       14.4            11
Wampler4   21   7.8         13.5     15.9     14.8            15.7
Wampler5   21   5.8         13.5     13.7     14.8            13.7

Wampler1 and Wampler2 are noise-free data where the model fits the data exactly. All residuals are zero, so SE and residual SD are zero and the computed values match the certified values. The F statistic is undefined because MSE = 0; MIDAS returns null. "exact" in the table indicates these exact matches and undefined statistics.
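This is a minimal sketch of the Wampler1/Wampler2 situation. To keep the arithmetic short, it uses a straight line instead of Wampler1's fifth-degree polynomial. Returning `None` for the F statistic mirrors the null described above. The function name and return layout are illustrative and are not MIDAS code.

```python
import math

def simple_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, resid_sd, f)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ssr = sum((f - mean_y) ** 2 for f in fitted)
    mse = sse / (n - 2)
    # F = (SSR / 1) / MSE is undefined when the fit is exact; report None (null)
    f_stat = None if mse == 0 else ssr / mse
    return b0, b1, math.sqrt(mse), f_stat

# Noise-free data: y = 1 + x exactly, on x = 0, 1, ..., 20
x = [float(v) for v in range(21)]
y = [1.0 + v for v in x]
print(simple_regression(x, y))  # (1.0, 1.0, 0.0, None)
```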

Model specifications, certified values, and dataset descriptions for all 11 datasets are published at NIST StRD Linear Regression.
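The following worked example computes a coefficient LRE for NoInt1. The data are assumed here to be x = 60, ..., 70 and y = 130, ..., 140 under the no-intercept model $y = B_1 x$. The sketch compares $B_1$ with its NIST certified value, 2.07438016528926. The dictionary and `min` step mirror how the table reduces several coefficient LREs to one number; for a single-coefficient model this step is trivial.

```python
import math

def lre(computed, certified):
    if computed == certified:
        return math.inf
    return -math.log10(abs(computed - certified) / abs(certified))

# NoInt1 data as assumed here: x = 60, ..., 70 and y = 130, ..., 140
x = [float(v) for v in range(60, 71)]
y = [float(v) for v in range(130, 141)]

# Least squares without an intercept has a closed form: B1 = sum(x*y) / sum(x*x)
b1 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

certified = {"B1": 2.07438016528926}  # NIST certified value for NoInt1
computed = {"B1": b1}

# One LRE per coefficient; the table reports the smallest (least accurate) one
per_coefficient = {name: lre(computed[name], certified[name]) for name in certified}
worst = min(per_coefficient.values())
print(round(worst, 1))  # 14.7
```

Here the sums are exact, so the floating-point quotient is the correctly rounded value of $B_1$. The remaining gap of about 14.7 digits comes from the certified value itself, which is rounded to 15 significant digits. This is the same figure as the NoInt1 row of the table above.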

One-Way ANOVA

Each dataset was run through the MIDAS ANOVA tab, and the resulting between-treatment sum of squares (SS_B), within-treatment sum of squares (SS_W), and F statistic were compared against the NIST certified values. The numbers in the table are LRE values.

Dataset   Difficulty  n      Groups  LRE(SS_B)  LRE(SS_W)  LRE(F)
SiRstv    Lower       25     5       13.4       13.1       12.9
SmLs01    Lower       189    9       15.6       15.2       15
SmLs02    Lower       1809   9       15.4       15.2       14.9
SmLs03    Lower       18009  9       15.3       15.5       15.9
AtmWtAg   Average     48     2       11         10.9       11.7
SmLs04    Average     189    9       9.3        10.3       9.3
SmLs05    Average     1809   9       9.3        10.3       9.3
SmLs06    Average     18009  9       9.3        10.3       9.3
SmLs07    Higher      189    9       3.3        4.3        3.3
SmLs08    Higher      1809   9       3.3        4.3        3.3
SmLs09    Higher      18009  9       3.3        4.3        3.3

SiRstv is observed silicon resistivity data (5 groups of 5 observations each). AtmWtAg is observed atomic weight of silver data (2 groups, 48 observations total). SmLs01-SmLs09 are generated datasets with 9 groups each. They differ in size (189, 1809, or 18009 observations) and in a constant offset, where increasing the number of constant leading digits (3 / 7 / 13) raises the computational difficulty.

Dataset details are published at NIST StRD ANOVA.
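The sums of squares compared in this section follow the usual one-way ANOVA definitions for $k$ groups and $N$ observations:

$SS_B = \sum_i n_i(\bar{y}_i - \bar{y})^2$

$SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$

$F = \frac{SS_B/(k-1)}{SS_W/(N-k)}$

The Python sketch below computes them twice on hypothetical toy data. The first pass uses exact `Fraction` arithmetic, which plays the role of a certified value. The second pass uses floating point. It illustrates the definitions, not MIDAS's implementation.

```python
from fractions import Fraction

def one_way_anova(groups):
    """Return (SS_B, SS_W, F) for a list of groups of float or Fraction values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between groups: group size times squared distance of the group mean from the grand mean
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviations of each value from its own group mean
    ss_w = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    f = (ss_b / (k - 1)) / (ss_w / (n_total - k))
    return ss_b, ss_w, f

# Hypothetical toy data as decimal strings, so Fraction can read them exactly
raw = [["1.3", "1.5", "1.4"], ["1.5", "1.3", "1.7"], ["1.1", "1.3", "1.2"]]
exact = one_way_anova([[Fraction(s) for s in g] for g in raw])
approx = one_way_anova([[float(s) for s in g] for g in raw])
print(exact)   # (Fraction(7, 50), Fraction(3, 25), Fraction(7, 2))
print(approx)  # floats very close to 0.14, 0.12, 3.5
```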

Known Limitations

Filip: A 10th-degree polynomial whose cross-product matrix has condition number $\kappa(X'X) \approx 10^{14}$ (for the design matrix itself, $\kappa(X) \approx 10^7$). The larger the condition number, the more rounding errors are amplified in the result. With the raw polynomial basis, the coefficient accuracy is 7 significant digits. Generating orthogonal polynomial columns with Orthogonal Polynomials and using them as predictors reduces the condition number to approximately 1, improving coefficient accuracy to 10 or more significant digits.
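Here is a toy version of the same effect. It uses hypothetical predictor values 1001, ..., 1010 and a straight-line model instead of Filip's 10th-degree polynomial. Centering x, which is the degree-1 orthogonal polynomial up to scale, removes almost all of the ill-conditioning of $X'X$. This sketch is not how MIDAS builds orthogonal polynomials; it only illustrates the conditioning argument.

```python
import math
from fractions import Fraction

def gram_condition(column):
    """Condition number of X'X for the design X = [1, column].

    Sums are exact Fractions, so the determinant suffers no cancellation.
    """
    n = Fraction(len(column))
    sx = sum(Fraction(v) for v in column)
    sxx = sum(Fraction(v) ** 2 for v in column)
    # X'X = [[n, sx], [sx, sxx]]; larger eigenvalue of a symmetric 2x2 matrix
    half_trace = (n + sxx) / 2
    half_gap = (n - sxx) / 2
    lam_max = float(half_trace) + math.sqrt(float(half_gap ** 2 + sx ** 2))
    det = n * sxx - sx ** 2  # product of the two eigenvalues
    lam_min = float(det) / lam_max
    return lam_max / lam_min

raw = list(range(1001, 1011))                   # hypothetical predictor values
center = Fraction(sum(raw), len(raw))
centered = [Fraction(v) - center for v in raw]  # degree-1 orthogonal polynomial (up to scale)

print(f"{gram_condition(raw):.3g}")       # about 1.24e+11
print(f"{gram_condition(centered):.3g}")  # 8.25
```

The centered columns are orthogonal, so $X'X$ becomes diagonal. Rescaling both columns to unit length would bring the condition number to exactly 1.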

Wampler5: Wampler1-5 all use the same design matrix, so their condition numbers are identical. Wampler5 has the largest noise and lowest signal-to-noise ratio, so rounding errors due to the condition number have a proportionally larger effect on the coefficient estimates. The coefficient accuracy is 6 significant digits.

SmLs07-SmLs09 (ANOVA Higher): Data values are on the order of $10^{12}$, with variation only in the decimal places. With 13 constant leading digits, catastrophic cancellation occurs when computing the difference between each group mean and the grand mean, $\bar{y}_i - \bar{y}$. The accuracy of the between-treatment sum of squares (SS_B) and the F statistic is limited to approximately 3 significant digits. The within-treatment sum of squares (SS_W) achieves approximately 4 significant digits because the Welford online algorithm computes each group's variation from deviations within the group, so the large offset never enters a large running sum of squares. What remains is limited by the resolution of doubles at that magnitude: near $10^{12}$, adjacent doubles are about $1.2 \times 10^{-4}$ apart.
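The effect can be reproduced on a hypothetical nine-value dataset of three groups of three, shifted by a constant offset. The sketch below uses Welford's update for each group's mean and within-group sum of squares, as described above. It contrasts this with the textbook shortcut $\sum x^2 - (\sum x)^2/n$. All function names are illustrative, not MIDAS internals.

```python
import math

def lre(computed, certified):
    if computed == certified:
        return math.inf
    return -math.log10(abs(computed - certified) / abs(certified))

def welford(values):
    """One pass: running mean and sum of squared deviations (M2) from that mean."""
    mean = 0.0
    m2 = 0.0
    for count, v in enumerate(values, start=1):
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return mean, m2

def naive_ss(values):
    """Textbook shortcut sum(x^2) - (sum x)^2 / n, shown only for contrast."""
    s = 0.0
    s2 = 0.0
    for v in values:
        s += v
        s2 += v * v
    return s2 - s * s / len(values)

def anova_ss(groups):
    """Return (SS_B, SS_W via Welford, SS_W via the naive shortcut)."""
    total = 0.0
    n_total = 0
    for g in groups:
        for v in g:
            total += v
            n_total += 1
    grand_mean = total / n_total
    ss_b = ss_w = ss_w_naive = 0.0
    for g in groups:
        mean, m2 = welford(g)
        ss_b += len(g) * (mean - grand_mean) ** 2  # group mean minus grand mean
        ss_w += m2
        ss_w_naive += naive_ss(g)
    return ss_b, ss_w, ss_w_naive

# Hypothetical toy data; exact SS_B = 0.14 and SS_W = 0.12 for any constant offset
base = [[1.3, 1.5, 1.4], [1.5, 1.3, 1.7], [1.1, 1.3, 1.2]]
results = {}
for offset in (0.0, 1e12):
    ss_b, ss_w, ss_w_naive = anova_ss([[offset + v for v in g] for g in base])
    results[offset] = (lre(ss_b, 0.14), lre(ss_w, 0.12), lre(ss_w_naive, 0.12))
    print(offset, [round(r, 1) for r in results[offset]])
```

With the $10^{12}$ offset, SS_B keeps only a few correct digits, and Welford's SS_W also retains a few, limited by the rounding of the inputs. The shortcut formula loses everything, because $\sum x^2$ is around $10^{24}$, where adjacent doubles are more than $10^8$ apart.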

Data Source

National Institute of Standards and Technology. (1999). Statistical Reference Datasets. Standard Reference Database 140. https://doi.org/10.18434/T43G6C