Numerical Accuracy

This page explains how to verify the accuracy of MIDAS statistical computations yourself.

NIST Statistical Reference Datasets

NIST Statistical Reference Datasets (StRD) are benchmark datasets published by the National Institute of Standards and Technology for evaluating the numerical accuracy of statistical software. Each dataset comes with certified values computed to 15 significant digits.

NIST StRD has five categories: Univariate, Linear Regression, Nonlinear Regression, ANOVA, and MCMC. This page covers Univariate, Linear Regression, and ANOVA, which correspond to features available in MIDAS. MIDAS does not have features corresponding to Nonlinear Regression or MCMC.

You are not limited to NIST datasets. Any dataset where you know the expected results from R, Python, or another tool can be used for verification.
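As a minimal sketch of producing your own reference values, the Python snippet below computes the mean and sample standard deviation of one CSV column using only the standard library. The function name `reference_stats`, the column name `y`, and the inline sample data are hypothetical placeholders, not part of MIDAS.

```python
import csv
import io
import statistics


def reference_stats(csv_text: str, column: str):
    """Return (mean, sample SD) for one numeric column of CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    # statistics.stdev divides by n - 1 (sample standard deviation)
    return statistics.fmean(values), statistics.stdev(values)


# Hypothetical data; replace with the contents of your own CSV file.
sample = "y\n1\n2\n3\n4\n"
mean, sd = reference_stats(sample, "y")
print(mean, sd)
```

Load the same CSV into MIDAS and compare the printed values with those shown in the Statistics tab.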

Verification

Verify with the UI

  1. Download a CSV file from the table below
  2. Open MIDAS and load the CSV
  3. Follow the steps for the category you want to verify
    • Univariate: Compare the mean and std shown in the Statistics tab
    • Linear Regression: Open a Linear Regression tab, set the Response Variable to y and the Predictor Variables to x, then compare coefficients and R-squared. For the Filip (Orthogonal Poly.) row, first generate degree-10 orthogonal polynomial columns from x using the Orthogonal Polynomials tab, then use those columns as predictors. R² and RMSE are invariant to the choice of basis, so they can be compared directly in the UI (F is not shown in the UI and cannot be compared there). Coefficient and SE values shown in the UI are in the orthogonal basis, so they cannot be directly compared with the NIST certified values (raw polynomial basis). The LRE values for the coefficient and SE columns in the table are computed by the automated test suite.
    • ANOVA: Open an ANOVA tab, set the group column as Factor A and the value column as the Response Variable, then compare the SS shown in the ANOVA Table

How to read the results

The numbers in the tables are Log Relative Error (LRE). LRE is the negated common logarithm of the relative error between the MIDAS result and the NIST certified value, and corresponds to the number of matching significant digits.

$$\text{LRE} = -\log_{10} \frac{|\text{computed} - \text{certified}|}{|\text{certified}|}$$

This formula cannot be applied when the NIST certified value is 0 or the statistic is undefined. Such cells are marked "exact" in the tables.

All browser applications, including MIDAS, compute with IEEE 754 double-precision floating-point numbers. The significand stores 52 bits, with an implicit leading bit giving 53 significant bits, so the theoretical upper limit for LRE is $53 \log_{10} 2 \approx 15.95$. LRE values in the tables are rounded to one decimal place with trailing zeros dropped, so values of 15.95 or above appear as 16. When the relative error is exactly 0 (an exact match), the formula above diverges, so the MIDAS computation routine returns a finite placeholder value of LRE = 15.
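The definition and its special cases can be sketched in Python as follows. The function name `lre` and the use of `None` for the "exact" case are assumptions made for illustration; this is not the MIDAS implementation.

```python
import math
from typing import Optional


def lre(computed: float, certified: float) -> Optional[float]:
    """Log relative error of `computed` against `certified`."""
    if certified == 0:
        return None  # formula not applicable; shown as "exact" in the tables
    if computed == certified:
        return 15.0  # finite placeholder for an exact match, as described above
    return -math.log10(abs(computed - certified) / abs(certified))


# Upper limit implied by a 53-bit significand.
max_digits = 53 * math.log10(2)
print(round(max_digits, 2))
```

A result that differs from the certified value by one part in a thousand gives an LRE of about 3, meaning roughly three matching significant digits.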

The LRE values in these tables are computed from the MIDAS calculation engine and continuously verified against NIST certified values through automated tests.

Univariate Summary Statistics

The mean and std displayed in the MIDAS Statistics tab were compared against NIST certified values. The standard deviation divides by $n - 1$ (sample standard deviation).

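| Dataset | n | LRE(Mean) | LRE(SD) |
|---|---|---|---|
| PiDigits | 5000 | 14.9 | 15.8 |
| Lottery | 218 | 15.4 | 15.4 |
| Lew | 200 | 15 | 15.4 |
| Mavro | 50 | 15 | 13.1 |
| Michelso | 100 | 15 | 13.9 |
| NumAcc1 | 3 | 15 | 15 |
| NumAcc2 | 1001 | 15 | 15.6 |
| NumAcc3 | 1001 | 15 | 9.5 |
| NumAcc4 | 1001 | 15 | 8.3 |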

NumAcc3 and NumAcc4 have means of $10^6$ to $10^7$ with a standard deviation of 0.1. When subtracting the mean from each value, catastrophic cancellation occurs because the operands are nearly equal, causing a loss of significant digits and reducing the accuracy of the standard deviation.
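The effect can be reproduced with a short Python sketch. The data below are synthetic values shaped like NumAcc4 (an assumed layout: an offset of $10^7$ with values alternating 0.1 below and above the mean), and the expected standard deviation of 0.1 follows from that construction. The sketch compares the mean-subtraction (two-pass) approach with the one-pass textbook formula, which is included only for contrast and is not claimed to be what MIDAS uses.

```python
import math


def lre(computed, certified):
    """Log relative error; 15.0 is a placeholder for an exact match."""
    if computed == certified:
        return 15.0
    return -math.log10(abs(computed - certified) / abs(certified))


def sd_one_pass(xs):
    """Textbook formula: sqrt((sum(x^2) - (sum x)^2 / n) / (n - 1))."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    var = (ss - s * s / n) / (n - 1)
    return math.sqrt(max(var, 0.0))  # guard against a negative rounding result


def sd_two_pass(xs):
    """Subtract the mean first, then sum the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Synthetic NumAcc4-like data (assumed layout), 1001 values, SD = 0.1.
data = [10000000.2] + [10000000.1, 10000000.3] * 500

lre_naive = lre(sd_one_pass(data), 0.1)
lre_two_pass = lre(sd_two_pass(data), 0.1)
print(f"one-pass LRE: {lre_naive:.1f}, two-pass LRE: {lre_two_pass:.1f}")
```

The two-pass result is limited mainly by how precisely values near $10^7$ can be stored in double precision, while the one-pass formula subtracts two numbers near $10^{17}$ and keeps almost no correct digits.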

Dataset details are published at NIST StRD Univariate.

Linear Regression Datasets

Each dataset was run through MIDAS, and the resulting coefficients, standard errors, R-squared, and RMSE were compared against the NIST certified values. The F statistic is not part of the MIDAS regression output, so the automated tests derive it from the R² and degrees of freedom computed by MIDAS and compare that value against the certified values. The numbers in the table are LRE values. For datasets with multiple regression coefficients, the minimum LRE across all coefficients is shown. The same applies to LRE(SE). The minimum is used because it evaluates accuracy based on the least precise estimate. R², RMSE, and F statistic are single values for the entire model and are shown as-is.
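The derivation of F and the minimum-LRE rule can be sketched in Python. The sketch uses the standard identity $F = \frac{R^2 / df_\text{model}}{(1 - R^2) / df_\text{resid}}$. Whether the MIDAS test suite uses exactly this form is an assumption, and the names `f_from_r2` and `min_lre` are hypothetical.

```python
from typing import Optional, Sequence


def f_from_r2(r2: float, df_model: int, df_resid: int) -> Optional[float]:
    """Derive the F statistic from R-squared and the degrees of freedom."""
    if r2 >= 1.0:
        return None  # perfect fit: MSE = 0, so F is undefined
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


def min_lre(lres: Sequence[float]) -> float:
    """Report the least precise estimate across all coefficients."""
    return min(lres)


print(f_from_r2(0.5, 1, 10))
print(min_lre([12.3, 13.8, 14.1]))
```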

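| Dataset | n | LRE(Coef.) | LRE(SE) | LRE(R²) | LRE(RMSE) | LRE(F) |
|---|---|---|---|---|---|---|
| Norris | 36 | 12.3 | 13.8 | 15.5 | 13.9 | 11.5 |
| Pontius | 40 | 11.9 | 13 | 16 | 13 | 9.5 |
| NoInt1 | 11 | 14.7 | 15.4 | 15.7 | 15.3 | 13.9 |
| NoInt2 | 3 | 15.3 | 15.8 | 16 | 15.5 | 14.2 |
| Filip | 82 | 7.3 | 7.5 | 10.4 | 8.2 | 7.9 |
| Filip (Orthogonal Poly.) | 82 | 14.4 | 14.8 | 16 | 14.7 | 13.8 |
| Longley | 16 | 13 | 12.3 | 14.3 | 12.3 | 12 |
| Wampler1 | 21 | 9.5 | exact | 15 | exact | exact |
| Wampler2 | 21 | 12.6 | exact | 15 | exact | exact |
| Wampler3 | 21 | 9.5 | 13.6 | 16 | 14.4 | 11 |
| Wampler4 | 21 | 7.8 | 13.5 | 15.9 | 14.8 | 15.7 |
| Wampler5 | 21 | 5.8 | 13.5 | 13.7 | 14.8 | 13.7 |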

Wampler1 and Wampler2 are noise-free data where the model fits the data exactly. "exact" in the table denotes two situations. For LRE(SE) and LRE(RMSE), "exact" means the LRE cannot be computed because the NIST certified values for SE and RMSE are 0. For LRE(F), "exact" means the F statistic is undefined because MSE = 0.

Model specifications, certified values, and dataset descriptions for all 11 datasets are published at NIST StRD Linear Regression.

One-Way ANOVA

Each dataset was run through the MIDAS ANOVA tab, and the resulting between-treatment sum of squares (SS_B) and within-treatment sum of squares (SS_W) were compared against the NIST certified values. The numbers in the table are LRE values. "Difficulty" is the NIST StRD computational difficulty classification, corresponding to the magnitude of the offset in the data values.

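| Dataset | Difficulty | n | Groups | LRE(SS_B) | LRE(SS_W) |
|---|---|---|---|---|---|
| SiRstv | Lower | 25 | 5 | 14.1 | 13.1 |
| SmLs01 | Lower | 189 | 9 | 15.9 | 15.2 |
| SmLs02 | Lower | 1809 | 9 | 15.2 | 15.2 |
| SmLs03 | Lower | 18009 | 9 | 13.9 | 15.5 |
| AtmWtAg | Average | 48 | 2 | 10.2 | 10.9 |
| SmLs04 | Average | 189 | 9 | 10.1 | 10.3 |
| SmLs05 | Average | 1809 | 9 | 9.9 | 10.3 |
| SmLs06 | Average | 18009 | 9 | 9.9 | 10.3 |
| SmLs07 | Higher | 189 | 9 | 4 | 4.3 |
| SmLs08 | Higher | 1809 | 9 | 3.9 | 4.3 |
| SmLs09 | Higher | 18009 | 9 | 3.9 | 4.3 |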

SiRstv is observed silicon resistivity data (5 groups of 5 observations each). AtmWtAg is observed atomic weight of silver data (2 groups, 48 observations total). SmLs01-SmLs09 are generated datasets with the same structure but different constant offsets, where increasing the number of constant leading digits (3 / 7 / 13) raises the computational difficulty.

Dataset details are published at NIST StRD ANOVA.

Known Limitations

Filip: A 10th degree polynomial with a design matrix condition number $\kappa(X) \approx 2 \times 10^{15}$ (the $\kappa(X'X)$ appearing in the normal equations is its square, about $10^{30}$). The larger the condition number, the more rounding errors affect the result. With the raw polynomial basis, the coefficient accuracy is 7 significant digits. Generating orthogonal polynomial columns with Orthogonal Polynomials and using them as predictors reduces the condition number to approximately 1, improving coefficient accuracy to 10 or more significant digits.

Wampler5: Wampler1-5 all use the same design matrix, so their condition numbers are identical. Wampler5 has the largest noise and lowest signal-to-noise ratio, so rounding errors due to the condition number have a proportionally larger effect on the coefficient estimates. The coefficient accuracy is about 6 significant digits (LRE 5.8).

SmLs07-SmLs09 (ANOVA Higher): Data values are on the order of $10^{12}$ with variation only in the decimal places. The between-treatment sum of squares (SS_B) is derived indirectly as SS_total − SS_within. For SmLs07-09, SS_total and SS_within are nearly equal, so this subtraction loses significant digits (catastrophic cancellation). SS_total and SS_within are each computed using the Welford online algorithm, which mitigates the effect of the large offset, but the cancellation in the subtraction cannot be avoided. The accuracy of SS_B is approximately 4 significant digits. The within-treatment sum of squares (SS_W) is accumulated from each group's Welford variance, which mitigates the effect of the large offset, yielding approximately 4 significant digits.
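The computation described for the ANOVA case can be sketched in Python as follows. This is an illustrative reading of the description, not the MIDAS source. The function names `welford_ss` and `one_way_anova_ss` are hypothetical, and the sample groups are made up.

```python
def welford_ss(values):
    """Welford online update; returns (n, mean, sum of squared deviations)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
    return n, mean, m2


def one_way_anova_ss(groups):
    """Return (SS_B, SS_W), with SS_B derived as SS_total - SS_within."""
    ss_within = sum(welford_ss(g)[2] for g in groups)
    all_values = [x for g in groups for x in g]
    ss_total = welford_ss(all_values)[2]
    # This subtraction is where digits are lost when the two terms are close.
    ss_between = ss_total - ss_within
    return ss_between, ss_within


ss_b, ss_w = one_way_anova_ss([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(ss_b, ss_w)
```

Each Welford pass subtracts a running mean before squaring, so the large offset never enters a squared term. The final subtraction is still exposed to cancellation when SS_total and SS_within are close.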

Data Source

National Institute of Standards and Technology. (1999). Statistical Reference Datasets. Standard Reference Database 140. https://doi.org/10.18434/T43G6C