Numerical Accuracy

This page explains how to verify the accuracy of MIDAS statistical computations yourself.

NIST Statistical Reference Datasets

NIST Statistical Reference Datasets (StRD) are benchmark datasets published by the National Institute of Standards and Technology for evaluating the numerical accuracy of statistical software. Each dataset comes with certified values computed to 15 significant digits.

NIST StRD has five categories: Univariate, Linear Regression, Nonlinear Regression, ANOVA, and MCMC. This page covers Univariate, Linear Regression, and ANOVA, which correspond to features available in MIDAS. MIDAS does not have features corresponding to Nonlinear Regression or MCMC.

You are not limited to NIST datasets. Any dataset where you know the expected results from R, Python, or another tool can be used for verification.
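
For instance, reference values for the univariate checks can be produced with Python's standard library. The following is a minimal sketch and is not part of MIDAS; the helper names `reference_summary` and `read_column` are introduced only for this illustration.

```python
import csv
import statistics


def reference_summary(values):
    """Return (mean, sample SD) for a sequence of numbers.

    statistics.stdev divides by n - 1, i.e. it is the sample
    standard deviation.
    """
    data = [float(v) for v in values]
    return statistics.fmean(data), statistics.stdev(data)


def read_column(path, column):
    """Read one named column from a CSV file that has a header row."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]


# Small in-memory example; replace with read_column(...) for a real file.
mean, sd = reference_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, sd)
```

With a real file, a call such as `reference_summary(read_column("my_data.csv", "y"))` would give the values to compare against MIDAS, where the file name and the column name `y` are hypothetical.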

Verification

Verify with the UI

1. Download a CSV file from the table below
2. Open MIDAS and load the CSV
3. Follow the steps for the category you want to verify
- Univariate: Compare the mean and standard deviation shown in the Data Table panel
- Linear Regression: Open a Linear Regression tab, set the response variable y and explanatory variables x, then compare coefficients and R-squared
- ANOVA: Open an ANOVA tab, set the group column as the factor and the value column as the response, then compare SS and F statistic

How to read the results

The numbers in the tables are Log Relative Error (LRE) values. LRE is the negative base-10 logarithm of the relative error between the MIDAS result and the NIST certified value, and roughly corresponds to the number of matching significant digits.

$$\text{LRE} = -\log_{10} \frac{|\text{computed} - \text{certified}|}{|\text{certified}|}$$

When the certified value is 0 or the statistic is undefined, this formula cannot be applied. "exact" in the tables indicates that both the computed and certified values are 0, or that the statistic is undefined and both values are missing.
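
The definition and its special cases can be sketched in a few lines of Python. This is an illustration of the formula above, not the MIDAS implementation; the function name `lre` is hypothetical, and returning infinity when a nonzero computed value equals the certified value exactly is an assumption of this sketch, since the page does not say how that case is reported.

```python
import math


def lre(computed, certified):
    """Log relative error between a computed and a certified value.

    Returns the string "exact" when both values are missing (None)
    or both are exactly 0.
    """
    if computed is None and certified is None:
        return "exact"
    if computed is None or certified is None:
        raise ValueError("only one of the two values is missing")
    if certified == 0:
        if computed == 0:
            return "exact"
        raise ValueError("LRE is undefined when the certified value is 0")
    if computed == certified:
        # Assumption of this sketch: zero relative error maps to infinity.
        return math.inf
    return -math.log10(abs(computed - certified) / abs(certified))


print(round(lre(3.14159, 3.14160), 1))  # 5.5
print(lre(0.0, 0.0))                    # exact
```

The first call agrees with the certified value to between five and six significant digits, which is what an LRE of about 5.5 expresses.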

Like most browser applications, MIDAS computes with IEEE 754 double-precision floating-point numbers. The significand stores 52 bits explicitly, and the implicit leading bit brings the total to 53 significant bits, so the theoretical upper limit for LRE is approximately 15.9. LRE values in the tables are rounded to one decimal place with trailing zeros omitted, so values of 15.95 or above appear as 16.
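
The limit follows from converting 53 binary digits to decimal digits. A short Python check, assuming a platform where Python floats are IEEE 754 doubles (the usual case):

```python
import math
import sys

# A double carries 53 significant bits (52 stored plus 1 implicit).
bits = sys.float_info.mant_dig

# Equivalent number of decimal digits: 53 * log10(2).
max_digits = bits * math.log10(2)

print(bits)                  # 53
print(round(max_digits, 2))  # 15.95
```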

The LRE values in these tables are computed from the MIDAS calculation engine and continuously verified against NIST certified values through automated tests.

Univariate Summary Statistics

The mean and standard deviation displayed in the MIDAS Data Table panel were compared against NIST certified values. The standard deviation divides by $n - 1$ (sample standard deviation).

Dataset	n	LRE(Mean)	LRE(SD)
PiDigits	5000	15	14.9
Lottery	218	15.2	15.7
Lew	200	15	15.2
Mavro	50	15	13.1
Michelso	100	15	13.9
NumAcc1	3	15	15
NumAcc2	1001	15	14.2
NumAcc3	1001	15.9	9.5
NumAcc4	1001	15.7	8.3

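NumAcc3 and NumAcc4 have means of about $10^6$ and $10^7$, respectively, each with a standard deviation of 0.1. Subtracting the mean from each value causes catastrophic cancellation because the operands are nearly equal: the leading digits cancel, and the rounding error already present in the stored inputs makes up a much larger share of the small difference that remains. This loss of significant digits reduces the accuracy of the standard deviation.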
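
The effect can be reproduced outside MIDAS. The sketch below assumes a NumAcc3-style layout (one value of 1000000.2 followed by alternating 1000000.1 and 1000000.3); download the actual file from NIST for a real verification run. It uses a plain two-pass formula, which is not necessarily the algorithm MIDAS uses.

```python
import math

# Assumed layout of a NumAcc3-style dataset: one value of 1000000.2
# followed by 500 alternating pairs of 1000000.1 and 1000000.3.
data = [1000000.2] + [1000000.1, 1000000.3] * 500

n = len(data)
mean = math.fsum(data) / n

# Two-pass formula: subtract the mean, then sum the squared deviations.
ss = math.fsum((x - mean) ** 2 for x in data)
sd = math.sqrt(ss / (n - 1))

# The decimal inputs are rounded when stored as doubles. The spacing
# between adjacent doubles near 1e6 is about 1.2e-10, which is large
# relative to the 0.1 deviations that survive the subtraction.
spacing = math.ulp(1000000.1)

certified_sd = 0.1
lre_sd = -math.log10(abs(sd - certified_sd) / certified_sd)

print(n, spacing, round(lre_sd, 1))
```

Even with accurate summation, the result should agree with 0.1 to only about 9 or 10 digits, because the error is already present in the stored inputs before any arithmetic is done.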

Dataset details are published at NIST StRD Univariate.

Linear Regression Datasets

Each dataset was run through MIDAS, and the resulting coefficients, standard errors, R-squared, residual SD, and F statistic were compared against the NIST certified values. The numbers in the table are LRE values. For datasets with multiple regression coefficients, the table shows the minimum LRE across all coefficients, and likewise the minimum across all standard errors for LRE(SE), so that each dataset is judged by its least accurate estimate. R², residual SD, and the F statistic are single values for the entire model and are shown as-is.

Dataset	n	LRE(Coef.)	LRE(SE)	LRE(R²)	LRE(Resid. SD)	LRE(F)
Norris	36	12.3	13.8	15.5	13.9	11.5
Pontius	40	11.9	13	16	13	9.5
NoInt1	11	14.7	15.4	15.7	15.3	13.9
NoInt2	3	15.3	15.8	16	15.5	14.2
Filip	82	7.3	7.5	10.4	8.2	7.9
Longley	16	13	12.3	14.3	12.3	12
Wampler1	21	9.5	exact	15	exact	exact
Wampler2	21	12.6	exact	15	exact	exact
Wampler3	21	9.5	13.6	16	14.4	11
Wampler4	21	7.8	13.5	15.9	14.8	15.7
Wampler5	21	5.8	13.5	13.7	14.8	13.7

Wampler1 and Wampler2 are noise-free data where the model fits the data exactly. All residuals are zero, so SE and residual SD are zero and the computed values match the certified values. The F statistic is undefined because MSE = 0; MIDAS returns null. "exact" in the table indicates these exact matches and undefined statistics.
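
A small Python sketch shows why the F statistic has no value for an exact fit. It assumes a Wampler1-style construction, $y = 1 + x + x^2 + x^3 + x^4 + x^5$ for $x = 0, 1, \ldots, 20$, and plugs in the exact coefficients instead of running a least-squares solver; `None` stands in for the null that MIDAS returns.

```python
# Assumed construction of a Wampler1-style dataset:
# y = 1 + x + x^2 + x^3 + x^4 + x^5 for x = 0, 1, ..., 20.
xs = list(range(21))
ys = [sum(x ** k for k in range(6)) for x in xs]

coefficients = [1, 1, 1, 1, 1, 1]  # exact fit: every coefficient is 1

fitted = [sum(b * x ** k for k, b in enumerate(coefficients)) for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

n, p = len(xs), len(coefficients)
sse = sum(r * r for r in residuals)  # residual sum of squares
mse = sse / (n - p)                  # mean squared error

y_bar = sum(ys) / n
ssr = sum((f - y_bar) ** 2 for f in fitted)
msr = ssr / (p - 1)

# With MSE = 0 the ratio MSR / MSE is undefined, so report a missing value.
f_stat = msr / mse if mse != 0 else None

print(sse, mse, f_stat)  # 0 0.0 None
```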

Model specifications, certified values, and dataset descriptions for all 11 datasets are published at NIST StRD Linear Regression.

One-Way ANOVA

Each dataset was run through the MIDAS ANOVA tab, and the resulting between-treatment sum of squares (SS_B), within-treatment sum of squares (SS_W), and F statistic were compared against the NIST certified values. The numbers in the table are LRE values.

Dataset	Difficulty	n	Groups	LRE(SS_B)	LRE(SS_W)	LRE(F)
SiRstv	Lower	25	5	13.4	13.1	12.9
SmLs01	Lower	189	9	15.6	15.2	15
SmLs02	Lower	1809	9	15.4	15.2	14.9
SmLs03	Lower	18009	9	15.3	15.5	15.9
AtmWtAg	Average	48	2	11	10.9	11.7
SmLs04	Average	189	9	9.3	10.3	9.3
SmLs05	Average	1809	9	9.3	10.3	9.3
SmLs06	Average	18009	9	9.3	10.3	9.3
SmLs07	Higher	189	9	3.3	4.3	3.3
SmLs08	Higher	1809	9	3.3	4.3	3.3
SmLs09	Higher	18009	9	3.3	4.3	3.3

SiRstv is observed silicon resistivity data (5 groups of 5 observations each). AtmWtAg is observed atomic weight of silver data (2 groups, 48 observations total). SmLs01-SmLs09 are generated datasets that share the same 9-group design at three sample sizes (189, 1809, and 18009 observations) and differ in their constant offset. Increasing the number of constant leading digits (3 / 7 / 13) raises the computational difficulty.

Dataset details are published at NIST StRD ANOVA.

Known Limitations

Filip: A 10th degree polynomial with a design matrix condition number $\kappa(X'X) \approx 10^{14}$ ($\kappa(X) \approx 10^7$). The larger the condition number, the more rounding errors affect the result. With the raw polynomial basis, the coefficient accuracy is about 7 significant digits. Generating orthogonal polynomial columns with the Orthogonal Polynomials feature and using them as predictors reduces the condition number to approximately 1, improving coefficient accuracy to 10 or more significant digits.
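
The two condition numbers quoted for Filip are related by a standard identity for the 2-norm condition number. In the sketch below, $\sigma_{\max}$ and $\sigma_{\min}$ denote the largest and smallest singular values of $X$ (notation introduced only for this illustration). The singular values of $X'X$ are the squares of those of $X$, so:

$$
\kappa(X) = \frac{\sigma_{\max}}{\sigma_{\min}}, \qquad \kappa(X'X) = \frac{\sigma_{\max}^2}{\sigma_{\min}^2} = \kappa(X)^2, \qquad \text{so } \kappa(X) \approx 10^{7} \implies \kappa(X'X) \approx 10^{14}.
$$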

Wampler5: Wampler1-5 all use the same design matrix, so their condition numbers are identical. Wampler5 has the largest noise and lowest signal-to-noise ratio, so rounding errors due to the condition number have a proportionally larger effect on the coefficient estimates. The coefficient accuracy is approximately 6 significant digits.

SmLs07-SmLs09 (ANOVA Higher): Data values are on the order of $10^{12}$ with variation only in the decimal places. With 13 constant leading digits, catastrophic cancellation occurs when computing the difference between group means and the grand mean, $\bar{y}_i - \bar{y}$. The accuracy of the between-treatment sum of squares (SS_B) and F statistic is limited to approximately 3 significant digits. The within-treatment sum of squares (SS_W) achieves approximately 4 significant digits because the Welford online algorithm computes each group's variation from deviations within the group, which are unaffected by the large offset.
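
The computation described above can be sketched as follows. This is a generic illustration of Welford's algorithm applied to one-way ANOVA, not the MIDAS source; the function names are hypothetical. The example values are chosen to be exactly representable as doubles, so the sketch shows the structure of the calculation but does not reproduce the digit loss seen with the NIST data.

```python
def welford(values):
    """Return (count, mean, M2), where M2 is the sum of squared
    deviations from the mean, accumulated in a single pass."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def one_way_anova(groups):
    """Return (SS_B, SS_W, F) for a list of groups of observations."""
    stats = [welford(g) for g in groups]
    n_total = sum(n for n, _, _ in stats)
    grand_mean = sum(n * mean for n, mean, _ in stats) / n_total

    # Within-treatment variation comes from deviations inside each group.
    ss_w = sum(m2 for _, _, m2 in stats)
    # This subtraction is where a large common offset costs digits.
    ss_b = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in stats)

    df_b = len(groups) - 1
    df_w = n_total - len(groups)
    ms_w = ss_w / df_w
    f_stat = (ss_b / df_b) / ms_w if ms_w != 0 else None
    return ss_b, ss_w, f_stat


offset = 1e12
groups = [
    [offset + 0.5, offset + 1.5, offset + 2.5],
    [offset + 2.5, offset + 3.5, offset + 4.5],
]
print(one_way_anova(groups))  # (6.0, 4.0, 6.0)
```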

Data Source

National Institute of Standards and Technology. (1999). Statistical Reference Datasets. Standard Reference Database 140. https://doi.org/10.18434/T43G6C