Glossary of Statistical Terms
Definitions of statistical terms assumed as prerequisites in the concepts pages. Terms are listed in alphabetical order, except that Fieller's method follows the delta method directly.
Asymptotic normality
The property that the distribution of an estimator converges in distribution to a normal distribution as the sample size $n \to \infty$. Under appropriate normalization,

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, V).$$

The $d$ above the arrow stands for "distribution." $V$ is the asymptotic variance (or the asymptotic covariance matrix when $\theta$ is a vector) and depends on the type of estimator. MLEs possess asymptotic normality under regularity conditions. Even for OLS (Ordinary Least Squares) without normality assumptions, the central limit theorem ensures that $\sqrt{n}\,(\hat{\beta} - \beta)$ converges in distribution to a normal distribution in large samples (OLS Fundamentals).
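As an illustration (not taken from the concepts pages), the following numpy sketch simulates the sample mean of exponential data and checks that the normalized statistic $\sqrt{n}(\bar{X}_n - \mu)$ has mean near $0$ and variance near $\sigma^2$; the distribution and sample sizes are arbitrary choices.

```python
# A minimal simulation sketch: Exponential(1) data are non-normal, yet
# sqrt(n) * (xbar - mu) is approximately N(0, sigma^2) for large n.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0          # Exponential(1) has mean 1 and variance 1
n, reps = 500, 10_000

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu)   # normalized estimator

print(z.mean(), z.var())       # close to 0 and sigma2 = 1
```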
Consistency
The property that an estimator converges in probability to the true parameter as $n \to \infty$, written $\hat{\theta}_n \xrightarrow{p} \theta$.
Consistency is a basic requirement for estimators: it guarantees that estimates approach the true value as data accumulate. Consistency alone says nothing about estimation precision at finite sample sizes. The OLS estimator is consistent when $\operatorname{plim}\,\tfrac{1}{n}X^\top\varepsilon = 0$ ($\operatorname{plim}$ denotes the probability limit) and the probability limit $\operatorname{plim}\,\tfrac{1}{n}X^\top X$ is nonsingular. Homoscedasticity and uncorrelated errors (required by Gauss-Markov) are not needed for consistency (OLS Fundamentals).
Convergence in distribution
A mode of convergence for a sequence of random variables $X_n$ whose distribution approaches that of another random variable $X$ as $n \to \infty$. Formally, $X_n \xrightarrow{d} X$ if the distribution functions satisfy $F_{X_n}(x) \to F_X(x)$ at every continuity point of $F_X$.
The $d$ above the arrow stands for "distribution." Convergence in distribution holds as long as the shapes of the distributions of $X_n$ and $X$ approach each other; it does not require the values of $X_n$ and $X$ themselves to be close. In contrast, convergence in probability requires the values themselves to be close: $|X_n - X|$ must become small with high probability. Convergence in probability implies convergence in distribution; the converse holds only when the limit is a constant. Asymptotic normality is defined using this concept.
Convergence in probability
A mode of convergence for a sequence of random variables $X_n$ toward a random variable $X$. For every $\varepsilon > 0$,

$$\lim_{n \to \infty} P\bigl(|X_n - X| > \varepsilon\bigr) = 0.$$

Written $X_n \xrightarrow{p} X$. The $p$ above the arrow stands for "probability." When $X$ is a constant $c$, this means that the probability that $X_n$ deviates from $c$ by more than $\varepsilon$ vanishes as $n$ grows. Consistency of an estimator is defined as convergence in probability to the true parameter (a constant). The notation $\operatorname{plim} X_n = c$ (probability limit) is equivalent to $X_n \xrightarrow{p} c$.
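A minimal simulation sketch of this definition, using the sample mean of Uniform(0, 1) draws as $X_n$ (an assumed example, not from the source): the estimated probability $P(|X_n - 0.5| > \varepsilon)$ shrinks toward zero as $n$ grows.

```python
# Estimate P(|X_n - 0.5| > eps) by simulation for increasing n,
# where X_n is the mean of n Uniform(0, 1) draws.
import numpy as np

rng = np.random.default_rng(0)
eps, reps = 0.02, 2_000

for n in [10, 100, 1_000, 5_000]:
    xbar = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - 0.5) > eps))  # fraction shrinks toward 0
```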
Delta method
A technique for approximating the variance of a nonlinear function $g(\hat{\theta})$ of an estimator. By taking a first-order Taylor expansion of $g$ around the true value $\theta$:

$$\operatorname{Var}\bigl[g(\hat{\theta})\bigr] \approx g'(\theta)^2\,\operatorname{Var}(\hat{\theta}).$$

In the multivariate case, use the gradient vector $\nabla g(\theta)$ and the variance-covariance matrix $\Sigma$: $\operatorname{Var}\bigl[g(\hat{\theta})\bigr] \approx \nabla g(\theta)^\top \Sigma\, \nabla g(\theta)$.
The delta method relies on asymptotic normality and may be inaccurate in small samples. The linearization error also grows when $g$ has high curvature near $\theta$ or when $\hat{\theta}$ has large variance. When $g'(\theta) = 0$ at the true value, the first-order delta method degenerates and $g(\hat{\theta})$ no longer has a normal asymptotic distribution; the second-order delta method (incorporating quadratic terms) is needed instead.
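A minimal sketch of the univariate formula, using $g(\theta) = \log\theta$ applied to a sample mean (an assumed example); since $g'(\theta) = 1/\theta$, the delta-method standard error is $\mathrm{SE}(\hat{\theta})/\hat{\theta}$.

```python
# Delta-method standard error for the log of a sample mean.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=200)

theta_hat = y.mean()
var_theta = y.var(ddof=1) / len(y)          # Var of the sample mean
se_log = np.sqrt(var_theta) / theta_hat     # |g'(theta_hat)| * SE(theta_hat)

print(np.log(theta_hat), se_log)            # estimate and delta-method SE
```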
Fieller's method
A method for constructing a confidence interval for the ratio $\rho = \beta_1 / \beta_2$ of two parameters. It exploits the (asymptotic) bivariate normality of $(\hat{\beta}_1, \hat{\beta}_2)$.
Under the hypothesis that the true ratio is $\rho$, we have $\beta_1 - \rho\beta_2 = 0$, so the statistic $\hat{\beta}_1 - \rho\hat{\beta}_2$ has mean zero and variance

$$\operatorname{Var}(\hat{\beta}_1 - \rho\hat{\beta}_2) = v_{11} - 2\rho v_{12} + \rho^2 v_{22}.$$

The statistic $(\hat{\beta}_1 - \rho\hat{\beta}_2)^2 / (v_{11} - 2\rho v_{12} + \rho^2 v_{22})$ follows a $\chi^2_1$ distribution when the variances and covariance are known, so the confidence set is the collection of $\rho$ for which this quantity does not exceed the critical value $z_{\alpha/2}^2$ (the upper $\alpha$ point of $\chi^2_1$). Rearranging yields a quadratic inequality in $\rho$:

$$A\rho^2 + B\rho + C \le 0,$$

where $A = \hat{\beta}_2^2 - z_{\alpha/2}^2 v_{22}$, $B = -2(\hat{\beta}_1\hat{\beta}_2 - z_{\alpha/2}^2 v_{12})$, and $C = \hat{\beta}_1^2 - z_{\alpha/2}^2 v_{11}$. The sign of $A$ and the discriminant $B^2 - 4AC$ determine the shape of the confidence set.
- If $A > 0$, the set is a finite interval $[\rho_L, \rho_U]$. The condition $A > 0$ is equivalent to $\hat{\beta}_2^2 / v_{22} > z_{\alpha/2}^2$, i.e., the Wald test for $\beta_2 = 0$ with the same critical value rejects the null.
- If $A < 0$ and $B^2 - 4AC > 0$, the set is an unbounded union $(-\infty, \rho_L] \cup [\rho_U, \infty)$. This happens when the Wald test does not reject $\beta_2 = 0$.
- If $A < 0$ and $B^2 - 4AC < 0$, the set is the entire real line $(-\infty, \infty)$, meaning the data carry no information about $\rho$.
Unlike the delta method, which linearizes via a Taylor expansion, Fieller's method avoids linearization. When $\hat{\beta}_2$ (the denominator) is close to zero, the delta-method approximation breaks down; Fieller's method instead reflects this uncertainty through unbounded or all-real-line confidence sets. It is exact when the estimators are exactly normal and is an approximation under asymptotic normality, as in GLMs. When the variances are estimated from residuals (as in OLS), the original formulation uses $t_{\nu,\,\alpha/2}$ (equivalently $F_{1,\nu}$) as the critical value rather than $z_{\alpha/2}$ ($\chi^2_1$) (Fieller, 1954).
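The quadratic-inequality form above translates directly into code. The sketch below assumes known (asymptotic) variances and a $z$ critical value; the function name and the numeric inputs are illustrative, not from the source.

```python
# Fieller confidence set for rho = beta1 / beta2 via A*rho^2 + B*rho + C <= 0.
import numpy as np
from scipy.stats import norm

def fieller_interval(b1, b2, v11, v22, v12, alpha=0.05):
    """Return the Fieller confidence set for the ratio beta1 / beta2."""
    z2 = norm.ppf(1 - alpha / 2) ** 2
    A = b2 ** 2 - z2 * v22
    B = -2 * (b1 * b2 - z2 * v12)
    C = b1 ** 2 - z2 * v11
    disc = B ** 2 - 4 * A * C
    if A > 0:                                    # finite interval [rho_L, rho_U]
        r = np.sqrt(disc)
        return ("interval", ((-B - r) / (2 * A), (-B + r) / (2 * A)))
    if disc > 0:                                 # unbounded union: outside (lo, hi)
        r = np.sqrt(disc)
        lo, hi = (-B + r) / (2 * A), (-B - r) / (2 * A)
        return ("outside", (lo, hi))
    return ("whole real line", None)             # no information about rho

print(fieller_interval(b1=2.0, b2=4.0, v11=0.25, v22=0.25, v12=0.05))
```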
Deviance
A measure of model fit based on the log-likelihood difference from the saturated model:

$$D^* = 2\,\bigl\{\ell_{\text{sat}} - \ell(\hat{\mu};\, y)\bigr\}.$$

The saturated model assigns an individual parameter to each covariate pattern and reproduces the data exactly (its residuals are identically zero). A covariate pattern is a group of observations sharing the same combination of predictor values. For individual-level observations each observation typically forms its own pattern, so the number of parameters equals the number of observations. For data pre-aggregated by covariate pattern (for instance, counts of successes and trials for each pattern), the number of parameters equals the number of patterns.
Following McCullagh & Nelder's convention, the quantity $D^*$ defined above is the scaled deviance, and multiplying it by the dispersion parameter $\phi$ gives the unscaled deviance $D = \phi D^*$ (the quantity commonly called "deviance"). For Poisson and Binomial with $\phi = 1$, the two coincide. For the Gaussian family, the scaled deviance equals the fitted model's residual sum of squares divided by the error variance, $\mathrm{RSS}/\sigma^2$, and the unscaled deviance is $\mathrm{RSS}$ itself. MIDAS reports the unscaled form.
Deviance generalizes this relationship to any exponential family distribution, with larger values indicating poorer fit. In GLMMs (Generalized Linear Mixed Models), penalized deviance is used for parameter estimation (GLMM Fundamentals).
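As an illustration (assuming the statsmodels package), the sketch below computes the Poisson deviance from its definition, $2\sum_i \{y_i\log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\}$, and checks it against the deviance reported by the fitted GLM; the simulated data are arbitrary.

```python
# Poisson deviance computed from its definition vs. the fitted model's report.
import numpy as np
import statsmodels.api as sm
from scipy.special import xlogy   # xlogy(0, .) = 0 handles y = 0 safely

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.3 + 0.5 * x))
X = sm.add_constant(x)

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = fit.mu
dev_by_hand = 2 * np.sum(xlogy(y, y / mu) - (y - mu))

print(dev_by_hand, fit.deviance)   # the two values agree
```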
Estimator
A function of data used to infer an unknown parameter. Since data are random variables, an estimator is itself a random variable that takes different values across samples. The specific numerical value obtained by applying an estimator to observed data is called an estimate.
For example, the sample mean $\bar{X} = \tfrac{1}{n}\sum_{i=1}^{n} X_i$ is an estimator of the population mean $\mu$. The quality of an estimator is evaluated through properties such as consistency, unbiasedness, and asymptotic normality.
Likelihood and log-likelihood
The likelihood is the same formula as the probability density (or mass) function, read as a function of the parameter $\theta$. For a single observation, $L(\theta; y) = f(y; \theta)$; for independent observations $y_1, \dots, y_n$, $L(\theta; y) = \prod_{i=1}^{n} f(y_i; \theta)$. While probability varies over possible data for a given parameter, likelihood varies over possible parameters for observed data.
The log-likelihood $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$ converts products of independent observations into sums, making numerical computation more tractable. Because $\log$ is a monotonically increasing function, the $\theta$ that maximizes the likelihood is the same as the $\theta$ that maximizes the log-likelihood.
Parameter estimation in GLMs (Generalized Linear Models) (GLM Fundamentals), Laplace approximation in GLMMs (GLMM Fundamentals), and Cox model partial likelihood (Survival Analysis Fundamentals) are all based on log-likelihood.
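A minimal sketch (the Poisson model and sample are assumed examples, not from the source): the log-likelihood of an i.i.d. sample is the sum of log-densities, and it is larger near the true parameter value.

```python
# Poisson log-likelihood of an i.i.d. sample as a sum of log-densities.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
y = rng.poisson(lam=3.0, size=100)

def loglik(lam, y):
    return np.sum(poisson.logpmf(y, mu=lam))   # sum_i log p(y_i; lambda)

print(loglik(3.0, y), loglik(5.0, y))          # the value near the truth is larger
```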
Maximum likelihood estimator (MLE)
The parameter value that maximizes the likelihood function: $\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} L(\theta; y) = \arg\max_{\theta} \ell(\theta; y)$.
When the model is correctly specified and regularity conditions (technical conditions on the smoothness of the likelihood function and the parameter space) hold, MLEs possess consistency, asymptotic normality, and asymptotic efficiency: the asymptotic variance of $\sqrt{n}\,(\hat{\theta}_{\mathrm{ML}} - \theta)$ equals $I_1(\theta)^{-1}$, where $I_1(\theta)$ is the Fisher information matrix for a single observation. The Cramér-Rao information inequality guarantees in finite samples that the variance of any regular unbiased estimator is at least $I_n(\theta)^{-1} = \{n I_1(\theta)\}^{-1}$, the inverse of the Fisher information aggregated over all observations; asymptotic efficiency means that this bound is attained as $n \to \infty$, with $\operatorname{Var}(\hat{\theta}_{\mathrm{ML}})$ of order $1/n$. In GLMs, the MLE has no closed-form solution and is computed numerically via IRLS (Iteratively Reweighted Least Squares) (GLM Fundamentals).
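A minimal sketch (an assumed example, using a generic scipy optimizer rather than IRLS): numerically maximizing the Poisson log-likelihood and comparing the result to the closed-form MLE, the sample mean.

```python
# Numerical MLE for a Poisson rate via minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(0)
y = rng.poisson(lam=3.0, size=200)

neg_loglik = lambda lam: -np.sum(poisson.logpmf(y, mu=lam))
res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")

print(res.x, y.mean())   # numerical MLE vs. closed-form MLE (sample mean)
```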
Overdispersion
A condition where the observed variance in data exceeds the variance assumed by the model. Poisson and Binomial families assume the dispersion parameter $\phi = 1$, but real data often exhibit greater variability.
Overdispersion leads to underestimated standard errors and overly narrow confidence intervals. When overdispersion is detected in Poisson models, switching to Negative Binomial explicitly models the extra variance. For Binomial overdispersion, see GLM Fundamentals. Note that when the Binomial trial count is $n_i = 1$ (Bernoulli, i.e., logistic regression), the marginal variance $\mu(1-\mu)$ is fully determined by the mean $\mu$, so individual-level Bernoulli data cannot reveal overdispersion through Pearson or deviance diagnostics. Extra variability stemming from clustering, repeated measures, or unobserved heterogeneity can still arise, but it is handled separately via GLMMs or quasi-likelihood. Classical overdispersion detection and correction is meaningful only for grouped Binomial data with $n_i > 1$.
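As an illustration (assuming statsmodels), the sketch below fits a Poisson model to deliberately overdispersed counts generated from a negative binomial and inspects the Pearson dispersion statistic $\chi^2/\mathrm{df}$; values well above 1 suggest overdispersion. The data-generating choices are arbitrary.

```python
# Pearson dispersion check for a Poisson GLM fitted to overdispersed counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
mu = np.exp(0.2 + 0.4 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))   # mean mu, variance > mu

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(fit.pearson_chi2 / fit.df_resid)           # noticeably greater than 1
```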
Sufficient statistic
A statistic $T(Y)$ that retains all information in the data about a parameter $\theta$. Formally, $T(Y)$ is sufficient for $\theta$ if the conditional distribution of $Y$ given $T(Y)$ does not depend on $\theta$; equivalently, by the Fisher-Neyman factorization theorem, the density factors as $f(y; \theta) = g(T(y); \theta)\, h(y)$.
Summarizing data through a sufficient statistic loses no information relevant to estimating $\theta$. In GLMs with canonical links, $X^\top y$ is a sufficient statistic for $\beta$, and the log-likelihood is concave in $\beta$. When the design matrix $X$ has full rank, this guarantees uniqueness of the MLE and stable IRLS convergence (GLM Fundamentals).
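A minimal sketch (assuming statsmodels) of the canonical-link property: for Poisson regression with the log link, the score equations are $X^\top(y - \hat{\mu}) = 0$, so the fitted means reproduce the sufficient statistic $X^\top y$.

```python
# Canonical-link Poisson GLM: the fitted means match X'y exactly.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(300, 2)))
y = rng.poisson(np.exp(X @ np.array([0.2, 0.5, -0.3])))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(X.T @ y)        # sufficient statistic X'y
print(X.T @ fit.mu)   # matched by X'mu_hat up to numerical tolerance
```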
Unbiasedness
The property that the expected value of an estimator equals the true parameter: $E[\hat{\theta}] = \theta$.
The OLS estimator is unbiased under $E[\varepsilon \mid X] = 0$ (strict exogeneity). This condition means that $\varepsilon$ is uncorrelated with any measurable function of $X$, a strictly stronger condition than linear uncorrelatedness $\operatorname{Cov}(X, \varepsilon) = 0$. Neither $E[\varepsilon] = 0$ (unconditional mean zero) nor $\operatorname{Cov}(X, \varepsilon) = 0$ alone is sufficient for unbiasedness. With the additional assumptions of homoscedasticity and uncorrelated errors ($\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I$), the Gauss-Markov theorem guarantees minimum variance among linear unbiased estimators (BLUE: Best Linear Unbiased Estimator) (OLS Fundamentals). MLEs are generally biased in finite samples.
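A minimal simulation sketch (an assumed example): the errors satisfy $E[\varepsilon \mid X] = 0$ but are heteroscedastic and non-normal, and the average of $\hat{\beta}$ over replications is still close to the true $\beta$.

```python
# OLS unbiasedness under strict exogeneity, even with heteroscedastic errors.
import numpy as np

rng = np.random.default_rng(0)
beta, n, reps = np.array([1.0, 2.0]), 50, 5_000
estimates = np.empty((reps, 2))

for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = (rng.exponential(size=n) - 1.0) * (1.0 + X[:, 1] ** 2)  # E[eps | X] = 0
    y = X @ beta + eps
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))   # close to [1.0, 2.0]
```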
Variance function
In exponential family distributions, the function that determines the mean-variance relationship: $\operatorname{Var}(Y) = a(\phi)\, V(\mu)$. $V(\mu)$ is the second derivative $b''(\theta)$ of the log-partition function $b(\theta)$, expressed as a function of $\mu$ (using $\mu = b'(\theta)$).
The scaling factor $a(\phi)$ takes different forms depending on the family. For Gaussian and Gamma, $a(\phi) = \phi$, the dispersion parameter itself. For Poisson, $a(\phi) = 1$, a constant. For Binomial, $a(\phi) = 1/n_i$, depending on the per-observation number of trials $n_i$. Here $n_i$ is the number of trials in the $i$-th observation and is distinct from the overall sample size $n$. In Poisson and Binomial, $\phi$ is fixed at $1$, leaving no room for scaling through $\phi$.
For Poisson, $V(\mu) = \mu$; for Binomial, $V(\mu) = \mu(1 - \mu)$; for Gamma, $V(\mu) = \mu^2$ (GLM Fundamentals).
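A minimal sketch collecting the variance functions listed above; the helper name is illustrative, and the Binomial case shows the scaling $a(\phi) = 1/n_i$ applied to a proportion.

```python
# Variance functions V(mu) for common GLM families.
def variance_function(family, mu):
    if family == "poisson":
        return mu
    if family == "binomial":
        return mu * (1 - mu)
    if family == "gamma":
        return mu ** 2
    raise ValueError(family)

mu, n_i = 0.3, 20
print(variance_function("binomial", mu) / n_i)   # Var of a proportion: mu(1-mu)/n_i
```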
References
- Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society: Series B, 16(2), 175-185. https://www.jstor.org/stable/2984043