Glossary of Statistical Terms
Definitions of statistical terms assumed as prerequisites in concepts pages. Terms are listed in alphabetical order.
Asymptotic normality
The property that the distribution of an estimator converges in distribution to a normal distribution as the sample size $n \to \infty$. Under appropriate normalization,

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, \Sigma),$$

where $\Sigma$ is the asymptotic variance matrix, which depends on the type of estimator. MLEs possess asymptotic normality under regularity conditions (Casella & Berger, 2002, Ch. 10). For OLS without normality assumptions, the central limit theorem ensures that $\hat{\beta}$ is asymptotically normal in large samples (OLS Fundamentals).
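A stdlib-only simulation (seed, sample size, and exponential population all chosen arbitrarily for illustration) makes the definition concrete: even though each draw comes from a skewed distribution, the normalized sample mean behaves like a standard normal, so about 95% of replicates fall within $\pm 1.96$:

```python
import math
import random
import statistics

random.seed(0)

def standardized_mean(n, rate=1.0):
    """Draw n Exponential(rate) values and return sqrt(n) * (xbar - mu) / sigma."""
    xs = [random.expovariate(rate) for _ in range(n)]
    mu = sigma = 1.0 / rate  # for the exponential, mean and sd are both 1/rate
    return math.sqrt(n) * (statistics.fmean(xs) - mu) / sigma

# Although each draw is strongly skewed, the standardized estimator is
# approximately N(0, 1), so roughly 95% of replicates fall within +/-1.96.
zs = [standardized_mean(n=200) for _ in range(2000)]
coverage = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(round(coverage, 2))  # close to 0.95
```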
Consistency
The property that an estimator $\hat{\theta}_n$ converges in probability to the true parameter $\theta$ as $n \to \infty$, written $\hat{\theta}_n \xrightarrow{p} \theta$.
Consistency is a basic requirement for estimators: it guarantees that estimates approach the true value as data accumulate. Consistency alone says nothing about estimation precision at finite sample sizes. The OLS estimator is consistent under $\operatorname{plim}\frac{1}{n}X^\top\varepsilon = 0$, together with $\operatorname{plim}\frac{1}{n}X^\top X$ existing and being nonsingular ($\operatorname{plim}$ denotes the probability limit). Homoscedasticity and uncorrelated errors (required by Gauss-Markov) are not needed for consistency (OLS Fundamentals).
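As a minimal stdlib sketch (hypothetical true value and arbitrary seed), the sample mean of Bernoulli draws is a consistent estimator of $p$, and its error shrinks as the sample grows:

```python
import random
import statistics

random.seed(1)
p = 0.3  # hypothetical true parameter

# The sample mean of Bernoulli(p) draws converges in probability to p,
# so the estimation error shrinks as the sample grows.
errors = {}
for n in (100, 10_000, 1_000_000):
    draws = [1 if random.random() < p else 0 for _ in range(n)]
    errors[n] = abs(statistics.fmean(draws) - p)

print(errors)  # the error at n = 1,000,000 is tiny
```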
Convergence in distribution
A mode of convergence for a sequence of random variables $X_n$ whose distribution approaches the distribution of a limiting random variable $X$ as $n \to \infty$. Formally, $X_n \xrightarrow{d} X$ if the distribution functions satisfy $F_{X_n}(x) \to F_X(x)$ at every continuity point $x$ of $F_X$.
Convergence in probability means "approaching a specific value"; convergence in distribution means "the shape of the distribution approaches a specific distribution." Convergence in probability to a constant implies convergence in distribution to the degenerate distribution at that constant, but not vice versa. Asymptotic normality is defined using this concept.
Convergence in probability
A mode of convergence for a sequence of random variables $X_n$ toward a value $c$. For every $\varepsilon > 0$,

$$\lim_{n \to \infty} P\bigl(|X_n - c| > \varepsilon\bigr) = 0.$$
Written $X_n \xrightarrow{p} c$. As $n$ grows, the probability that $X_n$ deviates from $c$ by more than $\varepsilon$ vanishes. Consistency of an estimator is defined using this concept. The notation $\operatorname{plim}_{n \to \infty} X_n = c$ is equivalent to $X_n \xrightarrow{p} c$.
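The defining limit can be checked directly by Monte Carlo (a stdlib sketch with an arbitrary tolerance and Bernoulli population): the estimated probability of a deviation larger than $\varepsilon$ shrinks as $n$ grows:

```python
import random
import statistics

random.seed(5)
p, eps = 0.5, 0.05  # Bernoulli mean and an arbitrary tolerance

def deviation_prob(n, reps=2000):
    """Monte Carlo estimate of P(|xbar_n - p| > eps) for samples of size n."""
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(random.random() < p for _ in range(n))
        if abs(xbar - p) > eps:
            hits += 1
    return hits / reps

probs = {n: deviation_prob(n) for n in (20, 100, 500)}
print(probs)  # the deviation probability shrinks toward 0 as n grows
```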
Delta method
A technique for approximating the variance of a nonlinear function $g(\hat{\theta})$ of an estimator. By taking a first-order Taylor expansion of $g$ around the true value $\theta$:

$$g(\hat{\theta}) \approx g(\theta) + g'(\theta)(\hat{\theta} - \theta), \qquad \text{so} \qquad \operatorname{Var}\bigl(g(\hat{\theta})\bigr) \approx g'(\theta)^2 \operatorname{Var}(\hat{\theta}).$$
In the multivariate case, use the gradient vector $\nabla g(\theta)$ and the variance-covariance matrix $\Sigma$ of $\hat{\theta}$: $\operatorname{Var}\bigl(g(\hat{\theta})\bigr) \approx \nabla g(\theta)^\top \Sigma\, \nabla g(\theta)$.
In dose-response analysis, the delta method is used to construct confidence intervals for nonlinear functions of regression coefficients, such as the LD50. For LD50 $= -\beta_0/\beta_1$ on the log-dose scale, the partial derivatives of this ratio with respect to $\beta_0$ and $\beta_1$, together with the coefficient variance-covariance matrix, yield an approximate variance, from which a confidence interval is computed on the log scale and then exponentiated.
The delta method relies on asymptotic normality and may be inaccurate in small samples. For ratios of parameters, Fieller's method is known to perform better in small samples (Casella & Berger, 2002).
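A minimal sketch of the approximation (using $g(x) = e^x$ of a sample mean rather than the LD50 ratio, with made-up parameters): the delta-method variance $g'(\mu)^2 \sigma^2/n$ can be compared against a Monte Carlo estimate of the true variance:

```python
import math
import random
import statistics

random.seed(2)
mu, sigma, n = 1.0, 0.5, 200  # made-up population and sample size

# Delta method with g = exp: Var(g(xbar)) ~= g'(mu)^2 * Var(xbar).
delta_var = (math.exp(mu) ** 2) * (sigma ** 2 / n)

def g_of_mean():
    """One replicate of exp(sample mean) from n normal draws."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    return math.exp(statistics.fmean(xs))

# Monte Carlo variance of the same quantity, for comparison.
mc_var = statistics.variance(g_of_mean() for _ in range(5000))
print(delta_var, mc_var)  # the two agree closely at this sample size
```

With a smaller $n$ the two values drift apart, illustrating the small-sample caveat above.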
Fieller's method
A method for constructing a confidence interval directly for the ratio of two parameters, $\rho = \theta_1/\theta_2$. Unlike the delta method, which linearizes $g(\hat{\theta})$, Fieller's method starts from the definition of the ratio and derives an exact confidence region.
Under normality, $\hat{\theta}_1 - \rho\,\hat{\theta}_2$ has mean zero, so the region is defined by $(\hat{\theta}_1 - \rho\,\hat{\theta}_2)^2 \le z^2 \operatorname{Var}(\hat{\theta}_1 - \rho\,\hat{\theta}_2)$. Solving for $\rho$ yields a confidence interval as the solution to a quadratic inequality. Because it does not rely on a Taylor approximation, the interval appropriately widens when $\hat{\theta}_2$ is close to zero (i.e., when the ratio is unstable).
In dose-response analysis, Fieller's method is used for interval estimation of LD50 $= -\beta_0/\beta_1$ (Fieller, 1954).
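The quadratic inequality can be sketched in a few lines (a simplified case assuming independent, approximately normal estimates with zero covariance; the estimates and variances below are made up):

```python
import math

def fieller_interval(t1, v1, t2, v2, z=1.96):
    """Approximate 95% Fieller interval for theta1/theta2, assuming
    independent, approximately normal estimates t1 and t2 with variances
    v1 and v2. Solves (t1 - rho*t2)^2 <= z^2 * (v1 + rho^2 * v2) in rho."""
    a = t2 ** 2 - z ** 2 * v2
    b = -2.0 * t1 * t2
    c = t1 ** 2 - z ** 2 * v1
    if a <= 0:
        # Denominator not significantly different from zero:
        # the confidence region is unbounded rather than an interval.
        raise ValueError("unbounded Fieller region")
    half = math.sqrt(b ** 2 - 4 * a * c)
    return (-b - half) / (2 * a), (-b + half) / (2 * a)

# Made-up estimates: t1 = 2.0 (var 0.04), t2 = 4.0 (var 0.04).
lo, hi = fieller_interval(2.0, 0.04, 4.0, 0.04)
print(lo, hi)  # an interval around 2.0 / 4.0 = 0.5
```

The `a <= 0` branch is exactly the instability case described above: when the denominator is not significantly different from zero, the region is unbounded rather than a finite interval.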
Deviance
A measure of model fit based on the log-likelihood difference from the saturated model:

$$D = 2\bigl\{\ell_{\text{sat}} - \ell(\hat{\theta})\bigr\}.$$
The saturated model has as many parameters as observations and perfectly reproduces the data. Deviance generalizes the residual sum of squares from OLS to GLMs. Larger deviance indicates poorer model fit. In GLMMs, penalized deviance is used for parameter estimation (GLMM Fundamentals).
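For the Poisson family, the deviance has a closed form, $D = 2\sum_i \{y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\}$, which a short stdlib sketch (toy counts, made-up fitted values) can evaluate:

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)], taking
    y*log(y/mu) = 0 when y = 0."""
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += term - (yi - mi)
    return 2.0 * d

y = [2, 0, 5, 3]
print(poisson_deviance(y, [2.0, 0.0001, 5.0, 3.0]))  # near-saturated fit: ~0
print(poisson_deviance(y, [2.5, 2.5, 2.5, 2.5]))     # intercept-only fit: much larger
```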
Estimator
A function of data used to infer an unknown parameter. Since data are random variables, an estimator is itself a random variable that takes different values across samples. The specific numerical value obtained by applying an estimator to observed data is called an estimate.
For example, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an estimator of the population mean $\mu$. The quality of an estimator is evaluated through properties such as consistency, unbiasedness, and asymptotic normality.
Likelihood and log-likelihood
The likelihood is the same formula as the probability density (or mass) function, read as a function of the parameter $\theta$. For a single observation, $L(\theta) = f(x; \theta)$; for $n$ independent observations, $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$. While probability varies over possible data for a given parameter, likelihood varies over possible parameters for observed data.
The log-likelihood $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$ converts the product over independent observations into a sum, making numerical computation more tractable. Since the logarithm is monotone, maximizing the likelihood and maximizing the log-likelihood yield the same result.
GLM parameter estimation (GLM Fundamentals), Laplace approximation in GLMMs (GLMM Fundamentals), and Cox model partial likelihood (Survival Analysis Fundamentals) are all based on log-likelihood.
Maximum likelihood estimator (MLE)
The parameter value that maximizes the likelihood function: $\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta)$.
When the model is correctly specified and regularity conditions (technical conditions on the smoothness of the likelihood function and the parameter space) hold, MLEs possess consistency, asymptotic normality, and asymptotic efficiency (the asymptotic variance is minimal among regular consistent estimators) (Casella & Berger, 2002, Ch. 10). In GLMs, the MLE generally has no closed-form solution and is computed numerically via IRLS (GLM Fundamentals).
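A toy stdlib sketch of numerical maximization (grid search over a made-up Bernoulli sample; real software uses Newton-type methods such as IRLS): for Bernoulli data the MLE has the closed form $\hat{p} = \bar{x}$, so the numeric answer should match the sample mean:

```python
import math
import statistics

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # toy Bernoulli sample, 7 successes in 10

def log_likelihood(p, xs):
    """Bernoulli log-likelihood: sum of log f(x_i; p)."""
    return sum(math.log(p if x == 1 else 1 - p) for x in xs)

# Maximize over a fine grid. For Bernoulli the MLE has a closed form
# (the sample mean), so the numeric answer should match it.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))
print(p_hat, statistics.fmean(data))  # both 0.7
```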
Overdispersion
A condition where the observed variance in data exceeds the variance assumed by the model. Poisson and Binomial families assume the dispersion parameter $\phi = 1$, but real data often exhibit greater variability.
Overdispersion leads to underestimated standard errors and overly narrow confidence intervals. When overdispersion is detected in Poisson models, switching to Negative Binomial explicitly models the extra variance. For Binomial overdispersion, see GLM Fundamentals.
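The Poisson-gamma mixture behind the Negative Binomial can be simulated with the stdlib alone (arbitrary seed and gamma parameters; the Poisson sampler is Knuth's classic method, adequate for moderate rates):

```python
import math
import random
import statistics

random.seed(3)

def poisson(lam):
    """One Poisson(lam) draw via Knuth's method (fine for moderate lam)."""
    thresh, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < thresh:
            return k
        k += 1

# Poisson counts whose rate is itself Gamma(shape=2, scale=2) distributed,
# i.e. a negative binomial: mean ~4 but variance ~12, far above the mean.
counts = [poisson(random.gammavariate(2.0, 2.0)) for _ in range(20_000)]
m, v = statistics.fmean(counts), statistics.variance(counts)
print(m, v)  # variance clearly exceeds the mean
```

A plain Poisson model fitted to such counts would assume variance equal to the mean, which is the source of the understated standard errors noted above.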
Sufficient statistic
A statistic $T(X)$ that retains all information in the data about a parameter $\theta$. Formally, $T$ is sufficient for $\theta$ if the conditional distribution of the data $X$ given $T(X)$ does not depend on $\theta$. In practice, sufficiency is checked with the Fisher-Neyman factorization theorem: $T$ is sufficient if and only if $f(x; \theta) = g(T(x); \theta)\, h(x)$.
Summarizing data through a sufficient statistic loses no information relevant to estimating $\theta$. In GLMs with canonical links, $X^\top y$ is a sufficient statistic for $\beta$, and the log-likelihood is concave in $\beta$. When the design matrix $X$ has full rank, this guarantees uniqueness of the MLE and stable IRLS convergence (GLM Fundamentals).
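A small stdlib sketch of the idea for Bernoulli data, where $T(x) = \sum_i x_i$ is sufficient: two different samples with the same total produce identical likelihood functions (the samples below are made up):

```python
import math

def bernoulli_log_lik(p, xs):
    """Bernoulli log-likelihood for i.i.d. data xs."""
    return sum(math.log(p if x == 1 else 1 - p) for x in xs)

# Two different samples sharing the same sufficient statistic sum(x) = 2:
a = [1, 1, 0, 0, 0]
b = [0, 0, 1, 0, 1]

# The likelihood functions coincide at every p: the data enter the
# likelihood only through T(x) = sum(x).
for p in (0.2, 0.5, 0.8):
    assert math.isclose(bernoulli_log_lik(p, a), bernoulli_log_lik(p, b))
print("likelihoods identical")
```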
Unbiasedness
The property that the expected value of an estimator equals the true parameter: $E[\hat{\theta}] = \theta$.
Unbiasedness and consistency are independent properties: an estimator can be unbiased but inconsistent, or consistent but biased in finite samples. The OLS estimator is unbiased under $E[\varepsilon \mid X] = 0$. With the additional assumptions of homoscedasticity and uncorrelated errors ($\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I$), the Gauss-Markov theorem guarantees it has minimum variance among linear unbiased estimators (BLUE). MLEs are generally biased in finite samples but possess consistency and asymptotic efficiency (OLS Fundamentals).
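Finite-sample bias is easy to exhibit by simulation (a stdlib sketch with arbitrary seed): the variance estimator with divisor $n$ has expectation $\frac{n-1}{n}\sigma^2$, while the divisor-$(n-1)$ version is unbiased:

```python
import random
import statistics

random.seed(4)
n = 5  # small samples from N(0, 1), so the true variance is 1.0

def var_divide_by_n(xs):
    """Variance with divisor n; its expectation is (n-1)/n * sigma^2."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

samples = [[random.gauss(0, 1) for _ in range(n)] for _ in range(40_000)]
mean_biased = statistics.fmean(var_divide_by_n(s) for s in samples)
mean_unbiased = statistics.fmean(statistics.variance(s) for s in samples)
print(round(mean_biased, 2), round(mean_unbiased, 2))  # ~0.8 vs ~1.0
```

Note that the biased version is still consistent: as $n$ grows, $\frac{n-1}{n} \to 1$ and the bias vanishes.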
Variance function
In exponential family distributions, the function $V(\mu)$ that determines the mean-variance relationship: $\operatorname{Var}(Y) = a(\phi)\, V(\mu)$, where $a(\phi)$ is a scaling function of the dispersion parameter $\phi$. $V(\mu)$ is the second derivative of the log-partition function $b(\theta)$, expressed as a function of $\mu$.
For Poisson, $V(\mu) = \mu$; for Binomial, $V(\mu) = \mu(1 - \mu)$; for Gamma, $V(\mu) = \mu^2$ (GLM Fundamentals; McCullagh & Nelder, 1989).
References
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury.
- Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society: Series B, 16(2), 175-185.
- McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall/CRC.