Glossary of Statistical Terms

Definitions of statistical terms assumed as prerequisites in concepts pages. Terms are listed in alphabetical order.

Asymptotic normality

The property that an estimator converges in distribution to a normal distribution as the sample size $n \to \infty$. Under appropriate normalization,

$$\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{d} N(0, V)$$

$V$ is the asymptotic variance matrix, which depends on the type of estimator. MLEs possess asymptotic normality under regularity conditions (Casella & Berger, 2002, Ch. 10). For OLS without normality assumptions, the central limit theorem ensures that $\hat\beta$ is asymptotically normal in large samples (OLS Fundamentals).
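
The claim can be checked by simulation. The sketch below (illustrative, not from the source) draws skewed Exponential(1) samples, for which the normalized sample mean $\sqrt{n}(\bar X_n - 1)$ should nonetheless look like $N(0, V)$ with $V = \operatorname{Var}(X) = 1$:

```python
import numpy as np

# Illustration: even for skewed Exponential(1) data, the normalized
# sample mean sqrt(n) * (mean - 1) behaves like N(0, 1) for large n.
rng = np.random.default_rng(0)
n, reps = 2000, 5000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0)   # normalized estimator
print(round(z.mean(), 2), round(z.std(), 2))    # roughly 0 and 1
```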

Consistency

The property that an estimator $\hat\theta_n$ converges in probability to the true parameter $\theta$ as $n \to \infty$, written $\hat\theta_n \xrightarrow{p} \theta$.

Consistency is a basic requirement for estimators: it guarantees that estimates approach the true value as data accumulate. Consistency alone says nothing about estimation precision at finite sample sizes. The OLS estimator is consistent under $\operatorname{plim}(X'\varepsilon/n) = 0$, where $\operatorname{plim}$ denotes the probability limit. Homoscedasticity and uncorrelated errors (required by Gauss-Markov) are not needed for consistency (OLS Fundamentals).
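
A quick illustrative simulation (not from the source): the sample mean of Uniform(0, 1) draws estimates $\mu = 0.5$, and the probability of an $\varepsilon$-deviation shrinks as $n$ grows:

```python
import numpy as np

# Illustration of consistency: P(|mean_n - 0.5| > eps) -> 0 as n grows.
rng = np.random.default_rng(1)
eps, reps = 0.05, 2000
probs = []
for n in (10, 100, 1000):
    means = rng.uniform(size=(reps, n)).mean(axis=1)
    probs.append(np.mean(np.abs(means - 0.5) > eps))
    print(n, probs[-1])
```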

Convergence in distribution

A mode of convergence in which the distribution of a sequence of random variables $X_n$ approaches a limiting distribution as $n \to \infty$. Formally, $X_n \xrightarrow{d} X$ if the distribution functions satisfy $F_n(x) \to F(x)$ at every continuity point of $F$.

Convergence in probability means "approaching a specific value"; convergence in distribution means "the shape of the distribution approaches a specific distribution." Convergence in probability to a constant implies convergence in distribution to the degenerate distribution at that constant, but not vice versa. Asymptotic normality is defined using this concept.
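
The pointwise condition $F_n(x) \to F(x)$ can be checked numerically. In this sketch (illustrative numbers, not from the source), $X_n$ is the standardized proportion of Bernoulli(0.3) successes and the limit $F$ is the standard normal CDF, computed via `math.erf`:

```python
import math
import numpy as np

# Illustration: F_n(x) approaches the standard normal CDF at x = 0.5.
rng = np.random.default_rng(2)
p, x, reps = 0.3, 0.5, 20000
phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))     # F(x) for N(0, 1)
fn_vals = []
for n in (5, 50, 500):
    p_hat = rng.binomial(n, p, size=reps) / n
    xn = np.sqrt(n) * (p_hat - p) / math.sqrt(p * (1 - p))
    fn_vals.append(np.mean(xn <= x))             # empirical F_n(x)
    print(n, round(fn_vals[-1], 3), round(phi, 3))
```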

Convergence in probability

A mode of convergence for a sequence of random variables $X_n$ toward a value $c$. For every $\varepsilon > 0$,

$$P(|X_n - c| > \varepsilon) \to 0 \quad (n \to \infty)$$

Written $X_n \xrightarrow{p} c$. As $n$ grows, the probability that $X_n$ deviates from $c$ by more than $\varepsilon$ vanishes. Consistency of an estimator is defined using this concept. The notation $\operatorname{plim} X_n = c$ is equivalent to $X_n \xrightarrow{p} c$.

Delta method

A technique for approximating the variance of a nonlinear function $g(\hat\theta)$ of an estimator. Taking a first-order Taylor expansion of $g$ around the true value $\theta$ gives:

$$\operatorname{Var}(g(\hat\theta)) \approx g'(\theta)^2 \operatorname{Var}(\hat\theta)$$

In the multivariate case, the variance is approximated using the gradient vector $\nabla g$ and the variance-covariance matrix $\Sigma$: $\nabla g^\top \Sigma \, \nabla g$.

In dose-response analysis, the delta method is used to construct confidence intervals for nonlinear functions of regression coefficients, such as LD50. For LD50 $= \exp(-\hat\beta_0 / \hat\beta_1)$, the partial derivatives of $g(\hat\beta_0, \hat\beta_1) = -\hat\beta_0 / \hat\beta_1$ and the coefficient variance-covariance matrix yield an approximate variance, from which a confidence interval is computed on the log scale and then exponentiated.

The delta method relies on asymptotic normality and may be inaccurate in small samples. For ratios of parameters, Fieller's method is known to perform better in small samples (Casella & Berger, 2002).
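
A sketch of the LD50 calculation described above. The coefficient estimates and covariance matrix are made-up illustrative numbers, not from any real fit:

```python
import numpy as np

# Delta-method CI for LD50 = exp(-b0/b1), with hypothetical estimates.
b0, b1 = -4.0, 2.0                           # made-up logistic coefficients
cov = np.array([[0.25, -0.10],
                [-0.10, 0.06]])              # made-up Cov(b0_hat, b1_hat)

g = -b0 / b1                                 # point estimate of log(LD50)
grad = np.array([-1.0 / b1, b0 / b1**2])     # gradient of g wrt (b0, b1)
var_g = grad @ cov @ grad                    # delta-method variance of g
se = np.sqrt(var_g)
ci = np.exp([g - 1.96 * se, g + 1.96 * se])  # back on the dose scale
print(round(np.exp(g), 3), np.round(ci, 3))
```

The interval is symmetric on the log scale and asymmetric after exponentiation, which matches the log-then-exponentiate recipe above.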

Fieller's method

A method for constructing a confidence interval directly for the ratio $\rho = \beta_0 / \beta_1$ of two parameters. Unlike the delta method, which linearizes $g(\hat\theta)$, Fieller's method starts from the definition of the ratio and derives an exact confidence region.

Solving $\hat\beta_0 - \rho \hat\beta_1 = 0$ for $\rho$ yields a confidence interval as the solution to a quadratic inequality based on the variance of $(\hat\beta_0 - \rho \hat\beta_1)$. Because it does not rely on a Taylor approximation, the interval appropriately widens when $\hat\beta_1$ is close to zero (i.e., when the ratio is unstable).

In dose-response analysis, Fieller's method is used for interval estimation of LD50 $= \exp(-\beta_0 / \beta_1)$ (Fieller, 1954).
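
A minimal sketch of the quadratic-inequality computation for $\rho = -\beta_0/\beta_1$ (the log-LD50), reusing made-up estimates and covariances for illustration. The condition $(\hat\beta_0 + \rho\hat\beta_1)^2 \le z^2 \operatorname{Var}(\hat\beta_0 + \rho\hat\beta_1)$ collects into $A\rho^2 + 2B\rho + C \le 0$:

```python
import numpy as np

# Fieller interval for rho = -b0/b1, with hypothetical estimates.
b0, b1 = -4.0, 2.0
v00, v01, v11 = 0.25, -0.10, 0.06            # made-up (co)variances
z = 1.96

A = b1**2 - z**2 * v11                       # quadratic coefficients
B = b0 * b1 - z**2 * v01
C = b0**2 - z**2 * v00
disc = B**2 - A * C
if A > 0 and disc > 0:                       # a bounded interval exists
    lo = (-B - np.sqrt(disc)) / A
    hi = (-B + np.sqrt(disc)) / A
    print(np.round(np.exp([lo, hi]), 3))     # CI for LD50 itself
```

When $A \le 0$ (i.e., $\hat\beta_1$ is not significantly different from zero), the solution set is unbounded, which is exactly the widening behavior noted above.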

Deviance

A measure of model fit based on the log-likelihood difference from the saturated model:

$$D = 2(\ell_{\text{saturated}} - \ell_{\text{model}})$$

The saturated model has as many parameters as observations and perfectly reproduces the data. Deviance generalizes the residual sum of squares from OLS to GLMs. Larger deviance indicates poorer model fit. In GLMMs, penalized deviance is used for parameter estimation (GLMM Fundamentals).
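
For the Poisson family this definition reduces to a closed form, $D = 2\sum_i \left[y_i \log(y_i/\mu_i) - (y_i - \mu_i)\right]$ with the convention $y\log(y/\mu) = 0$ when $y = 0$. A sketch with toy counts (illustrative numbers, not from the source):

```python
import numpy as np

# Poisson deviance: the saturated model sets mu_i = y_i, so the general
# definition collapses to 2 * sum(y*log(y/mu) - (y - mu)).
def poisson_deviance(y, mu):
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

y = np.array([0, 1, 3, 2, 5])
dev_null = poisson_deviance(y, np.full(5, y.mean()))  # intercept-only fit
dev_sat = poisson_deviance(y, np.maximum(y, 1e-12))   # saturated: ~0
print(round(dev_null, 3), dev_sat)
```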

Estimator

A function of data used to infer an unknown parameter. Since data are random variables, an estimator is itself a random variable that takes different values across samples. The specific numerical value obtained by applying an estimator to observed data is called an estimate.

For example, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is an estimator of the population mean $\mu$. The quality of an estimator is evaluated through properties such as consistency, unbiasedness, and asymptotic normality.

Likelihood and log-likelihood

The likelihood is the same formula as the probability density (or mass) function, read as a function of the parameter $\theta$. For a single observation, $L(\theta) = f(y \mid \theta)$; for $n$ independent observations, $L(\theta) = \prod_{i=1}^n f(y_i \mid \theta)$. While probability varies over possible data for a given parameter, likelihood varies over possible parameters for observed data.

The log-likelihood $\ell(\theta) = \log L(\theta)$ converts products of independent observations into sums, making numerical computation more tractable. Since the logarithm is monotone, maximizing the likelihood and maximizing the log-likelihood yield the same result.

GLM parameter estimation (GLM Fundamentals), Laplace approximation in GLMMs (GLMM Fundamentals), and Cox model partial likelihood (Survival Analysis Fundamentals) are all based on log-likelihood.
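
The numerical advantage of the sum over the product is easy to demonstrate. In this sketch (illustrative, not from the source), the raw likelihood of 1000 iid $N(2, 1)$ observations underflows double precision, while the log-likelihood stays well-scaled:

```python
import numpy as np

# The product of 1000 densities (each < 1) underflows to 0.0;
# the sum of log-densities remains finite and usable.
rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.0, size=1000)

def density(y, theta):
    return np.exp(-0.5 * (y - theta) ** 2) / np.sqrt(2 * np.pi)

L = np.prod(density(y, 2.0))          # underflows to 0.0
ll = np.sum(np.log(density(y, 2.0)))  # finite log-likelihood
print(L, round(ll, 1))
```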

Maximum likelihood estimator (MLE)

The parameter value that maximizes the likelihood function: $\hat\theta_{\text{ML}} = \arg\max_\theta L(\theta; y)$.

When the model is correctly specified and regularity conditions (technical conditions on the smoothness of the likelihood function and the parameter space) hold, MLEs possess consistency, asymptotic normality, and asymptotic efficiency (the asymptotic variance is minimal among regular consistent estimators) (Casella & Berger, 2002, Ch. 10). In GLMs, the MLE has no closed-form solution and is computed numerically via IRLS (GLM Fundamentals).
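
Numerical maximization can be sanity-checked against a case with a closed form. For Exponential(rate) data the MLE is $1/\bar{y}$, so a simple grid search (a toy stand-in for IRLS or Newton-type optimizers; illustrative numbers) should land on essentially the same value:

```python
import numpy as np

# Grid-search MLE of an exponential rate vs. the closed form 1/mean.
rng = np.random.default_rng(4)
y = rng.exponential(scale=1.0 / 1.5, size=400)   # true rate 1.5

def loglik(rate, y):
    # log L(rate) = n*log(rate) - rate * sum(y) for iid exponential data
    return len(y) * np.log(rate) - rate * y.sum()

grid = np.linspace(0.1, 5.0, 4901)               # spacing 0.001
mle_grid = grid[np.argmax([loglik(r, y) for r in grid])]
mle_exact = 1.0 / y.mean()
print(round(mle_grid, 3), round(mle_exact, 3))
```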

Overdispersion

A condition where the observed variance in data exceeds the variance assumed by the model. Poisson and Binomial families assume the dispersion parameter $\phi = 1$, but real data often exhibit greater variability.

Overdispersion leads to underestimated standard errors and overly narrow confidence intervals. When overdispersion is detected in Poisson models, switching to Negative Binomial explicitly models the extra variance. For Binomial overdispersion, see GLM Fundamentals.
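
One common diagnostic, sketched here with simulated data (the intercept-only fit and the Pearson $X^2/\text{df}$ statistic are a standard check, but the numbers are illustrative): equi-dispersed Poisson data give a ratio near 1, while negative-binomial data with the same mean give a ratio well above 1.

```python
import numpy as np

# Pearson X^2 / df under an intercept-only Poisson fit (mu_i = ybar).
rng = np.random.default_rng(5)
n = 500
y_pois = rng.poisson(lam=3.0, size=n)
y_nb = rng.negative_binomial(2, 0.4, size=n)   # mean 3, variance 7.5
phis = []
for y in (y_pois, y_nb):
    mu = y.mean()
    phis.append(np.sum((y - mu) ** 2 / mu) / (n - 1))
    print(round(phis[-1], 2))
```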

Sufficient statistic

A statistic that retains all information in the data about a parameter $\theta$. Formally, $T(X)$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$. The Fisher-Neyman factorization theorem gives an equivalent criterion: $T$ is sufficient if and only if the density factors as $f(x \mid \theta) = g(T(x), \theta)\, h(x)$.

Summarizing data through a sufficient statistic loses no information relevant to estimating $\theta$. In GLMs with canonical links, $X'y$ is a sufficient statistic for $\beta$, and the log-likelihood is concave in $\beta$. When the design matrix has full rank, this guarantees uniqueness of the MLE and stable IRLS convergence (GLM Fundamentals).
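
The sufficiency of $X'y$ can be made concrete. With the canonical (logit) link, the logistic log-likelihood is $\ell(\beta) = y'X\beta - \sum_i \log(1 + e^{x_i'\beta})$, so for a fixed design $X$ the responses enter only through $X'y$. In this sketch (hand-picked toy data), two different response vectors with the same $X'y$ give identical log-likelihoods:

```python
import numpy as np

# Two different binary responses with identical X'y yield the same
# logistic log-likelihood at any beta, since the second term depends
# only on X.
X = np.column_stack([np.ones(6), [0, 1, 1, 2, 2, 3]])
y1 = np.array([1, 0, 1, 0, 1, 0])
y2 = np.array([1, 1, 0, 1, 0, 0])   # different data, same X'y

def loglik(beta, X, y):
    eta = X @ beta
    return y @ eta - np.sum(np.log1p(np.exp(eta)))

beta = np.array([-0.3, 0.4])        # arbitrary test point
print(np.allclose(X.T @ y1, X.T @ y2),
      np.isclose(loglik(beta, X, y1), loglik(beta, X, y2)))
```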

Unbiasedness

The property that the expected value of an estimator equals the true parameter: $E[\hat\theta] = \theta$.

Unbiasedness and consistency are independent properties: an estimator can be unbiased but inconsistent, or consistent but biased in finite samples. The OLS estimator is unbiased under $E[\varepsilon] = 0$. With the additional assumptions of homoscedasticity and uncorrelated errors ($\operatorname{Var}(\varepsilon) = \sigma^2 I$), the Gauss-Markov theorem guarantees it has minimum variance among linear unbiased estimators (BLUE). MLEs are generally biased in finite samples but possess consistency and asymptotic efficiency (OLS Fundamentals).
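
The classic example of a consistent-but-biased estimator is the variance with divisor $n$ (the MLE under normality), whose expectation is $\frac{n-1}{n}\sigma^2$; the divisor-$(n-1)$ version is unbiased. A quick illustrative simulation:

```python
import numpy as np

# Averaging each estimator over many samples approximates its expectation:
# the ddof=0 version is biased downward by the factor (n-1)/n = 0.8.
rng = np.random.default_rng(6)
n, reps = 5, 200000
x = rng.normal(0.0, 1.0, size=(reps, n))     # true variance 1
v_biased = x.var(axis=1, ddof=0).mean()      # expectation ~ 0.8
v_unbiased = x.var(axis=1, ddof=1).mean()    # expectation ~ 1.0
print(round(v_biased, 3), round(v_unbiased, 3))
```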

Variance function

In exponential family distributions, the function $V(\mu)$ that determines the mean-variance relationship: $\operatorname{Var}(Y) = V(\mu) \cdot a(\phi)$, where $a(\phi)$ is a scaling function of the dispersion parameter $\phi$. $V(\mu)$ is the second derivative of the log-partition function $b(\theta)$, expressed as a function of $\mu$.

For Poisson, $V(\mu) = \mu$; for Binomial, $V(\mu) = \mu(1-\mu)$; for Gamma, $V(\mu) = \mu^2$ (GLM Fundamentals; McCullagh & Nelder, 1989).
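
The Poisson case, $V(\mu) = \mu$ with $a(\phi) = 1$, is easy to verify by simulation (illustrative check, not from the source): the sample variance should track the sample mean across several values of $\mu$.

```python
import numpy as np

# For Poisson draws, sample variance ~ sample mean, i.e. V(mu) = mu.
rng = np.random.default_rng(7)
means, variances = [], []
for lam in (0.5, 2.0, 8.0):
    y = rng.poisson(lam=lam, size=200000)
    means.append(y.mean())
    variances.append(y.var())
    print(lam, round(means[-1], 2), round(variances[-1], 2))
```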

References

  • Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury.
  • Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society: Series B, 16(2), 175-185.
  • McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall/CRC.