GLM Fundamentals

This page covers the statistical theory behind the GLM tab. See that page for usage instructions.

Model Formulation

GLM generalizes the normal linear model to the exponential family of distributions, as introduced by Nelder & Wedderburn (1972). A GLM is defined by three components:

  1. Distribution family: The response variable $Y$ follows a distribution in the exponential family
  2. Linear predictor: $\eta = X\beta$ (a linear combination of explanatory variables)
  3. Link function: A monotonic function $g$ such that $\eta = g(\mu)$, connecting the linear predictor to the mean $\mu = E[Y]$

OLS is a special case of GLM (Gaussian family with identity link). In this case, IRLS converges in a single iteration to the normal equations solution, and the Wald test is equivalent to the OLS $t$-test.
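This equivalence can be checked directly: with the Gaussian family and identity link, the working weights are the identity and the adjusted dependent variable is $y$ itself, so a single IRLS step reproduces the normal equations. A minimal sketch with illustrative toy data (not tied to the GLM tab):

```python
import numpy as np

# Illustrative data: intercept plus one covariate
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=50)

# OLS via the normal equations: beta = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# One IRLS step for Gaussian/identity: W = I and z = y,
# so the weighted least squares problem is the same system
W = np.eye(len(y))
z = y
beta_irls = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)

print(np.allclose(beta_ols, beta_irls))  # True: identical solutions
```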

Exponential Family

A family of distributions is called an exponential family if its density (or mass) function can be written as:

$$f(y \mid \theta, \phi) = \exp\!\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$$

where $\theta$ is the natural (canonical) parameter, $\phi$ is the dispersion parameter, and $b(\theta)$ is the log-partition function. The mean and variance are derived from $b(\theta)$:

  • $E[Y] = b'(\theta) = \mu$
  • $\operatorname{Var}(Y) = b''(\theta) \cdot a(\phi)$

Rewriting $b''(\theta)$ as a function of $\mu$ rather than $\theta$ gives the variance function $V(\mu)$, so $\operatorname{Var}(Y) = V(\mu) \cdot a(\phi)$. For example, Poisson has $b(\theta) = e^\theta$, giving $b'(\theta) = e^\theta = \mu$ and $b''(\theta) = e^\theta = \mu$, hence $V(\mu) = \mu$.
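The Poisson derivatives above are easy to verify numerically. A small sketch using finite differences (the step size $h$ and the value of $\theta$ are arbitrary choices for illustration):

```python
import numpy as np

# For Poisson, b(theta) = exp(theta), so b'(theta) = b''(theta) = mu
b = np.exp          # log-partition function
theta = 1.3         # arbitrary natural parameter
h = 1e-5            # finite-difference step

mu = np.exp(theta)                                         # b'(theta) = mu
b1 = (b(theta + h) - b(theta - h)) / (2 * h)               # central difference ~ b'
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2   # second difference ~ b''

print(abs(b1 - mu) < 1e-6, abs(b2 - mu) < 1e-4)  # True True
```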

Exponential family parameters for each distribution family:

| Family | $\theta$ (natural parameter) | $a(\phi)$ | $b(\theta)$ | $c(y, \phi)$ |
|---|---|---|---|---|
| Gaussian | $\mu$ | $\phi$ | $\theta^2/2$ | $-\bigl(y^2/(2\phi) + \log(2\pi\phi)/2\bigr)$ |
| Binomial | $\log\!\bigl(\mu/(1-\mu)\bigr)$ | $1/n$ | $\log(1+e^\theta)$ | $\log\binom{n}{k}$ |
| Poisson | $\log\mu$ | $1$ | $e^\theta$ | $-\log(y!)$ |
| Gamma | $-1/\mu$ | $\phi$ | $-\log(-\theta)$ | $(1/\phi - 1)\log y + (1/\phi)\log(1/\phi) - \log\Gamma(1/\phi)$ |
| Negative Binomial | $\log\!\bigl(\mu/(\mu+r)\bigr)$ | $1$ | $-r\log(1-e^\theta)$ | $\log\Gamma(y+r) - \log\Gamma(r) - \log(y!)$ |
  • In the Binomial row, $y$ is the proportion of successes ($y = k/n$, $0 \le y \le 1$), $k$ is the number of successes, $n$ is the number of trials, and $\mu$ is the success probability. When $n = 1$, it reduces to the Bernoulli distribution
  • The $r$ in Negative Binomial corresponds to the shape parameter $\theta$ in the MIDAS UI. The Negative Binomial belongs to the exponential family only when $r$ is known. In MIDAS's automatic estimation mode, $r$ is estimated in an outer loop
  • The $\theta$ in this table is the exponential family natural parameter, which is distinct from the Negative Binomial shape parameter $\theta$ in the MIDAS UI
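The table rows can be sanity-checked by plugging them back into the exponential family form. A sketch for the Poisson row ($\theta = \log\mu$, $a(\phi) = 1$, $b(\theta) = e^\theta$, $c(y,\phi) = -\log(y!)$), verifying that the resulting mass function is normalized and has mean $\mu$; the value of $\mu$ is illustrative:

```python
import numpy as np
from math import lgamma

mu = 2.5
theta = np.log(mu)           # natural parameter for Poisson
y = np.arange(60)            # support truncated where the tail is negligible

# log f(y) = (y*theta - b(theta))/a(phi) + c(y, phi), with a(phi) = 1
log_f = y * theta - np.exp(theta) - np.array([lgamma(float(k) + 1) for k in y])
f = np.exp(log_f)

print(f.sum(), (y * f).sum())  # sums to ~1, mean ~mu
```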

The link function is a monotonic function $\eta = g(\mu)$ connecting the linear predictor $\eta$ to the expected value $\mu$ of the response. A link function satisfying $g(\mu) = \theta$ (the natural parameter) is called the canonical link.

| Link Function | Formula | Canonical Link For |
|---|---|---|
| Identity | $\eta = \mu$ | Gaussian |
| Logit | $\eta = \log\!\bigl(\mu/(1-\mu)\bigr)$ | Binomial |
| Log | $\eta = \log(\mu)$ | Poisson, Negative Binomial |
| Inverse | $\eta = 1/\mu$ | Gamma |
| Probit | $\eta = \Phi^{-1}(\mu)$ | (none) |

The canonical link has important properties: since $\eta = \theta$, $X'y$ becomes a sufficient statistic for $\beta$, and the log-likelihood is concave in $\beta$. When the design matrix $X$ has full rank, this guarantees uniqueness of the MLE and stable IRLS convergence.

Non-canonical links forfeit these properties but may be chosen for easier coefficient interpretation. For example, the canonical link for Gamma is Inverse ($\eta = 1/\mu$), which puts coefficients on a $1/\mu$ scale that is hard to interpret. The Log link (with $\exp(\beta)$ as a multiplicative effect on the mean) is more commonly used in practice.
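The multiplicative interpretation under a Log link is direct to demonstrate: a one-unit increase in a covariate multiplies the fitted mean by $\exp(\beta)$. A minimal sketch with illustrative coefficient values:

```python
import numpy as np

# Illustrative coefficients for eta = beta0 + beta1 * x with a Log link
beta0, beta1 = 0.5, 0.3
x = np.array([1.0, 2.0])          # covariate values one unit apart
mu = np.exp(beta0 + beta1 * x)    # inverse of the Log link: mu = exp(eta)

ratio = mu[1] / mu[0]
print(np.isclose(ratio, np.exp(beta1)))  # True: exp(beta1) multiplicative effect
```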

Parameter Estimation (IRLS)

GLM parameters $\beta$ are estimated by maximum likelihood. Since no closed-form solution exists in general, IRLS (Iteratively Reweighted Least Squares) is used.

At each iteration, working weights $W$ and an adjusted dependent variable $z$ are computed, then the weighted least squares problem:

$$\hat\beta^{(t+1)} = (X'W^{(t)}X)^{-1}X'W^{(t)}z^{(t)}$$

is solved to update $\beta$. See McCullagh & Nelder (1989, Ch. 2) for the derivation of $W$ and $z$. Iteration stops when the maximum absolute change in coefficients falls below the convergence threshold.

With the canonical link, the concavity of the log-likelihood ensures stable convergence. Non-canonical links may lead to slower convergence or convergence failure.
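The iteration above can be sketched for a Poisson GLM with the canonical Log link, where the working weights are $\mu$ and $z = \eta + (y - \mu)/\mu$ (standard results for this family; the function name and the toy data are illustrative, and this is not MIDAS's implementation):

```python
import numpy as np

def irls_poisson(X, y, tol=1e-8, max_iter=50):
    """IRLS sketch for Poisson regression with the Log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)                 # inverse link
        w = mu                           # working weights (canonical link)
        z = eta + (y - mu) / mu          # adjusted dependent variable
        XtW = X.T * w                    # X'W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:  # convergence threshold
            return beta_new
        beta = beta_new
    return beta

# Illustrative data generated with known coefficients [0.5, 0.8]
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
beta_hat = irls_poisson(X, y)
print(beta_hat)  # close to [0.5, 0.8] up to sampling error
```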

Variance Functions and Overdispersion

As described in the Exponential Family section, the variance function $V(\mu) = b''(\theta)$ is the second derivative of the log-partition function rewritten in terms of $\mu$. Through the relationship $\operatorname{Var}(Y) = V(\mu) \cdot a(\phi)$, it defines the mean-variance relationship for each family.

| Family | $b''(\theta)$ | $V(\mu)$ | $a(\phi)$ | $\operatorname{Var}(Y)$ |
|---|---|---|---|---|
| Gaussian | $1$ | $1$ | $\phi$ | $\phi$ (= $\sigma^2$) |
| Binomial | $\dfrac{e^\theta}{(1+e^\theta)^2}$ | $\mu(1-\mu)$ | $1/n$ | $\mu(1-\mu)/n$ |
| Poisson | $e^\theta$ | $\mu$ | $1$ | $\mu$ |
| Gamma | $1/\theta^2$ | $\mu^2$ | $\phi$ | $\mu^2\phi$ |
| Negative Binomial | $\dfrac{re^\theta}{(1-e^\theta)^2}$ | $\mu + \mu^2/r$ | $1$ | $\mu + \mu^2/r$ |

Poisson and Binomial assume a dispersion parameter $\phi = 1$. When the actual data variance exceeds this assumption, the condition is called overdispersion. Overdispersion leads to underestimated standard errors and confidence intervals that are too narrow.

When overdispersion is detected with Poisson data, switching to Negative Binomial adds a $\mu^2/r$ term to the variance, explicitly modeling the extra dispersion.
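The $\mu + \mu^2/r$ mean-variance relationship is easy to see by simulation. A sketch drawing Negative Binomial counts and comparing the sample variance against both the Poisson assumption ($\operatorname{Var} = \mu$) and the Negative Binomial formula; $\mu$, $r$, and the sample size are illustrative:

```python
import numpy as np

mu, r = 5.0, 2.0
p = r / (r + mu)   # NumPy's Generator uses the (n, p) parameterization

rng = np.random.default_rng(42)
y = rng.negative_binomial(r, p, size=200_000)

# Sample mean near mu = 5, sample variance near mu + mu^2/r = 17.5,
# far above the Poisson assumption Var = mu
print(y.mean(), y.var())
```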

For Binomial overdispersion, MIDAS does not currently support quasi-binomial or Beta-Binomial alternatives. When overdispersion is suspected, check the estimated dispersion parameter and bear in mind that standard errors and confidence intervals may be underestimated.

References

  • Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A, 135(3), 370-384.
  • McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall/CRC.