The General Linear Model

Mark Andrews

Regression as a probability model

  • Regression is about modelling how the probability distribution of an outcome variable depends on predictor variables
  • It is not just “fitting lines to points”
  • The general linear model is the foundation for all generalized linear models

The normal linear model

For observations \(i = 1, \ldots, n\):

\[ \begin{aligned} y_i &\sim \mathrm{Normal}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \sum_{k=1}^K \beta_k x_{ki} \end{aligned} \]

  • Each \(y_i\) is normally distributed with the same variance \(\sigma^2\)
  • The mean \(\mu_i\) is a linear function of the predictors

Interpreting coefficients

\[ \mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots \]

  • \(\beta_0\) is the expected value of \(y\) when all predictors are zero
  • \(\beta_k\) is the expected change in \(y\) for a one-unit increase in \(x_k\), holding other predictors fixed

Estimation

The parameters \(\beta_0, \beta_1, \ldots, \beta_K\) and \(\sigma^2\) are estimated by maximum likelihood.

The maximum likelihood estimators (MLEs) are:

\[ \hat{\beta}_{MLE} = (X^\top X)^{-1} X^\top y, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{\mu}_i)^2 = \frac{RSS}{n} \]

  • \(\hat{\beta}_{MLE}\) coincides with the ordinary least squares estimator
  • \(\hat{\sigma}^2_{MLE}\) is biased; the unbiased estimator is \(s^2 = RSS/(n-p)\) where \(p\) is the number of parameters

Standard errors are the standard deviations of the sampling distributions of the estimators.

Fitting with lm

weight_df <- read_csv("data/weight.csv")

M_1 <- lm(weight ~ height, data = weight_df)
coef(M_1)
sigma(M_1)   # sqrt(RSS / (n - p)), the unbiased estimator

M_2 <- lm(weight ~ height + age, data = weight_df)
summary(M_2)

Categorical predictors

  • Categorical predictors are re-coded as binary dummy variables
  • For a predictor with \(L\) levels, \(L - 1\) dummies are created
  • The omitted level is the reference category; all other coefficients are differences from it
M_3 <- lm(weight ~ height + age + gender, data = weight_df)
summary(M_3)

Confidence intervals

A 95% confidence interval for \(\beta_k\) is

\[ \hat{\beta}_k \pm t^*_{n-p,\, 0.975} \cdot SE(\hat{\beta}_k) \]

where \(t^*_{n-p,\, 0.975}\) is the 97.5th percentile of the \(t\)-distribution with \(n - p\) degrees of freedom.

confint(M_3)

Predictions

For predictions at new \(\vec{x}^*\), two types of interval are available:

  • Confidence interval on the mean response \(\mu^* = \vec{x}^{*\top}\hat{\beta}\)
  • Prediction interval for a new individual observation (always wider, adds \(\hat{\sigma}^2\) for the new observation’s own variability)
new_df <- tibble(height = c(160, 170, 180), age = 30, gender = "female")
predict(M_3, newdata = new_df, interval = "confidence")
predict(M_3, newdata = new_df, interval = "prediction")

The likelihood at the maximum

The normal linear model’s log-likelihood is

\[ \log L(\beta, \sigma^2 \mid \mathbf{y}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(y_i - \mu_i)^2 \]

Evaluated at the MLEs \(\hat{\beta}_{MLE}\) and \(\hat{\sigma}^2_{MLE} = RSS/n\), this achieves its maximum value — the highest log-likelihood attainable by any choice of parameters within this model structure.

Model comparison

For nested models \(\mathcal{M}_0 \subset \mathcal{M}_1\), we use the F-test:

\[ F = \frac{(RSS_0 - RSS_1)\,/\,q}{RSS_1\,/\,(n - p)} \sim F(q,\, n-p) \]

where \(q = K_1 - K_0\) is the difference in the number of parameters. This distribution is exact under normality.

The F-test is closely related to the log likelihood ratio statistic:

\[ \Lambda = 2\log\frac{L(\hat{\beta}_1)}{L(\hat{\beta}_0)} \]

anova(M_null, M_full)

AIC

\[ \mathrm{AIC} = -2 \log L(\hat{\beta}) + 2p \]

  • \(p\) is the number of estimated parameters
  • Lower AIC indicates a better balance of fit and complexity
  • Useful for comparing non-nested models
AIC(M_null, M_full)

Summary

  • The normal linear model assumes normally distributed outcomes with constant variance
  • Coefficients are MLEs; sigma() returns the unbiased estimator of \(\sigma\), not the MLE
  • Standard errors are sampling distribution standard deviations; confidence intervals are centred on the MLE \(\pm\) a \(t\)-quantile times the standard error
  • Nested model comparison uses the F-test, which is an exact finite-sample counterpart of the log likelihood ratio statistic \(\Lambda\)