Beyond Normal Linear Models

Author

Mark Andrews

Abstract

The normal linear model makes assumptions about the distribution of residuals and the homogeneity of variance that often fail in practice. We show how to relax these assumptions using robust regression with Student-t distributed residuals and distributional regression, where the variance itself depends on predictor variables.

The limits of the normal model

The standard Bayesian linear model assumes

\[ y_i \sim \mathrm{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} \]

Two assumptions embedded here are that the residuals are normally distributed and that the variance \(\sigma^2\) is constant across all values of the predictors. The first assumption makes the model sensitive to outliers. The second, called homoskedasticity, is frequently violated when the variability of the outcome depends on the predictors.
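The sensitivity to outliers is easy to demonstrate. Under the normal model with constant \(\sigma\), the maximum-likelihood estimates of the coefficients are the ordinary least-squares estimates. The sketch below is a plain Python illustration, not brms. It fits a straight line by least squares to hypothetical toy data lying exactly on \(y = 1 + 2x\), then fits it again after corrupting a single observation.

```python
# Ordinary least-squares line fit: the maximum-likelihood estimates
# of (intercept, slope) under the normal model with constant sigma.
def ols_fit(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data lying exactly on y = 1 + 2x.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
clean_fit = ols_fit(x, y)

# Add a single large outlier at the last point.
y_out = list(y)
y_out[9] += 50.0
outlier_fit = ols_fit(x, y_out)

print(clean_fit)    # (1.0, 2.0)
print(outlier_fit)  # slope pulled far above 2 by one point
```

A single point out of ten more than doubles the estimated slope, because every squared residual counts in full toward the fit.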

In a Bayesian framework, relaxing these assumptions is straightforward: we either change the distributional family or add a model for the residual standard deviation \(\sigma\).

Robust regression with Student-t residuals

The Student-t distribution has heavier tails than the normal. With small degrees of freedom \(\nu\), extreme values are much more probable than under normality. As \(\nu\) grows large, the Student-t approaches the normal. Replacing the normal likelihood with a Student-t makes the model robust: outlying observations have less influence on the estimates, because they are less surprising under the heavier-tailed model.
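The heavier tails can be made concrete in two ways. The first compares log densities at increasing distances from the centre. The second looks at the weight a standardized residual \(z\) effectively receives when a Student-t model is fitted by iteratively reweighted least squares, \(w = (\nu + 1)/(\nu + z^2)\). The plain Python sketch below uses \(\nu = 3\), chosen arbitrarily. It is not how brms computes anything, since brms samples the posterior with Stan.

```python
import math

def normal_logpdf(y, mu, sigma):
    # Log density of N(mu, sigma^2).
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

def student_t_logpdf(y, mu, sigma, nu):
    # Log density of the location-scale Student-t with nu degrees of freedom.
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def t_weight(z, nu):
    # Effective weight of standardized residual z in a Student-t fit;
    # under the normal model every point has weight 1.
    return (nu + 1) / (nu + z * z)

for z in [0.0, 1.0, 3.0, 5.0]:
    print(z,
          round(normal_logpdf(z, 0.0, 1.0), 2),
          round(student_t_logpdf(z, 0.0, 1.0, 3.0), 2),
          round(t_weight(z, 3.0), 3))
```

At five scale units from the centre, the normal log density is roughly eight units lower than the Student-t one, and that point's weight falls to about a seventh of a typical point's.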

In brms, the Student-t family is specified with family = student(). The degrees of freedom \(\nu\) are estimated from the data along with the regression coefficients and the scale \(\sigma\).
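The sketch below shows the downweighting at work. It fits a straight line with Student-t errors by maximum likelihood, using the EM (iteratively reweighted least squares) algorithm with \(\nu\) fixed at 3. This non-Bayesian technique is swapped in only because it fits in a few lines. Unlike family = student() in brms, it places no priors on the parameters and does not estimate \(\nu\). The data are hypothetical: the line \(y = 1 + 2x\) with small noise and one gross outlier.

```python
def weighted_line_fit(x, y, w):
    # Weighted least squares for y = b0 + b1 * x.
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def student_t_line_fit(x, y, nu=3.0, iters=200):
    # EM for maximum-likelihood Student-t regression, nu held fixed.
    n = len(x)
    w = [1.0] * n  # all weights 1: the first pass is ordinary least squares
    for _ in range(iters):
        # M-step: coefficients and sigma^2 given the current weights.
        b0, b1 = weighted_line_fit(x, y, w)
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        sigma2 = sum(wi * r * r for wi, r in zip(w, resid)) / n
        # E-step: large standardized residuals get small weights.
        w = [(nu + 1) / (nu + r * r / sigma2) for r in resid]
    return b0, b1, w

# Hypothetical data: y = 1 + 2x plus small noise, then one outlier.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1]
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[9] += 50.0

ols = weighted_line_fit(x, y, [1.0] * len(x))
b0, b1, w = student_t_line_fit(x, y)
print("least squares:", ols)
print("Student-t:", b0, b1)
print("weight of the outlier:", w[9])
```

Least squares puts the slope above 4. The Student-t fit recovers a slope close to 2, because the outlier's weight shrinks towards zero over the iterations.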

Distributional regression

Distributional regression models the residual standard deviation \(\sigma\) as a function of predictors, rather than treating it as a constant. This directly addresses heteroskedasticity.
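In the notation of the normal model above, a distributional model with the same predictors in both parts might be written as follows. The coefficients \(\gamma_k\) of the \(\sigma\) submodel are notation introduced here for illustration, and \(\sigma_i\) is modelled on the log scale so that it stays positive:

\[ y_i \sim \mathrm{N}(\mu_i, \sigma_i^2), \quad \mu_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki}, \quad \log \sigma_i = \gamma_0 + \gamma_1 x_{1i} + \cdots + \gamma_K x_{Ki} \]

Setting \(\gamma_1 = \cdots = \gamma_K = 0\) recovers the homoskedastic model with \(\sigma = e^{\gamma_0}\).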

The bf function in brms accepts a separate formula for each distributional parameter. Here, the formula for sigma uses the same predictor variables as the formula for the mean, but sigma is modelled through a log link, which keeps it positive.
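The sketch below, in plain Python rather than brms, shows what the log link does and how a heteroskedastic model is scored. It evaluates the log-likelihood of a one-predictor distributional model at fixed parameter values; it does not fit anything, whereas brms would sample \(\beta\) and \(\gamma\) jointly. The names beta and gamma, the parameter values, and the simulated data are all hypothetical.

```python
import math
import random

def normal_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

def distributional_loglik(x, y, beta, gamma):
    # beta = (beta0, beta1) for the mean; gamma = (gamma0, gamma1) for log(sigma).
    total = 0.0
    for xi, yi in zip(x, y):
        mu = beta[0] + beta[1] * xi
        sigma = math.exp(gamma[0] + gamma[1] * xi)  # log link: sigma > 0 for any gamma
        total += normal_logpdf(yi, mu, sigma)
    return total

# Simulated heteroskedastic data: the spread grows with x.
random.seed(1)
beta_true = (1.0, 2.0)
gamma_true = (-1.0, 0.3)
x = [random.uniform(0, 10) for _ in range(500)]
y = [random.gauss(beta_true[0] + beta_true[1] * xi,
                  math.exp(gamma_true[0] + gamma_true[1] * xi)) for xi in x]

# Heteroskedastic model at the true parameters.
hetero = distributional_loglik(x, y, beta_true, gamma_true)

# Homoskedastic model: gamma1 = 0, with sigma at its maximum-likelihood value.
resid = [yi - (beta_true[0] + beta_true[1] * xi) for xi, yi in zip(x, y)]
sigma_hat = math.sqrt(sum(r * r for r in resid) / len(resid))
homo = distributional_loglik(x, y, beta_true, (math.log(sigma_hat), 0.0))

print(round(hetero, 1), round(homo, 1))
```

Even with its best constant \(\sigma\), the homoskedastic model scores far worse on these data. A single \(\sigma\) has to cover both the tight spread at small \(x\) and the wide spread at large \(x\).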

The model now estimates separate coefficients for how each predictor affects both the mean weight and the variability in weight.

Posterior predictive checks

Posterior predictive checks compare the observed data to data simulated from the fitted model. If the model is adequate, simulated datasets should look similar to the real data.

The pp_check function draws samples from the posterior predictive distribution and overlays their densities on the observed data density. If the normal model underestimates the tails, the Student-t model should show a better match.
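pp_check overlays densities. As a language-neutral illustration of the same logic, the sketch below uses a numeric test statistic instead: the largest standardized deviation, which is sensitive to the tails. It reports the fraction of replicated datasets whose statistic is at least as extreme as the observed one, a posterior predictive p-value. It assumes a normal model with the standard noninformative prior \(p(\mu, \sigma^2) \propto 1/\sigma^2\), whose posterior can be sampled exactly. The data are a hypothetical sample of normal noise with two injected outliers.

```python
import math
import random
import statistics

def sample_normal_posterior(y, rng):
    # One posterior draw of (mu, sigma) for the normal model under the
    # prior p(mu, sigma^2) proportional to 1 / sigma^2.
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)
    chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square with n - 1 df
    sigma2 = (n - 1) * s2 / chi2
    mu = rng.gauss(ybar, math.sqrt(sigma2 / n))
    return mu, math.sqrt(sigma2)

def max_abs_z(y):
    # Largest absolute deviation from the mean, in units of the sample SD.
    m = statistics.fmean(y)
    s = statistics.stdev(y)
    return max(abs(v - m) for v in y) / s

def ppc_pvalue(y, stat, n_rep=1000, seed=0):
    # Fraction of replicated datasets with stat >= the observed stat.
    rng = random.Random(seed)
    t_obs = stat(y)
    exceed = 0
    for _ in range(n_rep):
        mu, sigma = sample_normal_posterior(y, rng)
        y_rep = [rng.gauss(mu, sigma) for _ in y]
        if stat(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_rep

data_rng = random.Random(42)
y_obs = [data_rng.gauss(0.0, 1.0) for _ in range(100)] + [8.0, -9.0]
print(ppc_pvalue(y_obs, max_abs_z))
```

A p-value near 0 says the normal model almost never produces data with tails as extreme as those observed. This is the numeric counterpart of replicated densities that fail to cover the observed tails.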

Comparing the models

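The models can be compared using WAIC (the widely applicable information criterion). A lower WAIC indicates better estimated out-of-sample predictive accuracy. In data with genuine outliers or heteroskedasticity, the extended model typically has the lower WAIC. The posterior predictive check shows visually why: data simulated from the extended model reproduce features of the observed data that the normal model misses.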
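WAIC is computed from a matrix of pointwise log-likelihoods \(\log p(y_i \mid \theta_s)\), one row per posterior draw \(s\) and one column per observation \(i\). The log pointwise predictive density is \(\mathrm{lppd} = \sum_i \log \frac{1}{S}\sum_s p(y_i \mid \theta_s)\), and the effective number of parameters is \(p_{\mathrm{WAIC}} = \sum_i \mathrm{Var}_s \log p(y_i \mid \theta_s)\). On the deviance scale, \(\mathrm{WAIC} = -2(\mathrm{lppd} - p_{\mathrm{WAIC}})\). The sketch below implements these formulas in plain Python. It stands in for, and is not, the implementation brms uses through the loo package. The input matrix is assumed to come from a fitted model.

```python
import math
import statistics

def log_mean_exp(values):
    # Numerically stable log(mean(exp(values))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def waic(log_lik):
    # log_lik[s][i] = log p(y_i | theta_s) for draw s and observation i.
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += statistics.variance(column)  # sample variance over draws
    return -2.0 * (lppd - p_waic), p_waic

# If every draw gives the same log-likelihoods, p_waic is 0 and
# WAIC is just -2 times the summed log-likelihood.
print(waic([[-1.0, -2.0], [-1.0, -2.0]]))  # (6.0, 0.0)
```

Computed for two models fitted to the same data, the model with the lower WAIC is preferred. In practice, the difference should be judged against its standard error rather than read off as a raw number.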