Bayesian Generalised Linear Models

Author

Mark Andrews

Abstract

Bayesian generalised linear models extend the regression framework to non-normal response variables. We focus on binary logistic regression using the Bernoulli family in brms, covering the logit link, coefficient interpretation, and posterior predictive checks, with a brief look at Poisson and other families.

From linear to generalised linear models

The normal linear model is

\[ y_i \sim \mathrm{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki} \]

Generalised linear models keep the linear predictor but allow a different distributional family for the response and a link function \(g\) that connects the mean to the linear predictor:

\[ g(\mu_i) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki} \]

For binary outcomes, \(\mu_i\) is the probability \(\theta_i\) of a positive result. The logit link is \(g(\theta) = \log[\theta / (1 - \theta)]\), which maps probabilities in \((0, 1)\) to the real line. The inverse logit (logistic) function maps back, shown here for a single predictor \(x_i\):

\[ \theta_i = \mathrm{ilogit}(\beta_0 + \beta_1 x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}} \]
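
The link and its inverse can be checked numerically. The following is a minimal sketch in base R; the coefficient values and the predictor values are made up purely for illustration, and the helper names logit and ilogit are our own (base R provides the same functions as qlogis and plogis).

```r
# Logit link: maps a probability in (0, 1) to the real line (log-odds).
logit <- function(theta) log(theta / (1 - theta))

# Inverse logit (logistic): maps a linear predictor back to a probability.
ilogit <- function(eta) 1 / (1 + exp(-eta))

# Illustrative (made-up) coefficients and predictor values.
beta_0 <- -1.0
beta_1 <- 0.5
x <- c(-2, 0, 2, 4)

# Probabilities implied by the linear predictor.
theta <- ilogit(beta_0 + beta_1 * x)
theta

# Applying the link to theta recovers the linear predictor.
all.equal(logit(theta), beta_0 + beta_1 * x)

# The hand-written version agrees with base R's plogis().
all.equal(theta, plogis(beta_0 + beta_1 * x))
```

Whatever the value of the linear predictor, the resulting theta lies strictly between 0 and 1, which is why the logit link suits probabilities.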

Logistic regression with brms

Fitting a Bayesian logistic regression in brms requires only changing the family argument of brm from its Gaussian default; the formula syntax is the same as for a linear model.

The bernoulli family in brms is the appropriate choice for binary outcomes coded as 0 and 1. The binomial family is used when the response is a count of successes with a known number of trials.
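
As a sketch of what such a call might look like, the example below fits a logistic regression to simulated data. The data frame sim_df, its columns x and y, the object name fit_logit, and the simulation settings are all hypothetical and chosen only to make the example self-contained.

```r
library(brms)

# Simulated (hypothetical) data: a binary outcome y coded 0/1 and one predictor x.
set.seed(101)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))
sim_df <- data.frame(x = x, y = y)

# Same formula syntax as a linear model; only the family changes.
fit_logit <- brm(y ~ x, data = sim_df, family = bernoulli(link = "logit"))

summary(fit_logit)

# For aggregated data (successes out of a known number of trials), the
# binomial family is used with a trials() term. The data frame agg_df and
# its columns n_success and n_trials are hypothetical:
# brm(n_success | trials(n_trials) ~ x, data = agg_df, family = binomial())
```

The logit link is the default for bernoulli() in brms, so family = bernoulli() gives the same model; it is written out here only to make the link explicit.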

Interpreting coefficients

Coefficients in logistic regression are on the log-odds scale. A coefficient of \(\beta_k = 0.3\) for predictor \(x_k\) means that a one-unit increase in \(x_k\), with the other predictors held fixed, multiplies the odds of success by \(e^{0.3} \approx 1.35\). Translating this into a change in probability requires specifying a baseline, that is, the starting value of \(x_k\) and the values of the other predictors, because the logistic function is nonlinear.
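
The dependence on the baseline can be seen with a short calculation. This is a minimal sketch in base R using the coefficient of 0.3 from the text; the three baseline log-odds values are arbitrary choices for illustration.

```r
beta_k <- 0.3

# Multiplicative change in the odds for a one-unit increase in x_k.
odds_ratio <- exp(beta_k)
round(odds_ratio, 2)

# The change in probability depends on the starting log-odds (the baseline).
baseline_log_odds <- c(-3, 0, 3)
p_before <- plogis(baseline_log_odds)
p_after <- plogis(baseline_log_odds + beta_k)

data.frame(baseline_log_odds, p_before, p_after, change = p_after - p_before)
```

The odds ratio is the same in every row, but the change in probability is not: it is about 0.07 when the baseline probability is 0.5 and considerably smaller when the baseline probability is near 0 or 1.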

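The posterior distribution over each coefficient is interpreted in the same way as in linear regression, except that the coefficient is on the log-odds scale. If a 95% credible interval excludes zero, then, given the model and the priors, the posterior assigns high probability to the predictor having a non-zero effect on the log-odds, and the sign of that effect is well determined.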
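
One way to extract these summaries is sketched below. The simulated data, the names sim_df and fit_logit, and the simulation settings are hypothetical; the coefficient name b_x follows from brms naming the draws for a predictor called x.

```r
library(brms)

# Simulated (hypothetical) binary data with one predictor.
set.seed(102)
sim_df <- data.frame(x = rnorm(200))
sim_df$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 0.8 * sim_df$x))
fit_logit <- brm(y ~ x, data = sim_df, family = bernoulli())

# Posterior mean, standard deviation and 95% interval for each
# coefficient, all on the log-odds scale.
fixef(fit_logit)

# The 95% interval for the coefficient of x on the odds-ratio scale.
exp(fixef(fit_logit)["x", c("Q2.5", "Q97.5")])

# Posterior probability that the coefficient of x is positive.
draws <- as_draws_df(fit_logit)
prob_positive <- mean(draws$b_x > 0)
prob_positive
```

Exponentiating the interval endpoints is valid because the exponential function is monotonic, so quantiles on the log-odds scale map directly to quantiles on the odds-ratio scale.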

Prior summary for GLMs

For logistic regression, priors in brms are placed on the log-odds scale. By default, brms gives regression coefficients an improper flat prior and gives the intercept a weakly informative Student-t prior centred at zero. Inspecting these defaults with prior_summary is important because the scale of the log-odds can be non-intuitive: a coefficient of 3, for example, moves a probability of 0.5 to about 0.95.
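
A sketch of how the priors might be inspected, before and after fitting, is given below. The simulated data and object names are hypothetical, and the normal(0, 1.5) prior is an illustrative choice only, not a recommendation from the text.

```r
library(brms)

# Simulated (hypothetical) binary data with one predictor.
set.seed(103)
sim_df <- data.frame(x = rnorm(200))
sim_df$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 0.8 * sim_df$x))

# List the default priors for this model before fitting (no sampling needed).
get_prior(y ~ x, data = sim_df, family = bernoulli())

# Optionally replace the default on the regression coefficients.
# normal(0, 1.5) is an illustrative choice on the log-odds scale.
my_priors <- set_prior("normal(0, 1.5)", class = "b")
fit_logit <- brm(y ~ x, data = sim_df, family = bernoulli(), prior = my_priors)

# Confirm which priors the fitted model actually used.
fit_priors <- prior_summary(fit_logit)
fit_priors
```

Passing candidate values through plogis is a quick way to see what a prior on the log-odds scale implies on the probability scale.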

Other GLM families

The same Bayesian workflow applies to other GLM families. Model specification, prior inspection, MCMC diagnostics, and model comparison follow the same steps. Only the family and link function change.

Common families in brms:

poisson() for count data, with log link
negbinomial() for overdispersed counts
cumulative() for ordered categorical outcomes (ordinal regression)
zero_inflated_poisson() for count data with excess zeros

For any of these, the call to brm has the same form as before; only the family argument changes.
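
As an illustration, a Poisson regression might be fitted as follows. The simulated count data and the names count_df and fit_pois are hypothetical.

```r
library(brms)

# Simulated (hypothetical) count data with one predictor.
set.seed(104)
count_df <- data.frame(x = rnorm(200))
count_df$y <- rpois(200, lambda = exp(0.5 + 0.4 * count_df$x))

# Same call as for logistic regression, with a different family.
fit_pois <- brm(y ~ x, data = count_df, family = poisson())

# Coefficients are on the log scale; exponentiate to get rate ratios.
exp(fixef(fit_pois)[, c("Estimate", "Q2.5", "Q97.5")])

# Swapping in another family leaves the rest of the call unchanged, e.g.
# brm(y ~ x, data = count_df, family = negbinomial())
```

With the log link, a coefficient describes a multiplicative change in the expected count, in the same way that a logistic regression coefficient describes a multiplicative change in the odds.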

Posterior predictive checks for GLMs

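Posterior predictive checks work for all GLM families. For binary outcomes, the check amounts to comparing the proportion of ones in the observed data with the proportion in replicated datasets drawn from the posterior predictive distribution. The default pp_check plot is a density overlay, so for 0/1 data a bar plot or a plot of the mean statistic usually shows this comparison more clearly.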
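
The comparison can be sketched as follows, both with pp_check and by hand. The simulated data and object names are hypothetical, and the ndraws argument assumes a reasonably recent version of brms.

```r
library(brms)

# Simulated (hypothetical) binary data with one predictor.
set.seed(105)
sim_df <- data.frame(x = rnorm(200))
sim_df$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 0.8 * sim_df$x))
fit_logit <- brm(y ~ x, data = sim_df, family = bernoulli())

# Bar plot of observed versus replicated counts of 0s and 1s.
pp_check(fit_logit, type = "bars", ndraws = 100)

# Observed proportion of ones against its posterior predictive distribution.
pp_check(fit_logit, type = "stat", stat = "mean")

# The same comparison by hand: each row of y_rep is one replicated dataset.
y_rep <- posterior_predict(fit_logit, ndraws = 100)
obs_prop <- mean(sim_df$y)
rep_props <- rowMeans(y_rep)
obs_prop
summary(rep_props)
```

If the model is adequate in this respect, the observed proportion should fall well within the spread of the replicated proportions.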