Bayesian Mixed Effects Models
Bayesian approaches to multilevel and mixed effects models for grouped or correlated data. We fit varying intercept and varying slope models using brms, compare with lme4, and discuss why Bayesian mixed models often converge where classical approaches struggle.
When are mixed models needed?
Mixed effects models are appropriate whenever observations are grouped and the groups are themselves a sample from a larger population. Students within schools, patients within hospitals, repeated measures within individuals, plots within regions: in all these cases, observations within the same group are correlated. Ignoring this correlation treats every observation as if it carried independent information, which typically leads to underestimated standard errors and overconfident inferences.
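A small simulation makes the consequence concrete. The NumPy sketch below is illustrative only and is not part of the brms or lme4 workflow; the parameter values are hypothetical. It repeatedly generates grouped data and compares two quantities. The first is the naive standard error of the grand mean, which treats every observation as independent. The second is the actual spread of the grand mean across simulated datasets. It also computes the usual design effect for equal group sizes, 1 + (n - 1) * ICC.

```python
import numpy as np

TAU, SIGMA = 1.0, 1.0            # between- and within-group SDs (hypothetical)
N_GROUPS, N_PER_GROUP = 10, 20

def simulate_grand_means(n_sims=2000, seed=1):
    """Repeatedly simulate grouped data and record the grand mean.

    Group j has offset u_j ~ N(0, TAU^2); each observation is u_j + e_ij with
    e_ij ~ N(0, SIGMA^2). Also returns the naive standard error that treats
    all observations as independent.
    """
    rng = np.random.default_rng(seed)
    n_total = N_GROUPS * N_PER_GROUP
    means = np.empty(n_sims)
    naive_se = np.empty(n_sims)
    for s in range(n_sims):
        u = rng.normal(0.0, TAU, size=N_GROUPS)                       # group offsets
        y = u[:, None] + rng.normal(0.0, SIGMA, size=(N_GROUPS, N_PER_GROUP))
        means[s] = y.mean()
        naive_se[s] = y.std(ddof=1) / np.sqrt(n_total)                # iid formula
    return means, naive_se

means, naive_se = simulate_grand_means()
actual_se = means.std(ddof=1)    # spread of the estimate across simulated datasets

# Design effect: variance inflation caused by the within-group correlation (ICC).
icc = TAU**2 / (TAU**2 + SIGMA**2)
design_effect = 1 + (N_PER_GROUP - 1) * icc

print(f"average naive SE:    {naive_se.mean():.3f}")
print(f"actual SE:           {actual_se:.3f}")
print(f"sqrt(design effect): {np.sqrt(design_effect):.2f}")
```

With these values the actual standard error is roughly three times the naive one, close to the square root of the design effect.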
The key feature of a mixed effects model is that the group-level parameters, the varying intercepts and slopes, are treated as random draws from a population distribution rather than as fixed unknowns. This population distribution is estimated from the data and allows partial pooling: estimates for each group are pulled toward the overall mean. The amount of shrinkage depends on the between-group variance relative to the within-group variance, and on the number of observations in each group. Small groups, and groups drawn from a population with little between-group variation, are pulled hardest.
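The simplest case is a normal model with a known population mean mu, between-group SD tau and within-group SD sigma. There, the partially pooled estimate for group j is a precision-weighted average, w_j * ybar_j + (1 - w_j) * mu, with w_j = tau^2 / (tau^2 + sigma^2 / n_j). The NumPy sketch below assumes these variances are known and uses illustrative values; a fitted mixed model estimates mu, tau and sigma instead. It shows that groups with the same raw mean are shrunk by different amounts depending on their size.

```python
import numpy as np

def partial_pool(ybar, n, mu, tau, sigma):
    """Shrink group means toward the population mean mu.

    Normal-normal model with known mu, tau (between-group SD) and sigma
    (within-group SD). The weight on each group's own mean is
        w_j = tau^2 / (tau^2 + sigma^2 / n_j),
    so groups with few observations are pulled harder toward mu.
    """
    ybar = np.asarray(ybar, dtype=float)
    n = np.asarray(n, dtype=float)
    w = tau**2 / (tau**2 + sigma**2 / n)
    return w * ybar + (1 - w) * mu, w

# Hypothetical example: three groups with the same raw mean but different sizes.
est, w = partial_pool(ybar=[2.0, 2.0, 2.0], n=[2, 10, 100], mu=0.0, tau=0.5, sigma=1.0)
print(np.round(w, 3))    # weight on each group's own mean
print(np.round(est, 3))  # partially pooled estimates
```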
Varying intercept model
In a varying intercept model, each group has its own baseline level but the effects of predictors are the same across groups.
The (1 | school) term specifies that only the intercept varies by school, while the effect of ses is shared across schools. This is a varying intercept model. In lme4 and brms the formula syntax is identical. The output differs: lme4 returns maximum likelihood (or REML) point estimates, whereas brms returns a posterior distribution over all model parameters, including the standard deviations of the group-level effects.
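The generative model behind a varying intercept formula such as y ~ ses + (1 | school) can be written out directly. The NumPy sketch below simulates from it; the outcome name y and all parameter values are hypothetical. Each school gets its own offset a_j, but every school shares the slope beta. A separate regression fitted within each school then gives slopes that scatter around beta only because of sampling noise.

```python
import numpy as np

def simulate_varying_intercepts(n_schools=30, n_students=25, alpha=12.0,
                                beta=2.5, tau=2.0, sigma=6.0, seed=2):
    """Simulate y_ij = alpha + a_j + beta * ses_ij + e_ij.

    a_j ~ N(0, tau^2) is school j's intercept offset; e_ij ~ N(0, sigma^2).
    The slope beta is the same in every school.
    """
    rng = np.random.default_rng(seed)
    school = np.repeat(np.arange(n_schools), n_students)   # school index per student
    ses = rng.normal(0.0, 1.0, size=school.size)
    a = rng.normal(0.0, tau, size=n_schools)               # school-level offsets
    y = alpha + a[school] + beta * ses + rng.normal(0.0, sigma, size=school.size)
    return y, ses, school, a

y, ses, school, a = simulate_varying_intercepts()

# Within-school least-squares slopes: they differ only through sampling noise.
slopes = np.array([np.polyfit(ses[school == j], y[school == j], 1)[0]
                   for j in range(a.size)])
print(f"mean within-school slope: {slopes.mean():.2f} (true beta = 2.5)")
```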
Varying intercepts and varying slopes
A model with (ses | school) allows the intercept and the slope of ses to be different for each school, and estimates the correlation between them. A model with (1 | school) allows only the intercept to vary. A model with (0 + ses | school) allows only the slope to vary.
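A (ses | school) term describes each school by a pair of deviations, one for the intercept and one for the slope. These are drawn from a bivariate normal whose covariance is built from two standard deviations and one correlation. The NumPy sketch below constructs that covariance as diag(s) R diag(s) and draws from it. The SDs and the correlation are hypothetical values, not estimates from any dataset.

```python
import numpy as np

def group_covariance(sd_intercept, sd_slope, rho):
    """2x2 covariance of (intercept, slope) deviations: diag(s) @ R @ diag(s)."""
    s = np.diag([sd_intercept, sd_slope])
    R = np.array([[1.0, rho], [rho, 1.0]])
    return s @ R @ s

def draw_group_effects(n_groups, sd_intercept, sd_slope, rho, seed=3):
    """Each row is one group's (intercept deviation, slope deviation)."""
    rng = np.random.default_rng(seed)
    cov = group_covariance(sd_intercept, sd_slope, rho)
    return rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_groups)

effects = draw_group_effects(5000, sd_intercept=2.0, sd_slope=0.5, rho=-0.4)
r = np.corrcoef(effects[:, 0], effects[:, 1])[0, 1]
print(f"sample correlation of deviations: {r:.2f}")  # should be close to -0.4
```

Dropping the intercept, as in (0 + ses | school), reduces this to a single standard deviation with no correlation.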
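These choices have practical consequences. Varying slopes models are more flexible but require more data to estimate the additional variance components reliably. Each extra varying slope adds a standard deviation plus a correlation with every other varying effect in the term. The information for estimating these parameters comes mainly from the number of groups rather than the number of observations.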
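The growth in parameters can be counted explicitly. For k correlated varying effects in one term, the model estimates k standard deviations and k(k - 1)/2 correlations, k(k + 1)/2 parameters in total. Here is a minimal Python helper; the predictors x1, x2 and x3 in the last term are hypothetical.

```python
def n_covariance_params(k: int) -> int:
    """k standard deviations plus k*(k-1)/2 correlations = k*(k+1)/2."""
    return k * (k + 1) // 2

# Number of varying effects in each term; x1, x2, x3 are hypothetical predictors.
terms = {
    "(1 | school)": 1,
    "(ses | school)": 2,
    "(1 + x1 + x2 + x3 | school)": 4,
}
for term, k in terms.items():
    print(f"{term:30s} {n_covariance_params(k):3d} variance-covariance parameters")
```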
Why Bayesian mixed models converge more reliably
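Classical mixed models fitted by maximum likelihood or REML often fail to converge, or return singular fits, in the situations listed below. In lme4, REML is the default for lmer, and glmer uses maximum likelihood.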
- Complex random effects structures (many varying slopes)
- Small numbers of groups
- Binary or count outcomes combined with random effects
- Unbalanced designs
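The problem is that the likelihood for the variance components is often flat, or has its maximum on the boundary at zero. Numerical optimisation then either fails to converge or returns a boundary estimate, such as a variance of exactly zero or a correlation of exactly plus or minus one.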
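The boundary problem already appears in the simplest balanced one-way model. The NumPy sketch below makes three assumptions for illustration: a known within-group SD sigma, equal group sizes, and the population mean profiled out at its maximum likelihood value. Under these assumptions each group mean satisfies ybar_j ~ N(mu, tau^2 + sigma^2 / n). When the group means vary less than sigma^2 / n alone would predict, the log-likelihood in tau is maximised exactly at tau = 0. The data are hypothetical.

```python
import numpy as np

def profile_loglik_tau(ybar, n, sigma, tau_grid):
    """Profile log-likelihood of the between-group SD tau.

    Balanced one-way model with known within-group SD sigma:
        ybar_j ~ N(mu, tau^2 + sigma^2 / n),
    with mu set to its ML estimate, the mean of the group means.
    """
    ybar = np.asarray(ybar, dtype=float)
    ss = np.sum((ybar - ybar.mean()) ** 2)
    v = tau_grid**2 + sigma**2 / n           # marginal variance of each group mean
    J = ybar.size
    return -0.5 * J * np.log(2 * np.pi * v) - 0.5 * ss / v

# Hypothetical data: five groups of five observations whose means barely differ.
ybar = np.array([0.1, -0.1, 0.05, -0.05, 0.0])
tau_grid = np.linspace(0.0, 2.0, 201)
ll = profile_loglik_tau(ybar, n=5, sigma=1.0, tau_grid=tau_grid)

print("ML estimate of tau:", tau_grid[np.argmax(ll)])        # the boundary, tau = 0
print("log-lik drop from tau = 0 to 0.3:", round(ll[0] - ll[30], 3))
```

The drop of less than one log-likelihood unit between tau = 0 and tau = 0.3 also shows how flat the likelihood is over a wide range of plausible values.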
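Bayesian mixed models fitted by MCMC largely avoid this. The prior on the group-level standard deviations, typically a half-Student-t or half-normal, regularises the posterior where the likelihood is flat. The MCMC sampler does not optimise: it explores the posterior distribution, so there is no boundary optimum for it to get stuck at. Even when the likelihood peaks at zero, posterior summaries such as the mean, median and credible intervals describe a range of plausible values instead of a single degenerate estimate. Sampling problems can still occur, but they show up in diagnostics such as divergent transitions.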
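The effect of the prior can be sketched with a grid approximation instead of MCMC, using the same kind of balanced one-way model. The NumPy sketch below makes four assumptions for illustration: a known within-group SD, a flat prior on the population mean (integrated out analytically), a half-normal(0, 1) prior on tau, and hypothetical data whose group means barely differ. For these data the posterior mode is still at zero, because both the prior and the likelihood decrease in tau. The posterior mean and interval nevertheless summarise a spread of plausible positive values rather than a single boundary estimate.

```python
import numpy as np

def grid_posterior_tau(ybar, n, sigma, prior_scale, tau_grid):
    """Grid approximation to p(tau | data) for a balanced one-way model.

    Assumes known sigma, a flat prior on mu (integrated out) and a
    half-normal(0, prior_scale) prior on tau >= 0.
    """
    ybar = np.asarray(ybar, dtype=float)
    J = ybar.size
    ss = np.sum((ybar - ybar.mean()) ** 2)
    v = tau_grid**2 + sigma**2 / n
    # Likelihood with mu integrated out: proportional to v^{-(J-1)/2} exp(-ss / (2v)).
    log_lik = -0.5 * (J - 1) * np.log(v) - 0.5 * ss / v
    log_prior = -0.5 * (tau_grid / prior_scale) ** 2       # half-normal kernel
    log_post = log_lik + log_prior
    post = np.exp(log_post - log_post.max())               # avoid overflow
    return post / post.sum()                               # normalise on the grid

ybar = np.array([0.1, -0.1, 0.05, -0.05, 0.0])             # hypothetical data
tau_grid = np.linspace(0.0, 3.0, 3001)
post = grid_posterior_tau(ybar, n=5, sigma=1.0, prior_scale=1.0, tau_grid=tau_grid)

post_mean = np.sum(tau_grid * post)
cdf = np.cumsum(post)
interval = tau_grid[np.searchsorted(cdf, [0.05, 0.95])]    # central 90% interval
print(f"posterior mean of tau: {post_mean:.2f}")
print(f"90% interval: {interval[0]:.2f} to {interval[1]:.2f}")
```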
A Bayesian logistic mixed model is a typical case. glmer in lme4 can fit such a model but frequently fails to converge when the random effects structure is complex; brm usually fits it without modification.
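A logistic mixed model puts the varying intercepts on the logit scale. The NumPy sketch below simulates from the data-generating process behind a formula such as y ~ x + (1 | group) with a Bernoulli family. The names y, x and group and all parameter values are hypothetical. The sketch is meant to show what such a model assumes, not how brm or glmer fits it.

```python
import numpy as np

def simulate_logistic_varying_intercepts(n_groups=20, n_per_group=15,
                                         alpha=-0.5, beta=1.0, tau=1.0, seed=4):
    """Simulate logit P(y_ij = 1) = alpha + a_j + beta * x_ij, a_j ~ N(0, tau^2)."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), n_per_group)
    x = rng.normal(size=group.size)
    a = rng.normal(0.0, tau, size=n_groups)     # group intercept offsets
    eta = alpha + a[group] + beta * x           # linear predictor, logit scale
    p = 1.0 / (1.0 + np.exp(-eta))              # inverse logit
    y = rng.binomial(1, p)                      # one Bernoulli draw per observation
    return y, x, group

y, x, group = simulate_logistic_varying_intercepts()

# Proportion of 1s per group: varies through the offsets a_j and sampling noise.
props = np.bincount(group, weights=y) / np.bincount(group)
print(np.round(props, 2))
```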
Posterior summaries for mixed models
The posterior for a mixed model includes both the population-level (fixed) effects and the group-level (random) effects. fixef returns the population-level summaries. ranef returns the group-level deviations.
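Because these summaries come from posterior draws, quantities that combine population-level and group-level effects should be computed draw by draw and summarised afterwards. Adding intervals together does not give the interval of the sum. brms's coef() method returns summaries of this combined quantity. The NumPy sketch below uses simulated stand-in draws, not real brms output, for one population-level slope and three groups' deviations from it.

```python
import numpy as np

def summarise(draws, axis=0):
    """Posterior mean and central 95% interval along the draws axis."""
    return (draws.mean(axis=axis),
            np.percentile(draws, 2.5, axis=axis),
            np.percentile(draws, 97.5, axis=axis))

# Stand-in posterior draws, simulated for illustration only (not brms output).
rng = np.random.default_rng(5)
n_draws, n_groups = 4000, 3
b_pop = rng.normal(2.5, 0.2, size=n_draws)                             # like a fixef() entry
r_group = rng.normal([-0.5, 0.0, 0.7], 0.3, size=(n_draws, n_groups))  # like ranef() deviations

mean_pop, lo_pop, hi_pop = summarise(b_pop)
mean_dev, lo_dev, hi_dev = summarise(r_group)

# Group-specific slopes: add the draws first, then summarise, so uncertainty
# from both the population-level and the group-level parts is carried through.
group_slopes = b_pop[:, None] + r_group
mean_slope, lo_slope, hi_slope = summarise(group_slopes)
print(np.round(mean_slope, 2), np.round(lo_slope, 2), np.round(hi_slope, 2))
```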
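The posterior distribution over the group-level effects captures uncertainty about each group's deviation from the population mean. Classical fits report conditional modes for these deviations, and lme4 can also return their conditional variances. Those variances, however, are computed with the variance components fixed at their estimates, so they do not reflect uncertainty in the variance components themselves.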
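The difference can be illustrated with the law of total variance: Var(a) = E[Var(a | tau)] + Var(E[a | tau]). The NumPy sketch below uses a normal-normal model with known mu and sigma, a simplifying assumption. It compares the variance of one group's deviation computed at a single plug-in value of tau with the variance obtained by averaging over a spread of tau values. The evenly spaced tau values are a hypothetical stand-in for posterior draws, and all other numbers are illustrative.

```python
import numpy as np

def conditional_moments(ybar, n, sigma, mu, tau):
    """Mean and variance of a group's deviation a_j given tau.

    Normal-normal model with known mu and sigma:
        a_j | ybar_j, tau ~ N(w * (ybar_j - mu), w * sigma^2 / n),
        w = tau^2 / (tau^2 + sigma^2 / n).
    """
    tau = np.asarray(tau, dtype=float)
    w = tau**2 / (tau**2 + sigma**2 / n)
    return w * (ybar - mu), w * sigma**2 / n

ybar, n, sigma, mu = 1.0, 5, 1.0, 0.0      # hypothetical group
tau_hat = 0.5                              # a point estimate of tau
tau_draws = np.linspace(0.2, 0.8, 7)       # stand-in for posterior draws of tau

# Plug-in: condition on tau_hat as if it were known.
_, var_plugin = conditional_moments(ybar, n, sigma, mu, tau_hat)
var_plugin = float(var_plugin)

# Integrated over tau: E[Var(a | tau)] + Var(E[a | tau]).
means, variances = conditional_moments(ybar, n, sigma, mu, tau_draws)
var_full = float(variances.mean() + means.var())

print(f"plug-in variance:    {var_plugin:.3f}")
print(f"integrated variance: {var_full:.3f}")
```

Here the second term, the variation in the conditional mean as tau changes, is what the plug-in calculation leaves out.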