Priors and Model Comparison

Author

Mark Andrews

Abstract

A close look at prior specification in brms and Bayesian model comparison. We cover default and custom priors, prior sensitivity analysis, and model comparison via leave-one-out cross-validation, WAIC, and Bayes factors.

Default priors in brms

When you fit a model with brm without specifying priors, brms falls back on default priors. Most of these are weakly informative: they are designed to regularise the posterior without strongly influencing the result when the likelihood is informative.

The exception is the regression coefficients, whose default prior in brms is an improper flat prior. The default on the intercept is a Student-t distribution with three degrees of freedom, centred on the median of the response, with a scale derived from the spread of the response. On sigma the default is a half-Student-t with the same degrees of freedom, which is proper (integrates to one) and keeps sigma positive.
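One way to check the defaults for a particular model is to ask brms for them before fitting. The sketch below assumes a simulated data frame `d` with a response `y` and a predictor `x` (illustrative names, not from the text). It uses `get_prior`, which returns the prior table without compiling or sampling.

```r
library(brms)

# Simulated data; the names d, x, and y are illustrative assumptions.
set.seed(101)
d <- data.frame(x = rnorm(50))
d$y <- 1 + 0.5 * d$x + rnorm(50)

# List every parameter class in the model together with its default prior.
default_priors <- get_prior(y ~ x, data = d, family = gaussian())
print(default_priors)
```

The table has one row per parameter class (and per coefficient), so it also shows which `class` and `coef` labels are available when setting custom priors.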

Setting custom priors

Custom priors are set using set_prior. Each call specifies the distribution, the class of parameter, and optionally the specific coefficient.
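As a minimal sketch, a single prior might be written as follows. The normal(0, 10) distribution and the coefficient name `x` are illustrative assumptions, not recommendations from the text.

```r
library(brms)

# A normal(0, 10) prior on all regression coefficients (class "b").
prior_all_slopes <- set_prior("normal(0, 10)", class = "b")

# The same distribution for one specific coefficient, here a predictor
# named x (a hypothetical name).
prior_one_slope <- set_prior("normal(0, 10)", class = "b", coef = "x")

print(prior_all_slopes)
print(prior_one_slope)

# Passing a prior to brm (commented out because it compiles and samples;
# d, y, and x are hypothetical):
# fit <- brm(y ~ x, data = d, prior = prior_all_slopes)
```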

A more detailed specification sets different priors for different parameters.
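A sketch of such a specification, assuming a simulated data frame with predictors `x1` and `x2` (hypothetical names) and placeholder distributions, combines several `set_prior` calls with `c()`. The `validate_prior` call checks the priors against the model formula without fitting anything.

```r
library(brms)

# Simulated data; variable names and effect sizes are illustrative.
set.seed(102)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(50)

# Combine several priors with c(); the distributions are placeholders.
model_priors <- c(
  set_prior("normal(0, 5)", class = "Intercept"),
  set_prior("normal(0, 1)", class = "b"),                 # all slopes
  set_prior("normal(0, 0.5)", class = "b", coef = "x2"),  # x2 only
  set_prior("student_t(3, 0, 2.5)", class = "sigma")
)

# Check the priors against the model without compiling or sampling.
checked_priors <- validate_prior(model_priors, y ~ x1 + x2, data = d)
print(checked_priors)
```

A coefficient-specific prior takes precedence over the class-level prior for that coefficient, so here `x2` gets the tighter prior while `x1` keeps the class-level one.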

Prior sensitivity

A good analysis checks that conclusions are not highly sensitive to the choice of prior. The practical approach is to fit the same model with a range of priors and compare the posteriors.

If the posteriors are similar across different reasonable priors, the inference is robust. If they differ substantially, more thought is needed about what prior information is actually available.
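The procedure can be sketched as a loop over candidate priors. The data are simulated and the three prior standard deviations are arbitrary illustrative choices. Each call to brm compiles and samples, so this takes a few minutes.

```r
library(brms)

# Simulated data; names and true values are illustrative assumptions.
set.seed(103)
d <- data.frame(x = rnorm(100))
d$y <- 1 + 0.5 * d$x + rnorm(100)

# Candidate prior standard deviations for the slope: tight, moderate, vague.
prior_sds <- c(0.1, 1, 10)

# Fit the same model once per prior.
fits <- lapply(prior_sds, function(s) {
  slope_prior <- set_prior(paste0("normal(0, ", s, ")"), class = "b")
  brm(y ~ x, data = d, prior = slope_prior, seed = 103, refresh = 0)
})

# Posterior summary of the slope under each prior, one row per prior.
slope_summaries <- t(sapply(fits, function(f) fixef(f)["x", ]))
rownames(slope_summaries) <- paste0("normal(0, ", prior_sds, ")")
print(slope_summaries)
```

If the rows agree closely, the slope estimate is driven by the data. A row that stands apart, most likely the tightest prior here, shows where the prior is doing the work.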

The t distribution as a prior on sigma

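The Student-t distribution with one degree of freedom is the Cauchy distribution. Its tails are so heavy that its mean and variance are undefined, and it places non-negligible prior mass on very large values. Truncated at zero (the half-Cauchy), it is a common weakly informative prior for scale parameters such as sigma. Increasing the degrees of freedom, as in the three-degree-of-freedom default that brms uses for sigma, gives lighter tails.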
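To make the tail behaviour concrete, the following base R sketch computes how much prior mass three unit-scale half-distributions place above an arbitrary threshold of 10. The threshold and the unit scale are illustrative assumptions.

```r
# Prior mass that a half-distribution on sigma places above a threshold.
threshold <- 10

# Half-Cauchy(0, 1): Student-t with 1 degree of freedom, folded at zero.
mass_half_cauchy <- 2 * pcauchy(threshold, lower.tail = FALSE)

# Half-Student-t with 3 degrees of freedom and unit scale.
mass_half_t3 <- 2 * pt(threshold, df = 3, lower.tail = FALSE)

# Half-normal(0, 1) for reference.
mass_half_normal <- 2 * pnorm(threshold, lower.tail = FALSE)

print(c(half_cauchy = mass_half_cauchy,
        half_t3 = mass_half_t3,
        half_normal = mass_half_normal))

# The t distribution function with df = 1 matches the Cauchy one.
print(all.equal(pt(threshold, df = 1), pcauchy(threshold)))
```

The half-Cauchy puts roughly 6% of its mass above ten scale units, far more than the other two.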

Model comparison

Leave-one-out cross-validation

LOO cross-validation estimates how well a model would predict new data. For each observation in turn, we fit the model without that observation and ask how well the fitted model predicts it. The LOO estimate of the expected log predictive density (ELPD) is the sum of these log predictive densities:

\[ \text{elpd} = \sum_{i=1}^{N} \log p(x_i \mid x_{-i}, m) \]

The LOO information criterion is \(\text{LOOIC} = -2 \cdot \text{elpd}\), on the same scale as AIC. brms computes this efficiently using Pareto-smoothed importance sampling rather than refitting the model \(N\) times.
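A sketch of how this might look in practice, using simulated data and two hypothetical candidate models (intercept only, and with a predictor `x`):

```r
library(brms)

# Simulated data; names and true values are illustrative assumptions.
set.seed(104)
d <- data.frame(x = rnorm(100))
d$y <- 1 + 0.5 * d$x + rnorm(100)

# Two candidate models: intercept only, and with the predictor x.
m0 <- brm(y ~ 1, data = d, seed = 104, refresh = 0)
m1 <- brm(y ~ x, data = d, seed = 104, refresh = 0)

# PSIS-LOO estimates; the models are not refitted per observation.
loo0 <- loo(m0)
loo1 <- loo(m1)
print(loo1$estimates)  # rows: elpd_loo, p_loo, looic

# LOOIC is -2 times the ELPD estimate.
looic_check <- -2 * loo1$estimates["elpd_loo", "Estimate"]
print(looic_check)

# Differences in ELPD with standard errors; the best model is listed first.
print(loo_compare(loo0, loo1))

# WAIC is available through the same interface.
print(waic(m1)$estimates)
```

The ELPD difference is best read alongside its standard error: a difference that is small relative to its standard error gives little reason to prefer one model over the other.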

Bayes factors

The Bayes factor is the ratio of the marginal likelihoods of two models:

\[ \text{BF} = \frac{p(D \mid M_0)}{p(D \mid M_1)} \]

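A Bayes factor greater than one favours \(M_0\); a value less than one favours \(M_1\). Because marginal likelihoods can differ by many orders of magnitude, Bayes factors are usually reported on the natural-log scale, where they remain numerically tractable. A log Bayes factor of 108, for example, corresponds to a Bayes factor of \(e^{108}\), which is overwhelming evidence for \(M_0\).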
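As a worked conversion, and assuming the logarithm is natural (as \(e^{108}\) implies), the log Bayes factor is a difference of log marginal likelihoods, and a value of 108 translates into powers of ten as follows:

\[ \log \text{BF} = \log p(D \mid M_0) - \log p(D \mid M_1), \qquad \text{BF} = e^{108} = 10^{108 / \ln 10} \approx 10^{46.9} \approx 8 \times 10^{46} \]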

Bayes factors require the marginal likelihood, which averages the likelihood over the prior and is therefore sensitive to it. This is both a strength (they naturally penalise overly complex models) and a limitation (they depend on the prior far more strongly than ELPD-based criteria do). For comparing predictive performance, LOO is generally more robust. For formal hypothesis testing between nested models, Bayes factors are the appropriate tool, provided the priors are proper and chosen with care.
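A sketch of a Bayes factor for two nested models in brms follows. The data are simulated and the normal(0, 1) prior on the slope is an illustrative assumption. The example relies on bridge sampling via the bridgesampling package, which must be installed, and on `save_pars(all = TRUE)` so that all parameter draws are stored. A proper prior on the slope is set explicitly, because the marginal likelihood is not well defined under a flat prior.

```r
library(brms)

# Simulated data; names and true values are illustrative assumptions.
set.seed(105)
d <- data.frame(x = rnorm(100))
d$y <- 1 + 0.5 * d$x + rnorm(100)

# M0: intercept only. M1: adds x with a proper prior on its slope.
m0 <- brm(y ~ 1, data = d,
          save_pars = save_pars(all = TRUE),
          seed = 105, refresh = 0)
m1 <- brm(y ~ x, data = d,
          prior = set_prior("normal(0, 1)", class = "b"),
          save_pars = save_pars(all = TRUE),
          seed = 105, refresh = 0)

# Log Bayes factor of M0 relative to M1, estimated by bridge sampling.
log_bf_01 <- bayes_factor(m0, m1, log = TRUE)
print(log_bf_01)
```

Refitting with a wider or narrower prior on the slope and recomputing the Bayes factor is a direct way to see the prior dependence described above.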