Random Effects Models

Mark Andrews

Grouped binary data

The rats tumour dataset has 71 batches of lab rats. For each batch \(j\) we observe the number of rats with tumours, \(m_j\), out of \(n_j\) rats in the batch.

\[ m_j \sim \mathrm{Binom}(\theta_j, n_j) \]

The question is how to estimate the tumour probability \(\theta_j\) for each batch.

Three estimation strategies

Complete pooling. Treat all batches as one, estimate a single \(\theta\).

No pooling. Estimate each \(\theta_j\) independently from batch \(j\)’s data only.

Partial pooling. A middle path: let the batches inform each other via a shared population model.
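The first two strategies can be computed directly from the counts. The sketch below uses a small made-up data frame, not the rats data; the column names `m` and `n` follow the notation in the text, and the name `toy` is only for illustration.

```r
# Hypothetical toy data: m = rats with tumours, n = rats in the batch.
# These counts are made up for illustration; they are not the rats dataset.
toy <- data.frame(
  batch = factor(1:6),
  m     = c(0, 1, 2, 4, 5, 12),
  n     = c(20, 1, 10, 20, 25, 40)
)

# Complete pooling: a single theta shared by all batches.
theta_pooled <- sum(toy$m) / sum(toy$n)

# No pooling: a separate theta_j per batch, using only that batch's data.
theta_nopool <- toy$m / toy$n

round(theta_pooled, 3)
round(theta_nopool, 3)
```

Partial pooling has no such closed form here, because it needs estimates of the population parameters as well as the per-batch quantities.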

The no-pooling model

Each batch has its own \(\beta_j = \log(\theta_j / (1-\theta_j))\), estimated independently.

Batches with small \(n_j\) or extreme counts get wide, sometimes degenerate, confidence intervals: when \(m_j = 0\) or \(m_j = n_j\), the estimated log-odds is infinite.
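To see the problem concretely, here is a minimal sketch with made-up counts (not the rats data). It computes the per-batch log-odds and exact 95% intervals using base R's `binom.test`; the choice of exact intervals is an assumption made for illustration, not necessarily what the no-pooling model in these slides reports.

```r
# Hypothetical counts: an all-zero batch, a single-rat batch, and two ordinary ones.
toy <- data.frame(m = c(0, 1, 4, 12), n = c(20, 1, 20, 40))

# Per-batch log-odds: infinite when m = 0 or m = n.
beta_hat <- qlogis(toy$m / toy$n)

# Exact (Clopper-Pearson) 95% intervals for each theta_j.
ci <- t(sapply(seq_len(nrow(toy)), function(j) {
  binom.test(toy$m[j], toy$n[j])$conf.int
}))
colnames(ci) <- c("lower", "upper")

cbind(toy, beta_hat, round(ci, 3))
```

The single-rat batch has an interval covering almost the whole of \([0, 1]\), and the first two batches have no finite log-odds estimate at all.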

The multilevel model

The \(\beta_j\) are no longer treated as unrelated: they are modelled as draws from a common normal distribution.

\[ \beta_j \sim N(b, \tau^2) \]

The multilevel model: expanded

Writing \(\beta_j = b + \xi_j\) with \(\xi_j \sim N(0, \tau^2)\):

\[ m_j \sim \mathrm{Binom}(\theta_j, n_j), \qquad \log\!\frac{\theta_j}{1-\theta_j} = b + \xi_j \]

The population distribution \(N(b, \tau^2)\) is a model of the models: it describes the distribution from which the per-batch log-odds are drawn.
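One way to read the expanded model is as a recipe for generating data. The sketch below simulates from it in base R; the values of `b`, `tau`, and the batch sizes are assumptions chosen for illustration, not estimates from the rats data.

```r
set.seed(1)

J   <- 71     # number of batches, as in the text
b   <- -1.9   # assumed population mean log-odds (illustrative)
tau <- 0.7    # assumed between-batch SD (illustrative)
n   <- sample(10:50, J, replace = TRUE)  # assumed batch sizes

xi    <- rnorm(J, mean = 0, sd = tau)       # batch-level deviations xi_j
theta <- plogis(b + xi)                     # inverse logit gives theta_j
m     <- rbinom(J, size = n, prob = theta)  # tumour counts m_j

sim <- data.frame(batch = factor(seq_len(J)), m = m, n = n)
head(sim)
```

Fitting the multilevel model runs this recipe in reverse: from the observed `m` and `n` it infers `b`, `tau`, and the per-batch deviations.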

Fitting in R

rats_df <- rats
M3 <- binomial_model(m, n, group = batch, data = rats_df) # no pooling
M4 <- binomial_model(m, n, group = batch, data = rats_df, multilevel = TRUE) # multilevel

M4 estimates \(b\) (population mean log-odds), \(\tau\) (between-batch SD), and per-batch estimates.
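The implementation of `binomial_model` is not shown here. As a point of comparison, a random-intercept logistic regression of the same form can be fitted with `glmer` from the lme4 package. This is a different tool and need not reproduce M4 exactly. The sketch assumes lme4 is installed and uses simulated stand-in data rather than `rats_df`; the names `sim` and `fit_glmm` are hypothetical.

```r
# Simulated stand-in data (not the rats data): 30 batches with columns m, n, batch.
set.seed(2)
J <- 30
n <- sample(10:50, J, replace = TRUE)
m <- rbinom(J, size = n, prob = plogis(rnorm(J, mean = -1.9, sd = 0.7)))
sim <- data.frame(batch = factor(seq_len(J)), m = m, n = n)

# Random-intercept logistic regression:
# logit(theta_j) = b + xi_j, with xi_j ~ N(0, tau^2).
fit_glmm <- function(d) {
  lme4::glmer(cbind(m, n - m) ~ 1 + (1 | batch), data = d, family = binomial)
}

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- fit_glmm(sim)
  b_hat    <- lme4::fixef(fit)[["(Intercept)"]]           # population mean log-odds
  tau_hat  <- as.data.frame(lme4::VarCorr(fit))$sdcor[1]  # between-batch SD
  beta_hat <- coef(fit)$batch[["(Intercept)"]]            # per-batch log-odds
  print(c(b = b_hat, tau = tau_hat))
}
```

`plogis(beta_hat)` converts the per-batch log-odds back to tumour probabilities.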

Partial pooling and shrinkage

The multilevel estimate for batch \(j\) is roughly a weighted average of the batch’s own no-pooling estimate and the grand mean \(b\), with weights determined by \(\tau\) and \(n_j\). Batches with small \(n_j\) are pulled more strongly toward the grand mean. This is shrinkage.
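Shrinkage can be illustrated without fitting the full model by treating \(b\) and \(\tau\) as known and computing each batch's posterior mean by numerical integration. This is a simplification: the multilevel fit estimates \(b\) and \(\tau\) from the data. The values below and the helper `post_mean_theta` are assumptions for illustration.

```r
# Posterior mean of theta_j given m tumours in n rats,
# with the population parameters b and tau treated as known.
post_mean_theta <- function(m, n, b, tau) {
  lo <- b - 8 * tau
  hi <- b + 8 * tau
  # Likelihood times population density, as a function of the log-odds beta.
  joint <- function(beta) dbinom(m, n, plogis(beta)) * dnorm(beta, b, tau)
  num <- integrate(function(beta) plogis(beta) * joint(beta), lo, hi)$value
  den <- integrate(joint, lo, hi)$value
  num / den
}

b   <- qlogis(0.15)  # assumed population mean log-odds
tau <- 0.7           # assumed between-batch SD

# Two hypothetical batches with the same raw proportion (0.5) but different sizes.
est_small <- post_mean_theta(m = 2,  n = 4,  b = b, tau = tau)
est_large <- post_mean_theta(m = 20, n = 40, b = b, tau = tau)

round(c(raw = 0.5, small_n = est_small, large_n = est_large), 3)
```

Both estimates fall below the raw proportion of 0.5, and the four-rat batch is pulled further toward the population centre than the forty-rat batch.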

Shrinkage plot

The population distribution

The model implies a distribution over the tumour probability for all batches, including unobserved future ones:

\[ \theta = \mathrm{logit}^{-1}(\beta), \qquad \beta \sim N(b, \tau^2) \]

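Given estimates of \(b\) and \(\tau\), a prediction interval for a new batch follows by taking an interval for \(N(b, \tau^2)\) on the log-odds scale and applying the inverse logit to its endpoints.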
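A minimal sketch of that calculation, assuming illustrative values for \(b\) and \(\tau\) rather than the fitted ones. It is a plug-in interval: it ignores the uncertainty in the estimates of \(b\) and \(\tau\) themselves.

```r
b   <- -1.9  # assumed estimate of the population mean log-odds
tau <- 0.7   # assumed estimate of the between-batch SD

# 95% interval on the log-odds scale, mapped to probabilities by the inverse logit.
z <- qnorm(0.975)
pred_int <- plogis(b + c(-1, 1) * z * tau)
round(pred_int, 3)

# Monte Carlo check: draw log-odds for many new batches and take empirical quantiles.
set.seed(3)
theta_new <- plogis(rnorm(1e5, mean = b, sd = tau))
round(quantile(theta_new, c(0.025, 0.975)), 3)
```

Because the inverse logit is monotone, quantiles on the log-odds scale map directly to quantiles of \(\theta\), so the two results should agree up to simulation error.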

Why partial pooling is preferable

The no-pooling model discards the information that all batches come from the same experiment. Batches with a single rat get degenerate intervals.

The multilevel model borrows strength: even a batch with one rat gets a reasonable estimate, because the population model regularises it.

The estimated \(\tau\) determines automatically how much pooling is appropriate. If \(\tau \approx 0\), batches are nearly identical and the estimates approach complete pooling. If \(\tau\) is large, batches differ and the estimates stay close to their no-pooling values.
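The role of \(\tau\) can be made concrete with a normal approximation on the log-odds scale, under which the weight on a batch's own estimate is \(w = \tau^2 / (\tau^2 + s^2)\), where \(s^2 \approx 1 / (n \hat\theta (1 - \hat\theta))\) is the sampling variance of the batch's log-odds. This is an approximation introduced for illustration, not the exact computation performed by the multilevel fit; `pooling_weight` and the input values are hypothetical.

```r
# Approximate weight on a batch's own estimate (normal approximation, log-odds scale).
pooling_weight <- function(tau, n, theta_hat) {
  s2 <- 1 / (n * theta_hat * (1 - theta_hat))  # approx. sampling variance of the log-odds
  tau^2 / (tau^2 + s2)
}

# One hypothetical batch (n = 20, raw proportion 0.25) under three values of tau.
taus <- c(0.01, 0.5, 5)
w <- pooling_weight(taus, n = 20, theta_hat = 0.25)
round(setNames(w, paste0("tau=", taus)), 3)
```

A weight near 0 means the batch estimate is essentially the grand mean (complete pooling); a weight near 1 means it is essentially the batch's own estimate (no pooling).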