Linear Mixed Effects Models

Mark Andrews

Varying slopes and intercepts

The pvtrt dataset: 18 participants, 10 days of sleep deprivation, mean reaction time each day.

ggplot(pvtrt, aes(x = day, y = rt)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~id) +
  labs(x = "Day", y = "RT (ms)")

Most participants slow down, but the rate varies considerably across participants.

The non-multilevel flat model

One intercept and one slope per participant, estimated independently.

This is 18 independent regressions sharing only \(\sigma^2\).

The multilevel model

The participant-specific coefficients \(\boldsymbol{\beta}_j = (\beta_{j0}, \beta_{j1})^\top\) are drawn from a bivariate normal:

\[ \begin{aligned} y_i &\sim N(\mu_i, \sigma^2) \\ \mu_i &= \beta_{[s_i]0} + \beta_{[s_i]1} x_i \\ \boldsymbol{\beta}_j &\sim N(\boldsymbol{b}, \Sigma) \end{aligned} \]

Fixed and random effects decomposition

Writing \(\boldsymbol{\beta}_j = \boldsymbol{b} + \boldsymbol{\zeta}_j\):

\[ \mu_i = \underbrace{b_0 + b_1 x_i}_{\text{fixed effects}} + \underbrace{\zeta_{[s_i]0} + \zeta_{[s_i]1} x_i}_{\text{random effects}} \]

Fixed effects: population-average intercept and slope.

Random effects: participant-specific deviations from those population values.

Model structure

Left: explicit Bayesian network for a single observation. Right: same model in plate notation, compressing the repetition over days.

Fitting the full model

M7 <- lmer(rt ~ day + (day | id), data = pvtrt)
fixef(M7)
VarCorr(M7)

\(\tau_0\), \(\tau_1\): SDs of random intercepts and slopes. \(\rho\): correlation between them.

Four model variants

Formula Random effects
(day | id) Intercept + slope, correlated
(1 | id) Intercept only
(0 + day | id) Slope only
(day || id) Intercept + slope, independent

The choice among these is guided by likelihood ratio tests.

Model comparison

To compare models differing in their random effects structure, use REML (the default).

M8 <- lmer(rt ~ day + (1 | id), data = pvtrt)
M10a <- lmer(rt ~ day + (1 | id) + (0 + day | id), data = pvtrt)
anova(M8, M10a, M7)

Inference: confidence intervals

Profile-likelihood confidence intervals for all parameters.

confint(M7)

.sig01, .sig02, .sig03 are elements of the Cholesky factor of \(\Sigma\). .sigma is \(\hat\sigma\).

Inference: p-values via lmerTest

lme4 does not provide p-values because the degrees of freedom are not well defined for mixed models. lmerTest uses the Satterthwaite approximation.

M7_test <- lmerTest::lmer(rt ~ day + (day | id), data = pvtrt)
summary(M7_test)$coefficients

Confidence intervals from confint are the preferred summary.