library(tidyverse)
library(lme4)
library(immr)Multilevel Models for Nested Data
When groups are themselves nested within higher-level groups, the multilevel model acquires a third level. This guide develops the three-level linear mixed effects model using a classroom mathematics dataset in which children are nested within classes which are nested within schools. It covers model specification, the interpretation of school-level and class-level variance components, model simplification, and the practical issue of relative versus unique group identifiers in lmer formula syntax.
The classroom data
The data records mathematics test scores for children in their first year of school, along with a measure of home socioeconomic status (ses). The data has a three-level nested structure: children (level 0) within classes (level 1) within schools (level 2).
classroom_df <- classroom
classroom_df# A tibble: 1,190 × 5
mathscore ses classid schoolid classid2
<dbl> <dbl> <fct> <fct> <fct>
1 480 0.46 160 1 1
2 569 -0.27 160 1 1
3 567 -0.03 160 1 1
4 532 -0.38 217 1 2
5 478 -0.03 217 1 2
6 515 0.76 217 1 2
7 503 -0.03 217 1 2
8 509 0.2 217 1 2
9 510 0.64 217 1 2
10 473 0.13 217 1 2
# ℹ 1,180 more rows
The variable schoolid is a unique identifier for each school and classid is a unique identifier for each class across the entire dataset. The variable classid2 is a class identifier relative to the school, so class 1 in school 41 and class 1 in school 97 are different classes.
Two-level model: variation across schools
Before considering the full three-level structure, fit a two-level model that allows the effect of ses to vary across schools.
M14 <- lmer(mathscore ~ ses + (ses | schoolid), data = classroom_df)
summary(M14)Linear mixed model fit by REML ['lmerMod']
Formula: mathscore ~ ses + (ses | schoolid)
Data: classroom_df
REML criterion at convergence: 11879.6
Scaled residuals:
Min 1Q Median 3Q Max
-5.4151 -0.6052 -0.0472 0.5795 3.7352
Random effects:
Groups Name Variance Std.Dev. Corr
schoolid (Intercept) 261.80 16.180
ses 54.98 7.415 0.47
Residual 1129.40 33.607
Number of obs: 1190, groups: schoolid, 107
Fixed effects:
Estimate Std. Error t value
(Intercept) 522.946 1.917 272.779
ses 11.310 1.722 6.567
Correlation of Fixed Effects:
(Intr)
ses 0.232
The fixed effects give the population-average relationship between ses and mathscore.
fixef(M14)(Intercept) ses
522.94641 11.31036
The random effects structure gives the variability in intercepts and slopes across schools.
VarCorr(M14) Groups Name Std.Dev. Corr
schoolid (Intercept) 16.1803
ses 7.4147 0.474
Residual 33.6065
The standard deviation of random intercepts tells us how much the average mathscore varies across schools. The standard deviation of random slopes tells us how much the effect of ses varies across schools. The correlation between them tells us whether schools with higher average scores also tend to show a stronger or weaker ses effect.
Three-level model: variation across schools and classes
The full three-level model adds a class-level random effects term, allowing both the intercept and the slope for ses to vary across classes as well as schools.
M15 <- lmer(
mathscore ~ ses + (ses | schoolid) + (ses | classid),
data = classroom_df
)
isSingular(M15)[1] FALSE
VarCorr(M15) Groups Name Std.Dev. Corr
classid (Intercept) 8.33529
ses 0.91453 1.000
schoolid (Intercept) 15.44468
ses 7.24255 0.443
Residual 32.90608
The fit is singular. The correlation between the class-level random intercept and slope is exactly 1, meaning the 2×2 covariance matrix for class-level effects is rank-deficient. The classroom data average fewer than four students per class, which is too few to identify a random intercept and a random slope independently at the class level: within-class variation in ses is so limited that the two dimensions of random variation collapse into one.
To test whether the singularity is driven by the correlation parameter or by the slope variance itself, remove the correlation between the two class-level random effects.
M16 <- lmer(
mathscore ~ ses + (ses | schoolid) + (ses || classid),
data = classroom_df
)boundary (singular) fit: see help('isSingular')
isSingular(M16)[1] TRUE
VarCorr(M16) Groups Name Std.Dev. Corr
classid ses 0.0000
classid.1 (Intercept) 8.3964
schoolid (Intercept) 15.4398
ses 7.3039 0.470
Residual 32.8950
The model is still singular. Forcing the class-level random effects to be independent (via ||) does not resolve the problem. The slope standard deviation at the class level collapses to exactly zero, confirming that the issue is not the correlation but the slope variance itself. The data contain no evidence for class-level variation in the ses slope.
The appropriate model restricts the class-level term to a random intercept only.
M17 <- lmer(
mathscore ~ ses + (ses | schoolid) + (1 | classid),
data = classroom_df
)This model is not singular.
isSingular(M17)[1] FALSE
fixef(M17)(Intercept) ses
522.82488 11.20439
VarCorr(M17) Groups Name Std.Dev. Corr
classid (Intercept) 8.3945
schoolid (Intercept) 15.4393
ses 7.3042 0.470
Residual 32.8954
At the population level, a one-unit increase in ses is associated with an increase of 11.2 points in mathscore. The variance components show school-level variation in both intercepts and slopes, and class-level variation in intercepts only.
Model comparison confirms this structure.
anova(M14, M17, M16)refitting model(s) with ML (instead of REML)
Data: classroom_df
Models:
M14: mathscore ~ ses + (ses | schoolid)
M17: mathscore ~ ses + (ses | schoolid) + (1 | classid)
M16: mathscore ~ ses + (ses | schoolid) + ((1 | classid) + (0 + ses | classid))
npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
M14 6 11898 11928 -5942.8 11886
M17 7 11895 11930 -5940.5 11881 4.6584 1 0.0309 *
M16 8 11897 11938 -5940.5 11881 0.0000 1 1.0000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Adding a class-level random intercept (M17) significantly improves fit over the school-only model (M14). Adding a class-level random slope on top of that (M16) produces no further improvement: the likelihood ratio statistic is zero, consistent with the slope variance being estimated at the boundary.
Nested identifiers
The variable classid gives each class a unique identifier across the whole dataset. This is the straightforward case and the formula syntax above handles it directly.
In practice, grouped data often use relative identifiers. The classid2 variable labels classes within each school starting from 1, so there is a class 1, class 2, etc. in every school. To use this variable correctly, we must tell lmer that classid2 values are relative to schoolid.
M18 <- lmer(
mathscore ~ ses + (1 | schoolid / classid2),
data = classroom_df
)The / notation creates a nested random effect: a random intercept for schools, and a random intercept for classes within schools. This is equivalent to:
M19 <- lmer(
mathscore ~ ses + (1 | schoolid) + (1 | schoolid:classid2),
data = classroom_df
)The interaction schoolid:classid2 creates a unique identifier for each class by combining the school and within-school class labels. Both models are also equivalent to using the globally unique classid variable directly.
M20 <- lmer(
mathscore ~ ses + (1 | schoolid) + (1 | classid),
data = classroom_df
)All three models produce the same estimates.
fixef(M18)(Intercept) ses
523.16910 11.81137
fixef(M19)(Intercept) ses
523.16910 11.81137
fixef(M20)(Intercept) ses
523.16910 11.81137
Variance components as a model-building guide
In nested models, the variance components at each level provide guidance for model simplification. Levels with near-zero variance explain little and can often be dropped. Levels with large variance relative to the residual are important to retain.
Examine the variance components of M17, the model selected by the singularity analysis above.
VarCorr(M17) Groups Name Std.Dev. Corr
classid (Intercept) 8.3945
schoolid (Intercept) 15.4393
ses 7.3042 0.470
Residual 32.8954
The school-level variance components show that schools differ both in their average mathematics scores and in the strength of the ses effect. The class-level variance shows that classes differ in their average scores beyond the school-level variation. There is no class-level slope component because the data provided no evidence for variation in the ses effect across classes.