Multilevel Models for Nested Data

Author

Mark Andrews

Abstract

When groups are themselves nested within higher-level groups, the multilevel model acquires a third level. This guide develops the three-level linear mixed effects model using a classroom mathematics dataset in which children are nested within classes which are nested within schools. It covers model specification, the interpretation of school-level and class-level variance components, model simplification, and the practical issue of relative versus unique group identifiers in lmer formula syntax.

library(tidyverse)
library(lme4)
library(immr)

The classroom data

The data records mathematics test scores for children in their first year of school, along with a measure of home socioeconomic status (ses). The data has a three-level nested structure: children (level 0) within classes (level 1) within schools (level 2).

classroom_df <- classroom
classroom_df

# A tibble: 1,190 × 5
   mathscore   ses classid schoolid classid2
       <dbl> <dbl> <fct>   <fct>    <fct>   
 1       480  0.46 160     1        1       
 2       569 -0.27 160     1        1       
 3       567 -0.03 160     1        1       
 4       532 -0.38 217     1        2       
 5       478 -0.03 217     1        2       
 6       515  0.76 217     1        2       
 7       503 -0.03 217     1        2       
 8       509  0.2  217     1        2       
 9       510  0.64 217     1        2       
10       473  0.13 217     1        2       
# ℹ 1,180 more rows

The variable schoolid is a unique identifier for each school and classid is a unique identifier for each class across the entire dataset. The variable classid2 is a class identifier relative to the school, so class 1 in school 41 and class 1 in school 97 are different classes.

Two-level model: variation across schools

Before considering the full three-level structure, fit a two-level model that allows the effect of ses to vary across schools.

M14 <- lmer(mathscore ~ ses + (ses | schoolid), data = classroom_df)
summary(M14)

Linear mixed model fit by REML ['lmerMod']
Formula: mathscore ~ ses + (ses | schoolid)
   Data: classroom_df

REML criterion at convergence: 11879.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-5.4151 -0.6052 -0.0472  0.5795  3.7352 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 schoolid (Intercept)  261.80  16.180        
          ses           54.98   7.415   0.47 
 Residual             1129.40  33.607        
Number of obs: 1190, groups:  schoolid, 107

Fixed effects:
            Estimate Std. Error t value
(Intercept)  522.946      1.917 272.779
ses           11.310      1.722   6.567

Correlation of Fixed Effects:
    (Intr)
ses 0.232

The fixed effects give the population-average relationship between ses and mathscore.

fixef(M14)

(Intercept)         ses 
  522.94641    11.31036

The random effects structure gives the variability in intercepts and slopes across schools.

VarCorr(M14)

 Groups   Name        Std.Dev. Corr  
 schoolid (Intercept) 16.1803        
          ses          7.4147  0.474 
 Residual             33.6065

The standard deviation of random intercepts tells us how much the average mathscore varies across schools. The standard deviation of random slopes tells us how much the effect of ses varies across schools. The correlation between them tells us whether schools with higher average scores also tend to show a stronger or weaker ses effect.

Three-level model: variation across schools and classes

The full three-level model adds a class-level random effects term, allowing both the intercept and the slope for ses to vary across classes as well as schools.

M15 <- lmer(
  mathscore ~ ses + (ses | schoolid) + (ses | classid),
  data = classroom_df
)
isSingular(M15)

[1] FALSE

VarCorr(M15)

 Groups   Name        Std.Dev. Corr  
 classid  (Intercept)  8.33529       
          ses          0.91453 1.000 
 schoolid (Intercept) 15.44468       
          ses          7.24255 0.443 
 Residual             32.90608

The fit is singular. The correlation between the class-level random intercept and slope is exactly 1, meaning the 2×2 covariance matrix for class-level effects is rank-deficient. The classroom data average fewer than four students per class, which is too few to identify a random intercept and a random slope independently at the class level: within-class variation in ses is so limited that the two dimensions of random variation collapse into one.

To test whether the singularity is driven by the correlation parameter or by the slope variance itself, remove the correlation between the two class-level random effects.

M16 <- lmer(
  mathscore ~ ses + (ses | schoolid) + (ses || classid),
  data = classroom_df
)

boundary (singular) fit: see help('isSingular')

isSingular(M16)

[1] TRUE

VarCorr(M16)

 Groups    Name        Std.Dev. Corr  
 classid   ses          0.0000        
 classid.1 (Intercept)  8.3964        
 schoolid  (Intercept) 15.4398        
           ses          7.3039  0.470 
 Residual              32.8950

The model is still singular. Forcing the class-level random effects to be independent (via ||) does not resolve the problem. The slope standard deviation at the class level collapses to exactly zero, confirming that the issue is not the correlation but the slope variance itself. The data contain no evidence for class-level variation in the ses slope.

The appropriate model restricts the class-level term to a random intercept only.

M17 <- lmer(
  mathscore ~ ses + (ses | schoolid) + (1 | classid),
  data = classroom_df
)

This model is not singular.

isSingular(M17)

[1] FALSE

fixef(M17)

(Intercept)         ses 
  522.82488    11.20439

VarCorr(M17)

 Groups   Name        Std.Dev. Corr  
 classid  (Intercept)  8.3945        
 schoolid (Intercept) 15.4393        
          ses          7.3042  0.470 
 Residual             32.8954

At the population level, a one-unit increase in ses is associated with an increase of 11.2 points in mathscore. The variance components show school-level variation in both intercepts and slopes, and class-level variation in intercepts only.

Model comparison confirms this structure.

anova(M14, M17, M16)

refitting model(s) with ML (instead of REML)

Data: classroom_df
Models:
M14: mathscore ~ ses + (ses | schoolid)
M17: mathscore ~ ses + (ses | schoolid) + (1 | classid)
M16: mathscore ~ ses + (ses | schoolid) + ((1 | classid) + (0 + ses | classid))
    npar   AIC   BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)  
M14    6 11898 11928 -5942.8     11886                       
M17    7 11895 11930 -5940.5     11881 4.6584  1     0.0309 *
M16    8 11897 11938 -5940.5     11881 0.0000  1     1.0000  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding a class-level random intercept (M17) significantly improves fit over the school-only model (M14). Adding a class-level random slope on top of that (M16) produces no further improvement: the likelihood ratio statistic is zero, consistent with the slope variance being estimated at the boundary.

Nested identifiers

The variable classid gives each class a unique identifier across the whole dataset. This is the straightforward case and the formula syntax above handles it directly.

In practice, grouped data often use relative identifiers. The classid2 variable labels classes within each school starting from 1, so there is a class 1, class 2, etc. in every school. To use this variable correctly, we must tell lmer that classid2 values are relative to schoolid.

M18 <- lmer(
  mathscore ~ ses + (1 | schoolid / classid2),
  data = classroom_df
)

The / notation creates a nested random effect: a random intercept for schools, and a random intercept for classes within schools. This is equivalent to:

M19 <- lmer(
  mathscore ~ ses + (1 | schoolid) + (1 | schoolid:classid2),
  data = classroom_df
)

The interaction schoolid:classid2 creates a unique identifier for each class by combining the school and within-school class labels. Both models are also equivalent to using the globally unique classid variable directly.

M20 <- lmer(
  mathscore ~ ses + (1 | schoolid) + (1 | classid),
  data = classroom_df
)

All three models produce the same estimates.

fixef(M18)

(Intercept)         ses 
  523.16910    11.81137

fixef(M19)

(Intercept)         ses 
  523.16910    11.81137

fixef(M20)

(Intercept)         ses 
  523.16910    11.81137

Variance components as a model-building guide

In nested models, the variance components at each level provide guidance for model simplification. Levels with near-zero variance explain little and can often be dropped. Levels with large variance relative to the residual are important to retain.

Examine the variance components of M17, the model selected by the singularity analysis above.

VarCorr(M17)

 Groups   Name        Std.Dev. Corr  
 classid  (Intercept)  8.3945        
 schoolid (Intercept) 15.4393        
          ses          7.3042  0.470 
 Residual             32.8954

The school-level variance components show that schools differ both in their average mathematics scores and in the strength of the ses effect. The class-level variance shows that classes differ in their average scores beyond the school-level variation. There is no class-level slope component because the data provided no evidence for variation in the ses effect across classes.