Multilevel Models for Crossed Data

Author

Mark Andrews

Abstract

In nested designs, the lower-level units belong exclusively to one higher-level unit. Crossed designs are different: each observation is simultaneously classified by two or more grouping factors that are not nested. The canonical example is a psycholinguistic experiment in which every participant responds to every stimulus word. This guide develops the crossed random effects model using British Lexicon Project data, explains the conceptual difference from nesting, and shows how multiple random effects terms are specified in lmer.

library(tidyverse)
library(lme4)
library(immr)

Nested versus crossed structures

In the classroom data, each child belongs to exactly one class, and each class belongs to exactly one school. The groups are nested: class membership implies school membership.

In a lexical decision experiment, each participant responds to multiple words and each word is seen by multiple participants. No participant “belongs” to any particular word and no word “belongs” to any particular participant. The two grouping factors — participants and words — are crossed.

The distinction matters for modelling because it determines how the random effects are structured.

The British Lexicon Project data

The dataset is a subset of the British Lexicon Project, a large-scale lexical decision study. Each row records one response from one participant to one word. The response time (rt) is the variable of interest, and freq is the log-frequency of the word in a reference corpus.

blp_df <- blp_short2
blp_df
# A tibble: 662 × 6
   item   participant    rt old20 nletters  freq
   <fct>  <fct>       <dbl> <dbl>    <dbl> <dbl>
 1 encode 4             470   2.2        6    93
 2 encode 46            599   2.2        6    93
 3 encode 78            403   2.2        6    93
 4 encode 72            975   2.2        6    93
 5 encode 32            504   2.2        6    93
 6 encode 52            588   2.2        6    93
 7 encode 68            418   2.2        6    93
 8 encode 10            718   2.2        6    93
 9 encode 54            742   2.2        6    93
10 encode 6             776   2.2        6    93
# ℹ 652 more rows

There are 78 participants and 58 distinct words in this subset.

Standardise frequency before modelling to put it on a comparable scale.

blp_df <- mutate(blp_df, freq2 = scale(freq)[, 1])

A model with participant random effects only

Begin with a model that allows the effect of frequency on reaction time to vary across participants.

M24 <- lmer(rt ~ freq2 + (freq2 | participant), data = blp_df)
summary(M24)
Linear mixed model fit by REML ['lmerMod']
Formula: rt ~ freq2 + (freq2 | participant)
   Data: blp_df

REML criterion at convergence: 8445.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6956 -0.5551 -0.2147  0.2951  6.5511 

Random effects:
 Groups      Name        Variance Std.Dev. Corr  
 participant (Intercept)  5753.2   75.85         
             freq2         250.8   15.84   -0.94 
 Residual                22494.4  149.98         
Number of obs: 651, groups:  participant, 78

Fixed effects:
            Estimate Std. Error t value
(Intercept)  584.387     10.649  54.875
freq2        -10.911      6.262  -1.742

Correlation of Fixed Effects:
      (Intr)
freq2 -0.232

The fixed effects give the population-average relationship between word frequency and reaction time.

fixef(M24)
(Intercept)       freq2 
  584.38724   -10.91079 

High-frequency words tend to be recognised more quickly. The random effects describe how much this effect varies across individuals.

VarCorr(M24)
 Groups      Name        Std.Dev. Corr   
 participant (Intercept)  75.850         
             freq2        15.838  -0.936 
 Residual                149.981         

Adding word-level random effects

Beyond individual differences in participants, some words may consistently produce faster or slower responses regardless of who is responding to them. This is word-specific variability: two participants may both be slower on a particular word, not because of anything about either participant, but because of something about the word itself.

To account for this, add a random intercept for words.

M25 <- lmer(rt ~ freq2 + (freq2 | participant) + (1 | item), data = blp_df)
summary(M25)
Linear mixed model fit by REML ['lmerMod']
Formula: rt ~ freq2 + (freq2 | participant) + (1 | item)
   Data: blp_df

REML criterion at convergence: 8397.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7224 -0.5275 -0.1803  0.3153  6.3813 

Random effects:
 Groups      Name        Variance Std.Dev. Corr  
 participant (Intercept)  6272.6   79.20         
             freq2         292.4   17.10   -0.90 
 item        (Intercept)  4106.9   64.08         
 Residual                18384.1  135.59         
Number of obs: 651, groups:  participant, 78; item, 58

Fixed effects:
            Estimate Std. Error t value
(Intercept)   586.12      13.67  42.873
freq2         -11.65      10.39  -1.122

Correlation of Fixed Effects:
      (Intr)
freq2 -0.119

The formula now has two random effects terms: one for participants and one for words. These are crossed because every participant sees every word (or could in principle).

VarCorr(M25)
 Groups      Name        Std.Dev. Corr   
 participant (Intercept)  79.199         
             freq2        17.099  -0.904 
 item        (Intercept)  64.085         
 Residual                135.588         

The word-level standard deviation tells us how much systematic variability in response time exists across words, above and beyond what freq2 already explains.

Comparing models

Does adding word-level random effects improve the fit?

anova(M24, M25)
refitting model(s) with ML (instead of REML)
Data: blp_df
Models:
M24: rt ~ freq2 + (freq2 | participant)
M25: rt ~ freq2 + (freq2 | participant) + (1 | item)
    npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)    
M24    6 8469.3 8496.2 -4228.7    8457.3                         
M25    7 8424.8 8456.2 -4205.4    8410.8 46.517  1  9.084e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A significant improvement indicates that words vary in their reaction times beyond what can be explained by their frequency alone.

Why not treat words as fixed effects?

An alternative to a word-level random effect would be a fixed effect for each word: include item as a factor in the model.

M26 <- lmer(rt ~ freq2 + item + (freq2 | participant), data = blp_df)
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

This model estimates a separate intercept for each word. The random effect approach instead treats the words as a sample from the population of English words and models the distribution of word-specific effects. This is conceptually the same choice as between fixed and random effects for participants: the random effect is appropriate when the groups are a sample from a larger population and we want to generalise to that population.

For psycholinguistic studies, the random effects approach for both participants and words is standard. The influential critique by Clark (1973) showed that treating items as fixed effects leads to anticonservative inference when the research question is about the population of words.

The model as a crossed random effects design

The structure of M25 can be written explicitly:

\[ \begin{aligned} y_i &\sim N(\mu_i, \sigma^2), \\ \mu_i &= \beta_{[s_i]} + \gamma_{[w_i]}, \\ \beta_j &\sim N(b_s, \tau_s^2), \quad j = 1, \ldots, J, \\ \gamma_k &\sim N(b_w, \tau_w^2), \quad k = 1, \ldots, K, \end{aligned} \]

where \(s_i\) indexes the participant for observation \(i\) and \(w_i\) indexes the word. The participant effects \(\beta_j\) and word effects \(\gamma_k\) are independent of each other.

In practice this means lmer allows the two random effects terms to vary independently, and the (freq2 | participant) and (1 | item) terms are not linked. This is the defining feature of crossed random effects: independence between the two grouping factors.

In a nested model, by contrast, the class-level random effects are constrained to be nested within school-level random effects. The / syntax in (1 | schoolid / classid2) encodes that dependence explicitly.