library(tidyverse)
library(lme4)
library(immr)Multilevel Models for Crossed Data
In nested designs, the lower-level units belong exclusively to one higher-level unit. Crossed designs are different: each observation is simultaneously classified by two or more grouping factors that are not nested. The canonical example is a psycholinguistic experiment in which every participant responds to every stimulus word. This guide develops the crossed random effects model using British Lexicon Project data, explains the conceptual difference from nesting, and shows how multiple random effects terms are specified in lmer.
Nested versus crossed structures
In the classroom data, each child belongs to exactly one class, and each class belongs to exactly one school. The groups are nested: class membership implies school membership.
In a lexical decision experiment, each participant responds to multiple words and each word is seen by multiple participants. No participant “belongs” to any particular word and no word “belongs” to any particular participant. The two grouping factors — participants and words — are crossed.
The distinction matters for modelling because it determines how the random effects are structured.
The British Lexicon Project data
The dataset is a subset of the British Lexicon Project, a large-scale lexical decision study. Each row records one response from one participant to one word. The response time (rt) is the variable of interest, and freq is the log-frequency of the word in a reference corpus.
blp_df <- blp_short2
blp_df# A tibble: 662 × 6
item participant rt old20 nletters freq
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 encode 4 470 2.2 6 93
2 encode 46 599 2.2 6 93
3 encode 78 403 2.2 6 93
4 encode 72 975 2.2 6 93
5 encode 32 504 2.2 6 93
6 encode 52 588 2.2 6 93
7 encode 68 418 2.2 6 93
8 encode 10 718 2.2 6 93
9 encode 54 742 2.2 6 93
10 encode 6 776 2.2 6 93
# ℹ 652 more rows
There are 78 participants and 58 distinct words in this subset.
Standardise frequency before modelling to put it on a comparable scale.
blp_df <- mutate(blp_df, freq2 = scale(freq)[, 1])A model with participant random effects only
Begin with a model that allows the effect of frequency on reaction time to vary across participants.
M24 <- lmer(rt ~ freq2 + (freq2 | participant), data = blp_df)
summary(M24)Linear mixed model fit by REML ['lmerMod']
Formula: rt ~ freq2 + (freq2 | participant)
Data: blp_df
REML criterion at convergence: 8445.3
Scaled residuals:
Min 1Q Median 3Q Max
-1.6956 -0.5551 -0.2147 0.2951 6.5511
Random effects:
Groups Name Variance Std.Dev. Corr
participant (Intercept) 5753.2 75.85
freq2 250.8 15.84 -0.94
Residual 22494.4 149.98
Number of obs: 651, groups: participant, 78
Fixed effects:
Estimate Std. Error t value
(Intercept) 584.387 10.649 54.875
freq2 -10.911 6.262 -1.742
Correlation of Fixed Effects:
(Intr)
freq2 -0.232
The fixed effects give the population-average relationship between word frequency and reaction time.
fixef(M24)(Intercept) freq2
584.38724 -10.91079
High-frequency words tend to be recognised more quickly. The random effects describe how much this effect varies across individuals.
VarCorr(M24) Groups Name Std.Dev. Corr
participant (Intercept) 75.850
freq2 15.838 -0.936
Residual 149.981
Adding word-level random effects
Beyond individual differences in participants, some words may consistently produce faster or slower responses regardless of who is responding to them. This is word-specific variability: two participants may both be slower on a particular word, not because of anything about either participant, but because of something about the word itself.
To account for this, add a random intercept for words.
M25 <- lmer(rt ~ freq2 + (freq2 | participant) + (1 | item), data = blp_df)
summary(M25)Linear mixed model fit by REML ['lmerMod']
Formula: rt ~ freq2 + (freq2 | participant) + (1 | item)
Data: blp_df
REML criterion at convergence: 8397.3
Scaled residuals:
Min 1Q Median 3Q Max
-1.7224 -0.5275 -0.1803 0.3153 6.3813
Random effects:
Groups Name Variance Std.Dev. Corr
participant (Intercept) 6272.6 79.20
freq2 292.4 17.10 -0.90
item (Intercept) 4106.9 64.08
Residual 18384.1 135.59
Number of obs: 651, groups: participant, 78; item, 58
Fixed effects:
Estimate Std. Error t value
(Intercept) 586.12 13.67 42.873
freq2 -11.65 10.39 -1.122
Correlation of Fixed Effects:
(Intr)
freq2 -0.119
The formula now has two random effects terms: one for participants and one for words. These are crossed because every participant sees every word (or could in principle).
VarCorr(M25) Groups Name Std.Dev. Corr
participant (Intercept) 79.199
freq2 17.099 -0.904
item (Intercept) 64.085
Residual 135.588
The word-level standard deviation tells us how much systematic variability in response time exists across words, above and beyond what freq2 already explains.
Comparing models
Does adding word-level random effects improve the fit?
anova(M24, M25)refitting model(s) with ML (instead of REML)
Data: blp_df
Models:
M24: rt ~ freq2 + (freq2 | participant)
M25: rt ~ freq2 + (freq2 | participant) + (1 | item)
npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
M24 6 8469.3 8496.2 -4228.7 8457.3
M25 7 8424.8 8456.2 -4205.4 8410.8 46.517 1 9.084e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A significant improvement indicates that words vary in their reaction times beyond what can be explained by their frequency alone.
Why not treat words as fixed effects?
An alternative to a word-level random effect would be a fixed effect for each word: include item as a factor in the model.
M26 <- lmer(rt ~ freq2 + item + (freq2 | participant), data = blp_df)fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
This model estimates a separate intercept for each word. The random effect approach instead treats the words as a sample from the population of English words and models the distribution of word-specific effects. This is conceptually the same choice as between fixed and random effects for participants: the random effect is appropriate when the groups are a sample from a larger population and we want to generalise to that population.
For psycholinguistic studies, the random effects approach for both participants and words is standard. The influential critique by Clark (1973) showed that treating items as fixed effects leads to anticonservative inference when the research question is about the population of words.
The model as a crossed random effects design
The structure of M25 can be written explicitly:
\[ \begin{aligned} y_i &\sim N(\mu_i, \sigma^2), \\ \mu_i &= \beta_{[s_i]} + \gamma_{[w_i]}, \\ \beta_j &\sim N(b_s, \tau_s^2), \quad j = 1, \ldots, J, \\ \gamma_k &\sim N(b_w, \tau_w^2), \quad k = 1, \ldots, K, \end{aligned} \]
where \(s_i\) indexes the participant for observation \(i\) and \(w_i\) indexes the word. The participant effects \(\beta_j\) and word effects \(\gamma_k\) are independent of each other.
In practice this means lmer allows the two random effects terms to vary independently, and the (freq2 | participant) and (1 | item) terms are not linked. This is the defining feature of crossed random effects: independence between the two grouping factors.
In a nested model, by contrast, the class-level random effects are constrained to be nested within school-level random effects. The / syntax in (1 | schoolid / classid2) encodes that dependence explicitly.