library(tidyverse)
library(lme4)
library(simr)Power Analysis and Sample Size Determination
Determining an adequate sample size for a multilevel study is more complex than in single-level designs because power depends on the number of groups, the number of observations per group, the variance components, and the effect size of interest. Closed-form analytical solutions are not available in general. The simulation-based approach implemented in the simr package constructs a hypothetical mixed effects model with specified parameters and estimates power by repeatedly simulating data from it and counting how often the key effect is detected at the chosen significance level.
Why simulation is needed
In a simple two-sample \(t\)-test, power depends on the effect size (the standardised mean difference), the sample size, and the significance level. A closed-form relationship between these quantities exists and can be solved analytically.
In a multilevel design, power depends on more quantities: the fixed effect of interest, the variance components at each level, the number of groups, the number of observations per group, and the correlation structure of the random effects. There is no single formula that captures all of this, so simulation is the standard approach.
The strategy is to construct a mixed effects model that reflects the design you are planning, specify the effect sizes and variance components you expect or consider plausible, and then simulate many datasets from this model. For each simulated dataset, fit the model and test the effect of interest. The proportion of simulations in which the test is significant at the chosen level is the estimated power.
The simr package
The simr package provides tools for this workflow. The key functions are:
makeLmer()— constructs a hypotheticallmermodel with specified parameters.powerSim()— estimates power by simulation.extend()— modifies the sample size of the fake data to explore the power curve.
Specifying the hypothetical model
We will use the sleep deprivation study as a template. Suppose we are planning a new study with the same design — 10 days of measurement per subject — and we want to estimate the power to detect a slope of 10 ms per day, given realistic variance components.
First, construct a dataset with the planned design: 10 subjects, 10 days each.
fake_sleep_data <- expand_grid(subject = 1:10, days = 0:9)
fake_sleep_data# A tibble: 100 × 2
subject days
<int> <int>
1 1 0
2 1 1
3 1 2
4 1 3
5 1 4
6 1 5
7 1 6
8 1 7
9 1 8
10 1 9
# ℹ 90 more rows
Now specify the parameter values for the hypothetical model. These should come from a combination of prior literature and the planned effect size.
fixed_b <- c(250, 10) # intercept = 250 ms, slope = 10 ms/day
V <- matrix(
c(625, 0,
0, 25),
nrow = 2
) # tau_0 = 25, tau_1 = 5, rho = 0
s <- 25 # residual SDThe intercept of 250 ms and residual SD of 25 ms are plausible for a reaction time study. The slope of 10 ms per day is the effect we want to detect. The variance components specify how much subjects differ in their baselines (\(\tau_0 = 25\)) and how much they differ in their rate of slowing (\(\tau_1 = 5\)).
Construct the model object.
fake_model <- makeLmer(
y ~ days + (days | subject),
data = fake_sleep_data,
fixef = fixed_b,
VarCorr = V,
sigma = s
)
summary(fake_model)Linear mixed model fit by REML ['lmerMod']
Formula: y ~ days + (days | subject)
Data: fake_sleep_data
REML criterion at convergence: 944.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.09040 -0.41630 -0.06215 0.53974 2.21466
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 625 25
days 25 5 0.00
Residual 625 25
Number of obs: 100, groups: subject, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 250.000 9.170 27.262
days 10.000 1.805 5.541
Correlation of Fixed Effects:
(Intr)
days -0.206
Estimating power
With the model constructed, powerSim simulates data from it repeatedly and fits the model to each simulated dataset. The default test is a likelihood ratio test for all fixed effects; test = fixed("days") restricts it to the days slope.
The nsim argument controls how many simulations to run. More simulations give a more precise power estimate but take longer. For a quick check use 100; for a final report use at least 1000.
powerSim(fake_model, test = fixed("days"), nsim = 1000)The output reports the estimated power with a 95% confidence interval. Power depends directly on the effect size (the slope of 10 ms/day relative to the noise). If the true slope were smaller, power would be lower.
Varying sample size
To find the sample size needed for a target power (typically 80% or 90%), repeat the simulation for different numbers of subjects. The extend function scales up the sample.
fake_model_20 <- extend(fake_model, along = "subject", n = 20)
powerSim(fake_model_20, test = fixed("days"), nsim = 1000)
fake_model_30 <- extend(fake_model, along = "subject", n = 30)
powerSim(fake_model_30, test = fixed("days"), nsim = 1000)Running powerSim across a range of sample sizes and plotting the results gives a power curve, which shows how power increases with sample size.
A design with two conditions
Many experiments compare two conditions across subjects and time. The model extends naturally.
fake_sleep_data2 <- expand_grid(
subject = 1:10,
days = 0:9,
x = c("A", "B")
)
fixed_b2 <- c(250, 10, 5, 2) # intercept, days, xB, days:xB
fake_model2 <- makeLmer(
y ~ x * days + (days | subject),
data = fake_sleep_data2,
fixef = fixed_b2,
VarCorr = V,
sigma = s
)
summary(fake_model2)Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x * days + (days | subject)
Data: fake_sleep_data2
REML criterion at convergence: 1879.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.14822 -0.60620 -0.07965 0.69322 2.51141
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 625 25
days 25 5 0.00
Residual 625 25
Number of obs: 200, groups: subject, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 250.000 9.170 27.262
xB 10.000 6.571 1.522
days 5.000 1.805 2.770
xB:days 2.000 1.231 1.625
Correlation of Fixed Effects:
(Intr) xB days
xB -0.358
days -0.206 0.287
xB:days 0.302 -0.843 -0.341
The effect of primary interest here is the interaction xB:days — whether the rate of slowing differs between conditions. Its coefficient is 2 ms/day in this specification.
powerSim(fake_model2, test = fixed("xB:days"), nsim = 1000)Sensitivity to variance components
Power in multilevel designs is sensitive to the variance components as well as to the fixed effects. If between-subject variability in slopes (\(\tau_1\)) is large, the fixed slope is harder to detect because subjects vary so much that the average effect is obscured.
Explore this by varying \(\tau_1\) in the model specification and re-running powerSim. This sensitivity analysis reveals which assumptions are driving the power estimate and helps identify the conditions under which the study is feasible.
Practical advice
The key inputs to a power analysis are:
- The minimum effect size that would be scientifically meaningful.
- The variance components, estimated from pilot data or prior literature.
- The planned design (number of groups, observations per group).
If prior data are available, fit a model to them and use the estimated parameters as the basis for makeLmer. The fixed effect can then be set to a value smaller than the estimated one to represent a conservative lower bound on the true effect.
Power analysis for multilevel designs should always consider the number of groups as well as the number of observations per group. In many designs, power is more sensitive to the number of groups than to the number of observations per group, because the degrees of freedom for testing fixed effects depends primarily on the number of independent units at the highest relevant level.