Introduction to Generalized Linear Models with R
A Workshop
This workshop provides a comprehensive practical and theoretical introduction to generalized linear models using R. Generalized linear models extend the normal linear model to situations where the outcome variable is binary, ordinal, categorical, or a count, and they form the backbone of applied statistical modelling across the natural and social sciences. The topics below are modular: a given delivery will cover a selection depending on length, audience, and emphasis.
Logistic Regression Models
The General Linear Model
A thorough understanding of the normal general linear model is the foundation for everything that follows. This guide covers the model’s mathematical structure, the interpretation of continuous and categorical predictors, and model comparison using deviance and likelihood.
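As a minimal sketch of the ideas listed above, the following fits two nested normal linear models to simulated data and compares them. The data, variable names (x, g, y) and coefficient values are invented for illustration and are not taken from the workshop materials.

```r
# Simulated data: one continuous and one categorical predictor (illustrative only).
set.seed(100)
n <- 200
x <- rnorm(n)
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
shift <- c(a = 0, b = 1, c = -1)               # hypothetical group effects
y <- 1 + 0.5 * x + unname(shift[as.character(g)]) + rnorm(n)
dat <- data.frame(x, g, y)

# Nested models: with and without the categorical predictor.
m0 <- lm(y ~ x, data = dat)
m1 <- lm(y ~ x + g, data = dat)

coef(m1)        # slope for x, plus contrasts of groups b and c against group a
anova(m0, m1)   # F test comparing the nested models
logLik(m0)      # log-likelihoods of the two models
logLik(m1)
```

With the default treatment coding, the coefficients for g are differences from the reference level a, which is why only two of the three groups appear in the output.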
Binary Logistic Regression
Binary logistic regression is the most widely used generalized linear model and the natural starting point for the rest, for situations where the outcome takes only two values. This guide covers the assumed model, the logit link function, odds and odds ratios, prediction, and deviance-based model comparison.
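A short sketch of these steps, assuming simulated data with a single predictor; the variable names and true coefficients are made up for the example.

```r
# Simulated binary outcome (illustrative values only).
set.seed(101)
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)            # inverse logit gives Pr(y = 1)
y <- rbinom(n, size = 1, prob = p)
dat <- data.frame(x, y)

# Logit is the default link for the binomial family; it is spelled out here.
m0 <- glm(y ~ 1, data = dat, family = binomial(link = "logit"))
m1 <- glm(y ~ x, data = dat, family = binomial(link = "logit"))

coef(m1)        # coefficients on the log-odds scale
exp(coef(m1))   # exponentiated slope is an odds ratio per unit change in x

# Predicted probabilities at chosen values of x.
pred <- predict(m1, newdata = data.frame(x = c(-1, 0, 1)), type = "response")
pred

# Deviance-based comparison of the nested models.
anova(m0, m1, test = "Chisq")
```

Using type = "response" returns probabilities; the default, type = "link", would return log odds instead.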
Ordinal Logistic Regression
The cumulative logit model extends binary logistic regression to outcomes with more than two ordered categories. This guide develops the latent variable formulation of the model, explains the role of thresholds, and demonstrates fitting with polr (from the MASS package) and clm (from the ordinal package).
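The latent variable view can be sketched by simulating a continuous latent score, cutting it at fixed thresholds, and checking that the fitted model roughly recovers them. The thresholds (-1 and 1), the slope, and the category labels below are assumptions chosen for the example; only polr from MASS is used.

```r
library(MASS)

# Latent variable with logistic noise, cut at two thresholds (illustrative values).
set.seed(102)
n <- 500
x <- rnorm(n)
z <- 1.0 * x + rlogis(n)
y <- cut(z, breaks = c(-Inf, -1, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)
dat <- data.frame(x, y)

# Cumulative logit (proportional odds) model.
m <- polr(y ~ x, data = dat, method = "logistic", Hess = TRUE)

coef(m)    # slope on the latent scale
m$zeta     # estimated thresholds between adjacent categories

# Predicted category probabilities at x = 0.
probs <- predict(m, newdata = data.frame(x = 0), type = "probs")
probs
```

In polr the model is written as logit Pr(y <= k) = zeta_k - beta * x, so a positive slope shifts probability toward the higher categories.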
Categorical Logistic Regression
Categorical logistic regression, also known as multinomial logistic regression, is for outcomes that take more than two categorically distinct values with no implied ordering. This guide covers the model structure, the baseline-category logit parameterisation, and fitting with multinom from the nnet package.
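A minimal sketch of the baseline-category parameterisation, assuming a three-level outcome simulated with category a as the baseline. All names and coefficient values are hypothetical.

```r
library(nnet)

# Simulate from a baseline-category logit model with baseline "a" (illustrative).
set.seed(106)
n <- 600
x <- rnorm(n)
eta_b <- 0.5 + 1.0 * x                  # log odds of b versus a
eta_c <- -0.5 - 1.0 * x                 # log odds of c versus a
den <- 1 + exp(eta_b) + exp(eta_c)
p <- cbind(a = 1 / den, b = exp(eta_b) / den, c = exp(eta_c) / den)
y <- factor(apply(p, 1, function(pr) sample(c("a", "b", "c"), 1, prob = pr)),
            levels = c("a", "b", "c"))
dat <- data.frame(x, y)

# trace = FALSE suppresses the optimiser's iteration log.
m <- multinom(y ~ x, data = dat, trace = FALSE)

coef(m)    # one row per non-baseline category: b versus a, c versus a

# Predicted probabilities for each category at chosen values of x.
pp <- predict(m, newdata = data.frame(x = c(-1, 0, 1)), type = "probs")
pp
```

The first factor level is treated as the baseline, so reordering the levels of y changes which comparisons the coefficients describe, but not the fitted probabilities.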
Count Regression Models
Poisson Regression
Poisson regression is the standard model for unbounded count data, where the outcome variable represents the number of times an event has occurred. This guide covers the Poisson distribution, the log link function, coefficient interpretation as multiplicative rate ratios, exposure and offset terms, and model comparisons.
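The sketch below illustrates the log link, rate ratios and an offset term on simulated data. The exposure variable (for instance, hours of observation) and the coefficient values are invented for the example.

```r
# Simulated counts observed over varying exposure (illustrative values only).
set.seed(103)
n <- 400
x <- rnorm(n)
exposure <- runif(n, min = 0.5, max = 5)
rate <- exp(0.3 + 0.5 * x)                    # events per unit of exposure
y <- rpois(n, lambda = rate * exposure)
dat <- data.frame(x, exposure, y)

# offset(log(exposure)) enters the linear predictor with its coefficient fixed at 1,
# so the remaining coefficients describe the log rate per unit of exposure.
m0 <- glm(y ~ 1 + offset(log(exposure)), data = dat, family = poisson(link = "log"))
m1 <- glm(y ~ x + offset(log(exposure)), data = dat, family = poisson(link = "log"))

coef(m1)        # coefficients on the log-rate scale
exp(coef(m1))   # exponentiated slope is a multiplicative rate ratio

anova(m0, m1, test = "Chisq")   # deviance-based model comparison
```

Without the offset, differences in exposure would be absorbed into the coefficients and the slope would no longer have a clean rate-ratio interpretation.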
Binomial Logistic Regression
When count data have a known upper bound — for example the number of correct answers out of a fixed number of questions — the binomial logistic regression model is more appropriate than Poisson regression. This guide covers the binomial distribution, the relationship to binary logistic regression, and practical fitting with glm.
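One common way to fit this model with glm is to supply a two-column response of successes and failures, sketched here on simulated test scores. The number of questions (20) and the coefficients are assumptions made for the example.

```r
# Simulated number of correct answers out of a fixed number of questions.
set.seed(104)
n <- 300
n_items <- 20                                  # hypothetical test length
x <- rnorm(n)
correct <- rbinom(n, size = n_items, prob = plogis(0.2 + 0.8 * x))
dat <- data.frame(x, correct, incorrect = n_items - correct)

# Two-column response: successes and failures.
m <- glm(cbind(correct, incorrect) ~ x, data = dat, family = binomial(link = "logit"))

coef(m)         # log odds of a correct answer per question
exp(coef(m))    # odds ratios, interpreted as in binary logistic regression

# Predicted probability of a correct answer at x = 0.
p0 <- predict(m, newdata = data.frame(x = 0), type = "response")
p0
```

Setting n_items to 1 reduces this to binary logistic regression, which is the relationship between the two models referred to above.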
Negative Binomial Regression
The negative binomial model is an alternative to Poisson regression for overdispersed count data, where the variance substantially exceeds the mean. This guide covers the negative binomial distribution, its relationship to the Poisson, overdispersion diagnostics, and fitting with glm.nb from the MASS package.
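A brief sketch, assuming simulated overdispersed counts: it fits a Poisson model, computes one rough overdispersion diagnostic (the Pearson chi-square statistic divided by the residual degrees of freedom), and then fits the negative binomial model. The parameter values are illustrative, and this diagnostic is one option among several.

```r
library(MASS)

# Simulated overdispersed counts: variance = mu + mu^2 / size.
set.seed(105)
n <- 500
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y <- rnbinom(n, size = 1.5, mu = mu)
dat <- data.frame(x, y)

m_pois <- glm(y ~ x, data = dat, family = poisson)
m_nb <- glm.nb(y ~ x, data = dat)

# Rough diagnostic: values well above 1 suggest overdispersion relative to Poisson.
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
dispersion

m_nb$theta                 # estimated dispersion parameter (called size in rnbinom)
AIC(m_pois, m_nb)          # the lower AIC is preferred
```

As theta grows large the negative binomial variance approaches the mean, and the model approaches the Poisson.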
Zero-Inflated Models
Zero-inflated models are for count data that contain more zeros than a Poisson or negative binomial model can accommodate. They combine a point mass at zero with an ordinary count component, effectively modelling two processes: one that produces only zeros and one that produces counts from zero upward.
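The two-process structure can be made concrete by writing out the probability mass function of a zero-inflated Poisson model. The function name dzip and the symbols pi0 (probability of the always-zero process) and lambda (Poisson mean) are notation introduced here for illustration. In practice such models are usually fitted with a package function, for example zeroinfl in the pscl package, which the text above does not name.

```r
# Probability mass function of a zero-inflated Poisson (hypothetical helper).
# pi0    : probability that an observation comes from the always-zero process
# lambda : mean of the ordinary Poisson count process
dzip <- function(y, lambda, pi0) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, lambda),   # zeros can arise from either process
         (1 - pi0) * dpois(y, lambda))         # positive counts only from the Poisson
}

# Compare the probability of a zero with and without zero inflation.
dzip(0, lambda = 2, pi0 = 0.3)
dpois(0, lambda = 2)

# The probabilities over all counts sum to 1 (up to numerical error).
total <- sum(dzip(0:200, lambda = 2, pi0 = 0.3))
total
```

The first branch shows why an observed zero is ambiguous in this model: it may come from the always-zero process or from the count process.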
Hurdle Models
Hurdle models are a closely related alternative to zero-inflated models. They separate the data into a binary part (did the count exceed zero or not?) and a truncated count part (given that it did, how large was the count?), and they are often easier to interpret when the two processes are conceptually distinct.
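The two parts can be sketched as a probability mass function for a Poisson hurdle model. The function name dhurdle and the symbols p0 (probability of a zero) and lambda (mean of the untruncated Poisson) are notation introduced here for illustration. Fitting is usually done with a package function, for example hurdle in the pscl package, which the text above does not name.

```r
# Probability mass function of a Poisson hurdle model (hypothetical helper).
# p0     : probability that the count is zero (the binary part)
# lambda : mean of the Poisson before truncation at zero (the count part)
dhurdle <- function(y, lambda, p0) {
  ifelse(y == 0,
         p0,                                               # all zeros come from the binary part
         (1 - p0) * dpois(y, lambda) / (1 - dpois(0, lambda)))  # zero-truncated Poisson
}

# The probability of a zero is exactly p0, whatever the value of lambda.
dhurdle(0, lambda = 2, p0 = 0.3)
dhurdle(0, lambda = 5, p0 = 0.3)

# The probabilities over all counts sum to 1 (up to numerical error).
total <- sum(dhurdle(0:200, lambda = 2, p0 = 0.3))
total
```

Unlike in a zero-inflated model, every zero here belongs to the binary part, which is what makes the two sets of coefficients easier to interpret separately.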