The negative binomial model is an alternative to Poisson regression for overdispersed count data, where the variance substantially exceeds the mean. This guide covers the negative binomial distribution, overdispersion diagnostics, fitting with glm.nb from the MASS package, and comparing Poisson and negative binomial fits.
Overdispersion in count data
The Poisson distribution has the property that its mean and variance are both equal to \(\lambda\). Real count data frequently violate this: the variance can be considerably larger than the mean, a condition called overdispersion. Fitting a Poisson model to overdispersed data leads to standard errors that are too small, inflating test statistics and producing overconfident inferences.
The biochemists dataset records the number of academic publications by PhD students over three years.
Rows: 915 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): gender, married
dbl (4): publications, children, prestige, mentor
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pubs <- biochem_df$publicationsc(mean =mean(pubs), variance =var(pubs), ratio =var(pubs) /mean(pubs))
mean variance ratio
1.692896 3.709742 2.191358
A variance-to-mean ratio substantially greater than 1 signals overdispersion.
The negative binomial distribution
The negative binomial distribution can be derived as a Poisson distribution whose rate parameter \(\lambda\) itself varies across observations according to a gamma distribution. Marginalising over this gamma mixing yields a distribution with mean \(\mu\) and variance \(\mu + \mu^2 / r\), where \(r > 0\) is a dispersion parameter. As \(r \to \infty\), the extra variance term vanishes and the distribution approaches Poisson.
Negative binomial regression
In negative binomial regression, each count \(y_i\) is modelled as
The log link and coefficient interpretation are identical to Poisson regression. The extra parameter \(r\) is estimated from the data and accounts for overdispersion.
Fitting with glm.nb
M_13 <-glm.nb(publications ~ prestige, data = biochem_df)summary(M_13)
Call:
glm.nb(formula = publications ~ prestige, data = biochem_df,
init.theta = 1.731467472, link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.25820 0.12018 2.148 0.0317 *
prestige 0.08531 0.03648 2.338 0.0194 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(1.7315) family taken to be 1)
Null deviance: 1007.1 on 914 degrees of freedom
Residual deviance: 1001.7 on 913 degrees of freedom
AIC: 3220.4
Number of Fisher Scoring iterations: 1
Theta: 1.731
Std. Err.: 0.181
2 x log-likelihood: -3214.415
The Theta parameter in the output is the estimated dispersion \(r\). A larger value indicates less overdispersion (closer to Poisson).
Compare standard errors between Poisson and negative binomial models:
Likelihood ratio tests of Negative Binomial Models
Response: publications
Model theta Resid. df
1 gender + I(children > 0) + mentor 2.218397 911
2 gender + married + I(children > 0) + prestige + mentor 2.239056 909
2 x log-lik. Test df LR stat. Pr(Chi)
1 -3128.777
2 -3125.827 1 vs 2 2 2.949479 0.2288383
Poisson and negative binomial models are not nested in the usual sense (they have different dispersion assumptions), but AIC can still be used to compare them as candidate models for the same data.