Categorical Logistic Regression
Categorical logistic regression, also called multinomial logistic regression, is for outcome variables that take more than two categorically distinct values with no implied ordering. This guide covers the baseline-category logit model, its softmax representation, fitting the model with multinom from the nnet package, and interpreting and predicting from the fitted model.
The examples use the following packages and the admit data from pscl:
library(tibble)   # tibble(), as_tibble()
library(modelr)   # add_predictions()
library(nnet)     # multinom()
data("admit", package = "pscl")
admit <- as_tibble(admit)
Polychotomous outcomes
When the outcome has more than two values and those values are not ordered, neither binary nor ordinal logistic regression applies. Categorical logistic regression handles this case. It extends binary logistic regression by modelling the log odds of each category relative to a chosen reference category.
The model
For an outcome \(y_i \in \{1, 2, \ldots, L\}\) with predictors \(\vec{x}_i\), the categorical logistic regression model specifies
\[ \log\!\left(\frac{\Pr(y_i = l)}{\Pr(y_i = 1)}\right) = \beta_{l0} + \sum_k \beta_{lk} x_{ki}, \quad l = 2, \ldots, L, \]
with category 1 as the reference. This gives \(L - 1\) sets of coefficients, one for each non-reference category.
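Because every non-reference category is compared against the same baseline, the log odds of any two non-reference categories \(l\) and \(m\) follow by subtracting their equations. This short derivation uses only the model above:
\[ \log\!\left(\frac{\Pr(y_i = l)}{\Pr(y_i = m)}\right) = \log\!\left(\frac{\Pr(y_i = l)}{\Pr(y_i = 1)}\right) - \log\!\left(\frac{\Pr(y_i = m)}{\Pr(y_i = 1)}\right) = (\beta_{l0} - \beta_{m0}) + \sum_k (\beta_{lk} - \beta_{mk}) x_{ki}. \]
Choosing a different reference category therefore re-expresses the same model with different coefficients; it does not change the fitted probabilities.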
The implied probabilities can be written in softmax form:
\[ \Pr(y_i = l) = \frac{e^{z_{li}}}{1 + \sum_{l^\prime=2}^L e^{z_{l^\prime i}}}, \]
where \(z_{li} = \beta_{l0} + \sum_k \beta_{lk} x_{ki}\) and \(z_{1i} = 0\) by convention.
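The softmax formula translates almost directly into R. The sketch below assumes `z` holds the \(L - 1\) linear predictors \(z_{2i}, \ldots, z_{Li}\) for a single observation; `softmax_probs` is a hypothetical helper written for illustration, not a function from nnet. It prepends \(z_{1i} = 0\) and subtracts the largest value before exponentiating, which leaves the probabilities unchanged but avoids overflow.

```r
# Hypothetical helper: softmax with the reference category's linear predictor fixed at 0.
# z: numeric vector of linear predictors z_2, ..., z_L for one observation.
softmax_probs <- function(z) {
  z_all <- c(0, z)               # z_1 = 0 for the reference category
  z_all <- z_all - max(z_all)    # shift for numerical stability; ratios are unchanged
  exp(z_all) / sum(exp(z_all))   # Pr(y = 1), ..., Pr(y = L)
}

p <- softmax_probs(c(0.5, -1, 2))
p
sum(p)              # the probabilities sum to 1
log(p[-1] / p[1])   # log odds versus category 1 recover the inputs 0.5, -1, 2
```

Without the shift, an input such as `c(1000, 999)` would make `exp()` return `Inf` and the ratio `NaN`; with it, `softmax_probs(c(1000, 999))` returns finite probabilities.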
Fitting with multinom
We continue with the admit dataset from pscl.
Fit a categorical logistic regression with multinom from nnet:
M_9 <- multinom(score ~ gre.quant, data = admit, trace = FALSE)
summary(M_9)
Call:
multinom(formula = score ~ gre.quant, data = admit, trace = FALSE)
Coefficients:
  (Intercept)   gre.quant
2   -5.866792 0.009516172
3   -5.721243 0.005478682
4  -11.253939 0.017934443
5  -19.752601 0.028752654
Std. Errors:
   (Intercept)    gre.quant
2 1.4767967480 0.0022886144
3 0.0396115509 0.0011634076
4 0.8929482776 0.0012777258
5 0.0002966432 0.0005548514
Residual Deviance: 256.2873
AIC: 272.2873
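The output gives one row of coefficients per non-reference category, with category 1 as the reference. Each row describes the log odds of that category relative to category 1; for example, a one-point increase in gre.quant raises the log odds of score = 2 versus score = 1 by about 0.0095.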
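Log odds per single GRE point are hard to read, so exponentiating the slope for a larger step can help. The sketch below assumes a 100-point step (an arbitrary choice) and uses the coefficients copied from the summary(M_9) output above; with the fitted model in hand, `coef(M_9)` returns this matrix at full precision. The name `b` is illustrative.

```r
# Coefficients copied from the printed summary(M_9) output (rows: categories 2 to 5).
b <- matrix(
  c( -5.866792, 0.009516172,
     -5.721243, 0.005478682,
    -11.253939, 0.017934443,
    -19.752601, 0.028752654),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("2", "3", "4", "5"), c("(Intercept)", "gre.quant"))
)

# Multiplicative change in the odds of each category versus category 1
# for a 100-point increase in gre.quant.
odds_ratio_100 <- exp(100 * b[, "gre.quant"])
round(odds_ratio_100, 2)
```

For instance, the odds of score = 2 rather than score = 1 are multiplied by roughly 2.59 for each additional 100 points of gre.quant, and the odds of score = 5 rather than 1 by roughly 17.7.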
Predicted probabilities
add_predictions from modelr, called with type = "prob", adds a column pred that holds a matrix of predicted probabilities, one column per category:
admit_new <- tibble(gre.quant = seq(300, 800, by = 100))
add_predictions(admit_new, M_9, type = "prob")
# A tibble: 6 × 2
  gre.quant pred[,"1"]  [,"2"]  [,"3"]  [,"4"]    [,"5"]
      <dbl>      <dbl>   <dbl>   <dbl>   <dbl>     <dbl>
1       300     0.935   0.0460 0.0159  0.00263 0.0000138
2       400     0.852   0.109  0.0250  0.0144  0.000222
3       500     0.673   0.222  0.0341  0.0683  0.00311
4       600     0.380   0.324  0.0333  0.232   0.0311
5       700     0.118   0.261  0.0179  0.432   0.171
6       800     0.0182  0.105  0.00478 0.402   0.470
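As a check on this table, the sketch below recomputes every row in base R by applying the softmax to a design matrix. It uses the coefficients copied from the summary(M_9) output, so values agree only to the printed precision, and it also picks the most probable category at each gre.quant value with `max.col`. With the fitted model available, `predict(M_9, newdata = admit_new, type = "probs")` should return the same probability matrix; the names `b`, `X`, `Z` and `P` are illustrative.

```r
# Coefficients copied from the printed summary(M_9) output (rows: categories 2 to 5).
b <- matrix(
  c( -5.866792, 0.009516172,
     -5.721243, 0.005478682,
    -11.253939, 0.017934443,
    -19.752601, 0.028752654),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("2", "3", "4", "5"), c("(Intercept)", "gre.quant"))
)

gre <- seq(300, 800, by = 100)
X <- cbind(1, gre)               # design matrix: intercept column and gre.quant
Z <- cbind(0, X %*% t(b))        # linear predictors; column 1 is the reference (z = 0)
P <- exp(Z) / rowSums(exp(Z))    # softmax applied to each row
colnames(P) <- c("1", rownames(b))

round(P, 3)
rowSums(P)                       # every row sums to 1

# Most probable category at each gre.quant value
pred_class <- colnames(P)[max.col(P, ties.method = "first")]
pred_class
```

Under these coefficients the most probable score is 1 for gre.quant up to 600, then 4 at 700 and 5 at 800.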
Manual calculation using the softmax
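Computing the softmax by hand shows exactly how the model turns coefficients into probabilities. For gre.quant = 600:
z <- coef(M_9) %*% c(1, 600)              # linear predictors z_2, ..., z_5 for the non-reference categories
z_all <- c(0, z)                          # prepend the reference category, whose linear predictor is 0
round(exp(z_all) / sum(exp(z_all)), 3)    # softmax: probabilities for categories 1 to 5
[1] 0.380 0.324 0.033 0.232 0.031
These are the predicted probabilities for each category when gre.quant = 600, and they match the gre.quant = 600 row of the add_predictions output.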
Relationship to binary logistic regression
When \(L = 2\), categorical logistic regression reduces to binary logistic regression. The single set of coefficients gives the log odds of category 2 versus category 1, which is exactly the binary logistic regression model. In this case multinom and glm with the binomial family give the same maximum likelihood estimates, up to small differences from multinom's numerical optimisation.
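A quick simulation illustrates the \(L = 2\) case. The data are simulated, and the intercept -0.5 and slope 1.2 are arbitrary illustrative values, not taken from admit; the sketch fits the same two-category outcome with multinom and with glm and puts the estimates side by side. nnet ships with standard R installations.

```r
library(nnet)  # multinom()

set.seed(101)
n <- 500
x <- rnorm(n)
# Simulated two-category outcome from a logistic model (illustrative parameters)
y <- factor(rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x)),
            levels = c(0, 1), labels = c("a", "b"))
dat <- data.frame(x = x, y = y)

fit_multinom <- multinom(y ~ x, data = dat, trace = FALSE)  # categorical model, L = 2
fit_glm <- glm(y ~ x, data = dat, family = binomial)        # binary logistic regression

# Both model the log odds of the second level ("b") versus the first ("a")
est <- rbind(multinom = as.vector(coef(fit_multinom)),
             glm      = unname(coef(fit_glm)))
colnames(est) <- c("(Intercept)", "x")
est
```

The two rows should agree to several decimal places; any remaining difference comes from multinom's iterative optimiser rather than from the model.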