Categorical Logistic Regression

Author

Mark Andrews

Abstract

Categorical logistic regression, also called multinomial logistic regression, models outcome variables that take more than two distinct categorical values with no implied ordering. This guide covers the baseline-category logit model, its softmax representation, fitting the model with multinom from the nnet package, and interpreting and predicting from the fitted model.

Polychotomous outcomes

When the outcome has more than two values and those values are not ordered, neither binary nor ordinal logistic regression applies. Categorical logistic regression, also called multinomial logistic regression, handles this case. It extends binary logistic regression by modelling the log odds of each non-reference category relative to a chosen reference category.

The model

For an outcome \(y_i \in \{1, 2, \ldots, L\}\) with predictors \(\vec{x}_i\), the categorical logistic regression model specifies

\[ \log\!\left(\frac{\Pr(y_i = l)}{\Pr(y_i = 1)}\right) = \beta_{l0} + \sum_k \beta_{lk} x_{ki}, \quad l = 2, \ldots, L, \]

with category 1 as the reference. This gives \(L - 1\) sets of coefficients, one for each non-reference category.

The implied probabilities can be written in softmax form:

\[ \Pr(y_i = l) = \frac{e^{z_{li}}}{1 + \sum_{l^\prime=2}^L e^{z_{l^\prime i}}}, \]

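where \(z_{li} = \beta_{l0} + \sum_k \beta_{lk} x_{ki}\) and \(z_{1i} = 0\) by convention, so the same expression also covers the reference category. The form follows from exponentiating the log odds, which gives \(\Pr(y_i = l) = \Pr(y_i = 1)\, e^{z_{li}}\), and then requiring the \(L\) probabilities to sum to one, which gives \(\Pr(y_i = 1) = 1 / (1 + \sum_{l^\prime=2}^L e^{z_{l^\prime i}})\).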
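The softmax form can be sketched as a small R function. This is a minimal illustration rather than anything provided by nnet: the name baseline_softmax and the input values are hypothetical, chosen only to show that the probabilities sum to one and that the log odds against the reference category recover the linear predictors.

```r
# Hypothetical helper: convert the L - 1 linear predictors of the
# non-reference categories into L probabilities.
baseline_softmax <- function(z) {
  z_all <- c(0, z)              # z_1 = 0 for the reference category
  z_all <- z_all - max(z_all)   # shifting by a constant leaves the ratios unchanged
  exp(z_all) / sum(exp(z_all))
}

# Illustrative linear predictors, not estimates from any dataset
z <- c(0.5, -1.0, 2.0)
p <- baseline_softmax(z)

sum(p)               # the probabilities sum to 1
log(p[-1] / p[1])    # log odds against category 1 recover z
```

Subtracting the maximum before exponentiating is a common guard against overflow when a linear predictor is large. It does not change the result.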

Fitting with multinom

We continue with the admit dataset from pscl.

library(tibble)  # provides as_tibble() and tibble()
data("admit", package = "pscl")
admit <- as_tibble(admit)

Fit a categorical logistic regression with multinom from nnet:

library(nnet)  # provides multinom()
M_9 <- multinom(score ~ gre.quant, data = admit, trace = FALSE)
summary(M_9)
Call:
multinom(formula = score ~ gre.quant, data = admit, trace = FALSE)

Coefficients:
  (Intercept)   gre.quant
2   -5.866792 0.009516172
3   -5.721243 0.005478682
4  -11.253939 0.017934443
5  -19.752601 0.028752654

Std. Errors:
   (Intercept)    gre.quant
2 1.4767967480 0.0022886144
3 0.0396115509 0.0011634076
4 0.8929482776 0.0012777258
5 0.0002966432 0.0005548514

Residual Deviance: 256.2873 
AIC: 272.2873 

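The output gives one row of coefficients per non-reference category, with category 1 as the reference. Each gre.quant coefficient is the change in the log odds of that category relative to category 1 for a one-point increase in gre.quant. For example, the coefficient for category 2 is about 0.0095, so a 100-point increase in gre.quant multiplies the odds of a score of 2 rather than 1 by about \(e^{0.95} \approx 2.6\).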
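The coefficients are often easier to read on the odds scale. The sketch below copies the gre.quant slopes from the printed summary and exponentiates them. The 100-point increment is an arbitrary choice made for readability, and the name odds_ratio_100 is hypothetical. With the fitted model in memory, exp(100 * coef(M_9)[, "gre.quant"]) should give the same values.

```r
# Slopes for gre.quant, copied from the summary output above
b_gre <- c("2" = 0.009516172, "3" = 0.005478682,
           "4" = 0.017934443, "5" = 0.028752654)

# Multiplicative change in the odds of each category versus category 1
# for a 100-point increase in gre.quant (increment chosen for readability)
odds_ratio_100 <- exp(100 * b_gre)
round(odds_ratio_100, 2)
```

All four slopes are positive, so higher gre.quant values raise the odds of every category relative to category 1, most strongly for category 5.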

Predicted probabilities

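add_predictions from the modelr package, called with type = "prob", adds a matrix column named pred that holds the predicted probability of each category. The result therefore has two columns: gre.quant and pred.

library(modelr)  # provides add_predictions()
admit_new <- tibble(gre.quant = seq(300, 800, by = 100))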
add_predictions(admit_new, M_9, type = "prob")
# A tibble: 6 × 2
  gre.quant pred[,"1"] [,"2"]  [,"3"]  [,"4"]    [,"5"]
      <dbl>      <dbl>  <dbl>   <dbl>   <dbl>     <dbl>
1       300     0.935  0.0460 0.0159  0.00263 0.0000138
2       400     0.852  0.109  0.0250  0.0144  0.000222 
3       500     0.673  0.222  0.0341  0.0683  0.00311  
4       600     0.380  0.324  0.0333  0.232   0.0311   
5       700     0.118  0.261  0.0179  0.432   0.171    
6       800     0.0182 0.105  0.00478 0.402   0.470    

Manual calculation using the softmax

The softmax calculation clarifies what the model is doing. For gre.quant = 600:

z <- coef(M_9) %*% c(1, 600)   # linear predictors for non-reference categories
z_all <- c(0, z)                # include reference category (log odds = 0)
round(exp(z_all) / sum(exp(z_all)), 3)
[1] 0.380 0.324 0.033 0.232 0.031

These are the predicted probabilities for each category when gre.quant = 600, and they match the corresponding row of the add_predictions output above.
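The same calculation extends to several predictor values at once. The following sketch hard-codes the coefficients from the printed summary so that it runs without the fitted model. The names B, X, Z and P are hypothetical. With M_9 available, coef(M_9) could be used in place of B.

```r
# Coefficients copied from the summary output above (rows = categories 2 to 5)
B <- rbind("2" = c(-5.866792, 0.009516172),
           "3" = c(-5.721243, 0.005478682),
           "4" = c(-11.253939, 0.017934443),
           "5" = c(-19.752601, 0.028752654))

gre <- seq(300, 800, by = 100)
X <- cbind(1, gre)               # design matrix: intercept and gre.quant
Z <- cbind(0, X %*% t(B))        # linear predictors; column 1 is the reference category
P <- exp(Z) / rowSums(exp(Z))    # softmax applied to each row
dimnames(P) <- list(as.character(gre), as.character(1:5))
round(P, 3)
```

Each row of P is one value of gre.quant and each column is one category, so the rows should sum to one.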

Relationship to binary logistic regression

When \(L = 2\), categorical logistic regression reduces to binary logistic regression. The single set of coefficients gives the log odds of category 2 versus category 1, which is exactly the binary logistic regression model. In this case multinom and glm with family = binomial give the same estimates, up to small numerical differences that arise because the two functions use different optimisation algorithms.
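This equivalence can be checked on simulated data. The sketch below is illustrative only: the sample size, seed and true coefficients are arbitrary assumptions, and the data are not related to the admit dataset.

```r
library(nnet)

set.seed(101)  # simulated data, for illustration only
n <- 500
x <- rnorm(n)
y <- factor(rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x)), levels = c(0, 1))
d <- data.frame(x = x, y = y)

M_glm <- glm(y ~ x, data = d, family = binomial)
M_multinom <- multinom(y ~ x, data = d, trace = FALSE)

# Both model the log odds of the second factor level relative to the first
rbind(glm = coef(M_glm), multinom = coef(M_multinom))
```

The two rows should agree to several decimal places. Any remaining discrepancy reflects the convergence tolerances of the two fitting routines rather than a difference between the models.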