Ordinal Logistic Regression

Author

Mark Andrews

Abstract

Ordinal outcome variables have values that can be ordered but are not separated by equal intervals. This guide develops ordinal logistic regression through a latent variable argument, explains the cumulative logit formulation and its threshold parameters, and demonstrates fitting with both polr from the MASS package and clm from the ordinal package.

Ordinal outcome variables

An ordinal variable has values that can be ranked but for which the gaps between adjacent values are not necessarily equal. Examples include Likert-scale responses (“strongly disagree” through “strongly agree”), pain ratings, or educational attainment categories. Treating such a variable as numeric and fitting a normal linear model implies that the intervals between levels are equal, which is rarely defensible. Treating it as an unordered categorical variable ignores the ordering, discarding information. Ordinal logistic regression occupies the sensible middle ground.

The latent variable formulation

Binary logistic regression can be understood through a latent variable argument. Suppose there is an unobserved continuous variable \(z_i\) such that

\[ z_i \sim \mathrm{Logistic}\!\left(\beta_0 + \sum_k \beta_k x_{ki},\ 1\right), \]

and we observe \(y_i = 1\) if \(z_i \geq 0\), and \(y_i = 0\) otherwise. This is mathematically equivalent to the standard logistic regression formulation.

The extension to ordinal outcomes is straightforward. For \(L\) ordered values, we introduce \(L - 1\) cutpoints \(\zeta_1 < \zeta_2 < \cdots < \zeta_{L-1}\) and define

\[ y_i = l \quad \text{if} \quad \zeta_{l-1} \leq z_i < \zeta_l, \]

where \(\zeta_0 = -\infty\) and \(\zeta_L = \infty\). The regression coefficients \(\beta_k\) shift the centre of the latent distribution; the cutpoints divide the real line into \(L\) regions corresponding to the \(L\) outcome categories.

The cumulative logit model

An equivalent and more common presentation works directly with cumulative probabilities. The cumulative logit model specifies

\[ \log\!\left(\frac{\Pr(y_i \leq l)}{\Pr(y_i > l)}\right) = \zeta_l - \sum_k \beta_k x_{ki}, \quad l = 1, \ldots, L-1. \]

The same \(\beta\) coefficients appear in each equation (a proportional odds assumption), but the intercepts differ. A positive \(\beta_k\) means that increasing \(x_k\) is associated with higher values of \(y\).

Fitting with polr

The admit dataset from pscl contains GRE scores and admission decisions for graduate school applicants. The outcome score is an ordered factor.

data("admit", package = "pscl")
admit <- as_tibble(admit)
admit
# A tibble: 106 × 6
   score gre.quant gre.verbal    ap    pt female
   <ord>     <int>      <int> <int> <int>  <int>
 1 2           630        630     0     0      1
 2 1           520        490     0     0      1
 3 4           670        400     0     0      0
 4 1           600        560     0     0      0
 5 1           620        570     0     0      0
 6 4           730        670     0     0      0
 7 1           640        590     0     0      1
 8 4           470        610     1     0      1
 9 2           550        500     1     0      0
10 5           720        710     0     0      1
# ℹ 96 more rows

Fit an ordinal logistic regression with polr from MASS:

M_8 <- polr(score ~ gre.quant, data = admit)
summary(M_8)

Re-fitting to get Hessian
Call:
polr(formula = score ~ gre.quant, data = admit)

Coefficients:
            Value Std. Error t value
gre.quant 0.01439   0.002509   5.733

Intercepts:
    Value   Std. Error t value
1|2  8.0467  1.6390     4.9095
2|3  9.5379  1.7165     5.5566
3|4  9.6414  1.7214     5.6010
4|5 11.6592  1.8117     6.4356

Residual Deviance: 256.5929 
AIC: 266.5929 

The Intercepts in the polr summary are the cutpoints \(\zeta_l\). The coefficient for gre.quant is on the log-cumulative-odds scale.

Predicted probabilities

add_predictions with type = "prob" returns the predicted probability for each outcome category:

admit_new <- tibble(gre.quant = seq(300, 800, by = 100))
add_predictions(admit_new, M_8, type = "prob")
# A tibble: 6 × 2
  gre.quant pred[,"1"] [,"2"]   [,"3"]  [,"4"]   [,"5"]
      <dbl>      <dbl>  <dbl>    <dbl>   <dbl>    <dbl>
1       300     0.977  0.0180 0.000525 0.00420 0.000647
2       400     0.908  0.0696 0.00214  0.0174  0.00272 
3       500     0.701  0.211  0.00792  0.0682  0.0114  
4       600     0.358  0.354  0.0207   0.221   0.0462  
5       700     0.117  0.253  0.0244   0.436   0.170   
6       800     0.0304 0.0918 0.0115   0.404   0.463   

We can also recover predictions by hand using plogis, which clarifies the model mechanics:

beta <- coef(M_8)
zeta <- M_8$zeta

# For gre.quant = 600, the latent distribution centre is:
centre <- beta * 600

# Cumulative probability of each threshold:
plogis(zeta, location = centre)
      1|2       2|3       3|4       4|5 
0.3576317 0.7120825 0.7328235 0.9537689 

Fitting with clm

The ordinal package provides clm, which fits the same cumulative logit model but with more flexible options including non-proportional-odds extensions.

admit <- mutate(admit, score = factor(score, ordered = TRUE))
M_8b <- clm(score ~ gre.quant, data = admit)
Warning: (3) Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables? 
In addition: Absolute and relative convergence criteria were met
summary(M_8b)
formula: score ~ gre.quant
data:    admit

 link  threshold nobs logLik  AIC    niter max.grad cond.H 
 logit flexible  106  -128.30 266.59 6(2)  4.13e-07 1.6e+08

Coefficients:
          Estimate Std. Error z value Pr(>|z|)    
gre.quant 0.014386   0.002471   5.823 5.78e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2    8.046      1.622   4.961
2|3    9.537      1.703   5.600
3|4    9.640      1.708   5.645
4|5   11.658      1.801   6.473

The results are equivalent to polr for the standard proportional odds model.