Binomial Logistic Regression

Mark Andrews

Bounded count data

  • Poisson regression handles unbounded counts
  • But some counts have a natural maximum: number correct out of \(n\) questions, number of goals out of \(n\) shots
  • For counts like these, the binomial distribution provides the appropriate likelihood

The binomial distribution

If \(y\) is the number of successes in \(n\) independent trials, each with success probability \(\theta\):

\[ \Pr(y = k \mid n, \theta) = \binom{n}{k} \theta^k (1-\theta)^{n-k} \]

  • Mean \(= n\theta\), Variance \(= n\theta(1-\theta)\)
  • When \(n = 1\), reduces to the Bernoulli distribution
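As a quick numerical check of the formula and its moments, the sketch below computes the probability mass function by hand and compares it with R's built-in dbinom. The values n = 10 and theta = 0.3 are arbitrary choices for illustration.

```r
# Arbitrary illustrative values (not from the slides)
n <- 10
theta <- 0.3
k <- 0:n

# Binomial probabilities computed directly from the formula
pmf_manual <- choose(n, k) * theta^k * (1 - theta)^(n - k)

# Compare with R's built-in binomial pmf
pmf_builtin <- dbinom(k, size = n, prob = theta)
pmf_match <- isTRUE(all.equal(pmf_manual, pmf_builtin))

# Mean and variance computed from the pmf
mean_y <- sum(k * pmf_manual)               # should equal n * theta
var_y  <- sum((k - mean_y)^2 * pmf_manual)  # should equal n * theta * (1 - theta)
```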

Binomial logistic regression

\[ \begin{aligned} y_i &\sim \mathrm{Binomial}(n_i, \theta_i)\\ \mathrm{logit}(\theta_i) &= \beta_0 + \sum_k \beta_k x_{ki} \end{aligned} \]

  • Same logit link as binary logistic regression
  • \(n_i\) can vary across observations
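One way to see what the model assumes is to simulate data from it. The following is a minimal sketch with a single predictor. The coefficient values, the sample size, and the range of \(n_i\) are all made up for illustration.

```r
set.seed(101)

N <- 50                                       # number of observations (assumed)
x <- runif(N, min = -2, max = 2)              # a single hypothetical predictor
n_trials <- sample(5:20, N, replace = TRUE)   # n_i varies across observations

beta_0 <- 0.5                                 # illustrative intercept
beta_1 <- -1.2                                # illustrative slope

# Inverse logit maps the linear predictor to a probability
theta <- plogis(beta_0 + beta_1 * x)

# Each y_i is a count of successes out of n_i trials
y <- rbinom(N, size = n_trials, prob = theta)
```

Here plogis is the inverse of the logit function, so qlogis(theta) recovers the linear predictor.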

Fitting with glm

Golf putting data: successes and attempts at each distance

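library(tidyverse)

# Successes and attempts at each distance; failures are derived
golf_df <- read_csv("data/golf_putts.csv") |>
  mutate(failure = attempts - success)

# Two-column response: successes first, failures second
M_bin <- glm(cbind(success, failure) ~ distance,
             family = binomial(link = "logit"),
             data = golf_df)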

cbind(success, failure) specifies the response as a two-column matrix: the number of successes in the first column and the number of failures in the second
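glm also accepts the response as a proportion of successes, with the number of trials supplied through the weights argument. The sketch below checks that both forms give the same coefficients. The data frame toy_df is hypothetical and stands in for the golf data, which is not reproduced here.

```r
# Hypothetical putting-style data (values made up for illustration)
toy_df <- data.frame(
  distance = c(2, 4, 6, 8, 10, 12),
  attempts = c(100, 100, 100, 100, 100, 100),
  success  = c(90, 70, 50, 35, 25, 15)
)
toy_df$failure <- toy_df$attempts - toy_df$success

# Form 1: two-column matrix of successes and failures
M_counts <- glm(cbind(success, failure) ~ distance,
                family = binomial(link = "logit"),
                data = toy_df)

# Form 2: proportion of successes, weighted by number of trials
M_props <- glm(success / attempts ~ distance,
               weights = attempts,
               family = binomial(link = "logit"),
               data = toy_df)

same_coefs <- isTRUE(all.equal(coef(M_counts), coef(M_props), tolerance = 1e-6))
```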

Results

summary(M_bin)
  • A negative coefficient for distance means the log odds of a successful putt decrease as distance increases
  • This is what we expect: longer putts are harder to make
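Exponentiating the coefficient turns the change in log odds into an odds ratio, which is often easier to read. A minimal sketch, using made-up data in a hypothetical toy_df rather than the golf data, with a Wald interval from confint.default:

```r
# Hypothetical data (values made up for illustration)
toy_df <- data.frame(
  distance = c(2, 4, 6, 8, 10, 12),
  success  = c(90, 70, 50, 35, 25, 15),
  failure  = c(10, 30, 50, 65, 75, 85)
)

M_toy <- glm(cbind(success, failure) ~ distance,
             family = binomial(link = "logit"),
             data = toy_df)

# Change in log odds of success per unit increase in distance
b_distance <- coef(M_toy)[["distance"]]

# Multiplicative change in the odds per unit increase in distance
odds_ratio <- exp(b_distance)

# Wald 95% interval on the odds-ratio scale
wald_ci <- exp(confint.default(M_toy)["distance", ])
```

An odds ratio below 1 corresponds to a negative coefficient: each additional unit of distance multiplies the odds of success by that factor.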

Predicted probabilities

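library(modelr)  # provides add_predictions()

# type = "response" returns probabilities rather than log odds
golf_new <- tibble(distance = seq(1, 20))
add_predictions(golf_new, M_bin, type = "response")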

The predicted probability of a successful putt falls with distance
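The same predictions can be computed by hand by applying the inverse logit to the linear predictor. This sketch uses base R's predict and a hypothetical toy_df with made-up values, so it does not depend on the golf data file.

```r
# Hypothetical data (values made up for illustration)
toy_df <- data.frame(
  distance = c(2, 4, 6, 8, 10, 12),
  success  = c(90, 70, 50, 35, 25, 15),
  failure  = c(10, 30, 50, 65, 75, 85)
)

M_toy <- glm(cbind(success, failure) ~ distance,
             family = binomial(link = "logit"),
             data = toy_df)

new_df <- data.frame(distance = seq(1, 20))

# Predicted probabilities from predict()
p_predict <- predict(M_toy, newdata = new_df, type = "response")

# The same quantities by hand: inverse logit of the linear predictor
b <- coef(M_toy)
p_manual <- plogis(b[["(Intercept)"]] + b[["distance"]] * new_df$distance)

same_preds <- isTRUE(all.equal(unname(p_predict), p_manual))
```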

Relationship to binary logistic regression

  • When \(n_i = 1\) for all observations, cbind(success, failure) has rows \((1, 0)\) or \((0, 1)\)
  • The model is identical to glm(y ~ x, family = binomial) with a binary outcome
  • Binomial logistic regression is simply the more general form
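The equivalence can be checked numerically by expanding grouped counts into one row per trial and fitting a binary logistic regression. A minimal sketch with made-up data in a hypothetical toy_df:

```r
# Hypothetical grouped data (values made up for illustration)
toy_df <- data.frame(
  distance = c(2, 4, 6, 8, 10, 12),
  success  = c(90, 70, 50, 35, 25, 15),
  failure  = c(10, 30, 50, 65, 75, 85)
)

# Grouped (binomial) fit
M_counts <- glm(cbind(success, failure) ~ distance,
                family = binomial(link = "logit"),
                data = toy_df)

# Expand to one row per trial: all successes first, then all failures
binary_df <- data.frame(
  distance = rep(rep(toy_df$distance, times = 2),
                 times = c(toy_df$success, toy_df$failure)),
  y = rep(c(1, 0),
          times = c(sum(toy_df$success), sum(toy_df$failure)))
)

# Binary (Bernoulli) fit on the expanded data
M_binary <- glm(y ~ distance,
                family = binomial(link = "logit"),
                data = binary_df)

# Coefficients agree up to numerical tolerance
coef_diff <- max(abs(coef(M_counts) - coef(M_binary)))
```

The coefficients and standard errors agree, but the residual deviance differs between the two fits because the saturated model is defined differently for grouped and ungrouped data.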

Summary

  • Binomial logistic regression applies when count outcomes have a known maximum \(n\)
  • The logit link, coefficient interpretation, and inference are the same as in binary logistic regression
  • Specify the response with cbind(successes, failures) in R