Binary Logistic Regression

Mark Andrews

The binary outcome problem

What if \(y_i \in \{0, 1\}\)?
Modelling \(y_i\) as normally distributed is a severe model misspecification
Predicted values can fall outside \([0, 1]\)
Variance cannot be constant: \(\mathrm{Var}(\text{Bernoulli}(\theta)) = \theta(1-\theta)\)
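The points above can be illustrated with a small simulation. This is a sketch on made-up data, not the affairs data: the names `x`, `y`, `theta`, and `M_lm` and the coefficient values are assumptions chosen for illustration. It fits an ordinary linear model to a 0/1 outcome and checks whether any fitted values leave \([0, 1]\).

```r
# Simulated binary data (hypothetical; coefficients chosen for illustration)
set.seed(101)
n <- 500
x <- rnorm(n)
theta <- plogis(-0.5 + 2 * x)          # true probabilities
y <- rbinom(n, size = 1, prob = theta)

# Ordinary linear regression applied to the 0/1 outcome
M_lm <- lm(y ~ x)
fitted_lm <- fitted(M_lm)

# Range of fitted values, and how many fall outside [0, 1]
range(fitted_lm)
n_outside <- sum(fitted_lm < 0 | fitted_lm > 1)
n_outside

# The Bernoulli variance theta * (1 - theta) changes with theta
range(theta * (1 - theta))
```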

The logistic regression model

\[ \begin{aligned} y_i &\sim \mathrm{Bernoulli}(\theta_i)\\ \mathrm{logit}(\theta_i) &= \beta_0 + \sum_{k=1}^K \beta_k x_{ki} \end{aligned} \]

where \(\mathrm{logit}(\theta) = \log\!\left(\dfrac{\theta}{1-\theta}\right)\)

Odds and log odds

The odds of an event with probability \(\theta\) are \(\dfrac{\theta}{1-\theta}\)
Odds \(> 1\) means the event is more likely than not
The log odds, or logit, maps a probability in \((0, 1)\) onto the whole real line

\[ \theta \in (0,1) \xrightarrow{\text{logit}} \log\!\left(\frac{\theta}{1-\theta}\right) \in (-\infty, \infty) \]
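A short worked example of the mapping from probability to odds to log odds. The probability values are arbitrary; `qlogis()` is base R's logit function.

```r
# A few probabilities, chosen arbitrarily for illustration
theta <- c(0.1, 0.25, 0.5, 0.75, 0.9)

odds <- theta / (1 - theta)   # odds > 1 when theta > 0.5
log_odds <- log(odds)         # symmetric around 0 at theta = 0.5

data.frame(theta, odds, log_odds)

# qlogis() computes the same quantity directly
all.equal(log_odds, qlogis(theta))
```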

The inverse logit

The logit has an inverse — the ilogit or sigmoid function:

\[ \mathrm{ilogit}(x) = \frac{1}{1 + e^{-x}} \]

Maps any real number back to \((0, 1)\)
In R: plogis(x)
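A quick check that the formula above matches `plogis()`, and that `plogis()` and `qlogis()` undo one another. The helper name `ilogit` is defined here for illustration only.

```r
# Inverse logit written out from the formula (helper defined for illustration)
ilogit <- function(x) 1 / (1 + exp(-x))

x <- c(-5, -1, 0, 1, 5)
ilogit(x)

# Agrees with base R's plogis()
all.equal(ilogit(x), plogis(x))

# qlogis() (the logit) inverts plogis()
all.equal(qlogis(plogis(x)), x)
```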

Fitting with glm

library(tidyverse)  # read_csv(), mutate(), tibble()
affairs_df <- read_csv("data/affairs.csv") |>
  mutate(had_affair = affairs > 0)

M_4 <- glm(had_affair ~ yearsmarried,
           family = binomial(link = "logit"),
           data = affairs_df)
summary(M_4)

Interpreting coefficients as odds ratios

Coefficients are on the log-odds scale
Exponentiating gives the odds ratio: the factor by which the odds are multiplied for a one-unit increase in the predictor. If \(q\) is the probability at predictor value \(x\) and \(p\) is the probability at \(x + 1\), then

\[ e^\beta = \frac{p/(1-p)}{q/(1-q)} \]

# odds ratios
exp(coef(M_4))
# Wald 95% confidence intervals, on the odds-ratio scale
exp(confint.default(M_4))
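The odds-ratio interpretation can be verified numerically. The sketch below uses simulated data rather than the affairs data, so `x`, `y`, `M_sim`, and the coefficient values are all assumptions. It computes the odds at two predictor values one unit apart and compares their ratio with the exponentiated slope.

```r
# Simulated data (hypothetical; not the affairs data)
set.seed(202)
n <- 1000
x <- runif(n, 0, 15)
y <- rbinom(n, size = 1, prob = plogis(-1.5 + 0.1 * x))

M_sim <- glm(y ~ x, family = binomial(link = "logit"))
b <- coef(M_sim)

# Predicted probabilities at x = 5 (q) and x = 6 (p)
q <- plogis(b[1] + b[2] * 5)
p <- plogis(b[1] + b[2] * 6)

# Ratio of the odds at x = 6 to the odds at x = 5
odds_ratio <- unname((p / (1 - p)) / (q / (1 - q)))
odds_ratio

# Same value as the exponentiated slope coefficient
all.equal(odds_ratio, unname(exp(b[2])))
```

The comparison gives the same result for any pair of predictor values one unit apart, because the model is linear on the log-odds scale.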

Predicted probabilities

affairs_new <- tibble(yearsmarried = c(5, 10, 20))

# log odds
predict(M_4, newdata = affairs_new)

# probabilities
predict(M_4, newdata = affairs_new, type = "response")
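The two kinds of prediction are linked by the inverse logit. The sketch below, again on simulated data with assumed names (`M_sim`, `new_df`), checks that applying `plogis()` to the log-odds predictions reproduces the `type = "response"` predictions, and that the log odds are just the linear predictor.

```r
# Simulated data (hypothetical; not the affairs data)
set.seed(303)
n <- 1000
x <- runif(n, 0, 15)
y <- rbinom(n, size = 1, prob = plogis(-1.5 + 0.1 * x))
M_sim <- glm(y ~ x, family = binomial(link = "logit"))

new_df <- data.frame(x = c(5, 10, 20))

log_odds <- predict(M_sim, newdata = new_df)                  # link scale
probs <- predict(M_sim, newdata = new_df, type = "response")  # probability scale

# Inverse logit of the log odds gives the probabilities
all.equal(plogis(log_odds), probs)

# The log odds are the linear predictor beta_0 + beta_1 * x
eta <- coef(M_sim)[1] + coef(M_sim)[2] * new_df$x
all.equal(unname(eta), unname(log_odds))
```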

Deviance

The deviance of a model is defined relative to the saturated model — the model with one parameter per observation, fitting perfectly:

\[ D = 2\left[\log L(\text{saturated}) - \log L(\hat{\beta})\right] \]

For binary outcomes, the saturated model assigns probability 1 to every observed outcome, so \(\log L(\text{saturated}) = 0\). Therefore:

\[ D = -2\log L(\hat{\beta}) \]

\(D \geq 0\); smaller means better fit
In R: deviance(M)
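As a check on the formula, the deviance of a binary logistic regression can be computed by hand from the fitted probabilities. This is a sketch on simulated data; the names `M_sim` and `theta_hat` are assumptions.

```r
# Simulated data (hypothetical)
set.seed(404)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.5 + x))
M_sim <- glm(y ~ x, family = binomial(link = "logit"))

# Bernoulli log likelihood at the fitted probabilities
theta_hat <- fitted(M_sim)
log_lik <- sum(y * log(theta_hat) + (1 - y) * log(1 - theta_hat))

# Three routes to the same number
c(by_hand     = -2 * log_lik,
  deviance    = deviance(M_sim),
  from_logLik = -2 * as.numeric(logLik(M_sim)))
```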

Model comparison

For nested models \(\mathcal{M}_0 \subset \mathcal{M}_1\), the difference in deviances is the log likelihood ratio statistic:

\[ D_0 - D_1 = 2\log\frac{L(\hat{\beta}_1)}{L(\hat{\beta}_0)} = \Lambda \;\overset{\cdot}{\sim}\; \chi^2(q) \]

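where \(q = K_1 - K_0\) is the number of additional coefficients in \(\mathcal{M}_1\), and the \(\chi^2(q)\) distribution holds approximately under the null hypothesis that those coefficients are zero. This is the direct counterpart of the F-test in linear models; here we compare \(\Lambda\) against a \(\chi^2\) distribution rather than transforming it into an F statistic.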

# M_5 is assumed to be a second glm fit, nested with M_4 (not defined in this section)
anova(M_4, M_5, test = "Chisq")
AIC(M_4, M_5)
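The likelihood ratio test can also be carried out by hand from the two deviances. The sketch below uses simulated data and two hypothetical nested models, `M_0` and `M_1`, with made-up predictors `x1` and `x2`, and compares the result with `anova()`.

```r
# Simulated data with two predictors (hypothetical)
set.seed(505)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x1 + 0.5 * x2))

M_0 <- glm(y ~ x1, family = binomial(link = "logit"))        # smaller model
M_1 <- glm(y ~ x1 + x2, family = binomial(link = "logit"))   # larger model

# Difference in deviances and difference in number of coefficients
Lambda <- deviance(M_0) - deviance(M_1)
q <- length(coef(M_1)) - length(coef(M_0))

# Upper-tail probability under the chi-squared(q) reference distribution
p_value <- pchisq(Lambda, df = q, lower.tail = FALSE)
c(Lambda = Lambda, df = q, p_value = p_value)

# anova() reports the same statistic and p-value
lrt <- anova(M_0, M_1, test = "Chisq")
lrt
```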

Summary

Binary logistic regression replaces the normal distribution for the outcome with a Bernoulli distribution
The logit link maps probability to the real line
Coefficients are log odds ratios; exponentiate for odds ratios
For binary outcomes, the deviance is \(-2\log L(\hat{\beta})\); nested models are compared using the log likelihood ratio test against a \(\chi^2\) distribution