Inference in the Bernoulli Model

Mark Andrews

The posterior distribution

With \(m = 139\) successes, \(n = 250\) trials, and a \(\text{Beta}(3, 5)\) prior:

\[\theta \mid m, n \sim \text{Beta}(142, 116)\]
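
As a brief sketch of where this comes from, assuming the usual Bernoulli likelihood and a \(\text{Beta}(\alpha, \beta)\) prior, the posterior is proportional to likelihood times prior:

\[ \begin{align*} \mathrm{P}(\theta \mid m, n) &\propto \theta^{m}(1 - \theta)^{n - m} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} \\[6pt] &= \theta^{m + \alpha - 1}(1 - \theta)^{n - m + \beta - 1} \end{align*} \]

This is the kernel of a \(\text{Beta}(m + \alpha,\, n - m + \beta)\) distribution, which here is \(\text{Beta}(139 + 3,\, 111 + 5) = \text{Beta}(142, 116)\).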

bernoulli_posterior_plot(n = 250, m = 139, alpha = 3, beta = 5)

The uniform prior

bernoulli_posterior_plot(n = 250, m = 139, alpha = 1, beta = 1)
  • With 250 observations, the prior has little influence on the posterior
  • With a uniform prior, the posterior is exactly the normalised likelihood
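
To make the prior's limited influence concrete, the following base R sketch compares the posterior mean and 95% equal-tailed interval under the two priors used so far. The variable names are illustrative and not part of the original slides.

```r
# Data from the slides
n <- 250
m <- 139

# Posterior Beta parameters under each prior
post_informative <- c(shape1 = m + 3, shape2 = n - m + 5)  # Beta(3, 5) prior
post_uniform     <- c(shape1 = m + 1, shape2 = n - m + 1)  # Beta(1, 1) prior

# Posterior means: shape1 / (shape1 + shape2)
mean_informative <- post_informative[["shape1"]] / sum(post_informative)
mean_uniform     <- post_uniform[["shape1"]] / sum(post_uniform)

# 95% equal-tailed intervals
qi_informative <- qbeta(c(0.025, 0.975),
                        post_informative[["shape1"]], post_informative[["shape2"]])
qi_uniform     <- qbeta(c(0.025, 0.975),
                        post_uniform[["shape1"]], post_uniform[["shape2"]])

# Side-by-side comparison
rbind(
  informative = c(mean = mean_informative, lower = qi_informative[1], upper = qi_informative[2]),
  uniform     = c(mean = mean_uniform, lower = qi_uniform[1], upper = qi_uniform[2])
)
```

The two posterior means are \(142/258\) and \(140/252\), which differ by roughly 0.005.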

Point estimates

  • MAP (maximum a posteriori): mode of the posterior
  • Posterior mean: mean of the posterior
  • Both converge to the MLE \(m/n\) as data dominate

\[ \begin{align*} \langle \theta \rangle &= \frac{m + \alpha}{n + \alpha + \beta} \\[6pt] \mathrm{V}(\theta) &= \frac{(m + \alpha)(n - m + \beta)}{(n + \alpha + \beta)^2\,(n + \alpha + \beta + 1)} \\[6pt] \mathrm{mode}(\theta) &= \frac{m + \alpha - 1}{n + \alpha + \beta - 2} \end{align*} \]

bernoulli_posterior_summary(n, m, alpha = 3, beta = 5)
$mean
[1] 0.5503876

$var
[1] 0.0009554482

$sd
[1] 0.03091033

$mode
[1] 0.5507812

$qi
[1] 0.4894851 0.6105500
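
The values above can be reproduced from the formulas with base R alone. The helper below, `beta_posterior_summary`, is a hypothetical stand-in written for illustration; it is not the `bernoulli_posterior_summary` function used in the slides, and it assumes the `qi` element is the 95% equal-tailed interval.

```r
# Hypothetical helper: posterior summaries for a Bernoulli likelihood
# with a Beta(alpha, beta) prior, following the formulas above.
beta_posterior_summary <- function(n, m, alpha = 1, beta = 1) {
  a <- m + alpha       # posterior shape1
  b <- n - m + beta    # posterior shape2
  v <- (a * b) / ((a + b)^2 * (a + b + 1))
  list(
    mean = a / (a + b),
    var  = v,
    sd   = sqrt(v),
    mode = (a - 1) / (a + b - 2),          # valid when a > 1 and b > 1
    qi   = qbeta(c(0.025, 0.975), a, b)    # assumed: 95% equal-tailed interval
  )
}

s <- beta_posterior_summary(n = 250, m = 139, alpha = 3, beta = 5)
s$mean
s$mode
```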

Credible intervals

A 95% credible interval contains 95% of the posterior probability. This is a direct probability statement: conditional on the data, model, and prior, we assign 95% probability to \(\theta\) lying in the interval.

Compare with a frequentist confidence interval: it refers to the long-run coverage of the procedure, not to the probability that the parameter is in any given interval.
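
The long-run coverage idea can be illustrated by simulation. The sketch below assumes a true value of \(\theta = 0.55\) and uses the simple Wald interval; both are choices made here for illustration, not taken from the slides.

```r
set.seed(101)

theta_true <- 0.55   # assumed true value, for illustration only
n <- 250             # trials per simulated data set
n_reps <- 10000      # number of repeated experiments

# Simulate the number of successes in each repeated experiment
m_sim <- rbinom(n_reps, size = n, prob = theta_true)

# Wald 95% confidence interval for each experiment
p_hat <- m_sim / n
se <- sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - qnorm(0.975) * se
upper <- p_hat + qnorm(0.975) * se

# Proportion of intervals that contain the true value
coverage <- mean(lower <= theta_true & theta_true <= upper)
coverage
```

The coverage is a property of the procedure across repetitions. Any single interval either contains `theta_true` or does not.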

Quantile-based intervals

qbeta(0.025, m + 1, n - m + 1)
[1] 0.493961
qbeta(0.975, m + 1, n - m + 1)
[1] 0.6163147

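The 95% quantile-based (equal-tailed) interval runs from the 2.5th to the 97.5th percentile of the posterior, here \(\text{Beta}(m + 1, n - m + 1)\) under the uniform prior.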
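
As a quick check, the following sketch confirms that the interval between these two quantiles holds 95% of the posterior mass, with 2.5% in each tail. It assumes the uniform-prior posterior \(\text{Beta}(m + 1, n - m + 1)\) with \(m = 139\) and \(n = 250\).

```r
n <- 250
m <- 139
a <- m + 1        # posterior shape1 under the uniform prior
b <- n - m + 1    # posterior shape2 under the uniform prior

# Equal-tailed 95% interval
qi <- qbeta(c(0.025, 0.975), a, b)

# Posterior probability inside the interval and in each tail
mass_inside <- diff(pbeta(qi, a, b))
lower_tail  <- pbeta(qi[1], a, b)
upper_tail  <- 1 - pbeta(qi[2], a, b)

c(inside = mass_inside, lower_tail = lower_tail, upper_tail = upper_tail)
```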

Highest posterior density intervals

  • The shortest interval containing the specified mass
  • Equal-tailed and HPD intervals coincide for symmetric, unimodal posteriors
  • For skewed or multimodal posteriors they differ
get_beta_hpd(m + 1, n - m + 1)
$lb
[1] 0.4943

$ub
[1] 0.6166

$p_star
[1] 1.887
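
One way to find such an interval for a unimodal Beta posterior is to search over all intervals with the required mass and keep the shortest. The helper `beta_hpd_sketch` below is hypothetical and is not the `get_beta_hpd` function used above; it only sketches the idea, and it assumes `p_star` denotes the posterior density at the interval boundary.

```r
# Hypothetical helper: HPD interval for a unimodal Beta(a, b) distribution,
# found as the shortest interval containing the requested mass.
beta_hpd_sketch <- function(a, b, mass = 0.95) {
  # For lower-tail probability p, [Q(p), Q(p + mass)] contains exactly `mass`
  width <- function(p) qbeta(p + mass, a, b) - qbeta(p, a, b)

  # Search over the admissible lower-tail probabilities
  opt <- optimize(width, interval = c(0, 1 - mass), tol = 1e-8)

  lb <- qbeta(opt$minimum, a, b)
  ub <- qbeta(opt$minimum + mass, a, b)

  # Density at the lower boundary (assumed meaning of p_star)
  list(lb = lb, ub = ub, p_star = dbeta(lb, a, b))
}

hpd <- beta_hpd_sketch(139 + 1, 250 - 139 + 1)
hpd
```

For a unimodal density, the shortest interval has equal density at both endpoints, which is why a single height characterises it.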

Posterior intervals illustrated

[Diagram: bimodal posterior with tail regions shaded at 2.5%, two upward arrows marking the 2.5th and 97.5th percentile boundaries]

Monte Carlo as a preview

If we can sample from a distribution, we can approximate any expectation or probability under it by averaging over the samples.

x <- rnorm(n = 1e6, mean = 100, sd = 15)
mean(x <= 130) # Monte Carlo
[1] 0.976867
pnorm(130, 100, 15) # exact
[1] 0.9772499

With one million samples, the estimate agrees with the exact value to about three decimal places (absolute error \(\approx 0.0004\)).
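
The precision of such an estimate can be gauged by its Monte Carlo standard error, which for a proportion is \(\sqrt{\hat{p}(1 - \hat{p})/N}\). The sketch below repeats the example with a fixed seed (an addition here, so the run is reproducible) and reports the standard error alongside the actual error.

```r
set.seed(42)

n_samples <- 1e6
x <- rnorm(n = n_samples, mean = 100, sd = 15)

# Monte Carlo estimate of P(X <= 130)
p_hat <- mean(x <= 130)

# Monte Carlo standard error of a proportion
mc_se <- sqrt(p_hat * (1 - p_hat) / n_samples)

# Exact value for comparison
exact <- pnorm(130, 100, 15)

c(estimate = p_hat, mc_se = mc_se, abs_error = abs(p_hat - exact))
```

With \(N = 10^6\) the standard error is roughly 0.00015, and it shrinks only in proportion to \(1/\sqrt{N}\).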

Why this matters for MCMC

  • Most realistic posteriors have no closed form
  • MCMC generates samples from the posterior
  • Those samples are used exactly like the Monte Carlo samples above
  • Mean, intervals, probabilities: all computed from the samples
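
As a minimal sketch of this workflow, the code below draws samples from the \(\text{Beta}(142, 116)\) posterior with `rbeta` and computes summaries from them. The draws stand in for MCMC output as an assumption made for illustration: they are independent, whereas real MCMC samples are typically autocorrelated.

```r
set.seed(2024)

n <- 250
m <- 139

# Stand-in for MCMC output: draws from the Beta(3, 5)-prior posterior
theta <- rbeta(1e5, m + 3, n - m + 5)

# Posterior mean, 95% equal-tailed interval, and P(theta > 0.5), all from samples
post_mean    <- mean(theta)
post_qi      <- quantile(theta, c(0.025, 0.975))
prob_gt_half <- mean(theta > 0.5)

post_mean
post_qi
prob_gt_half
```

The sample-based values can be compared with the analytical results, for example the posterior mean \(142/258\) and `1 - pbeta(0.5, 142, 116)`.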

Summary

  • The posterior is the complete answer to “what do the data tell us about \(\theta\)?”
  • Point estimates (MAP, mean) summarise its centre
  • Credible intervals summarise its spread as direct probability statements
  • Monte Carlo sampling is the bridge from analytical to numerical Bayesian inference