Inference in the Bernoulli Model

Mark Andrews

The posterior distribution

With \(m = 139\) successes in \(n = 250\) trials and a \(\text{Beta}(\alpha = 3, \beta = 5)\) prior, the posterior is \(\text{Beta}(m + \alpha,\, n - m + \beta)\):

\[\theta \mid m, n \sim \text{Beta}(142, 116)\]
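
One way to see where these parameters come from is to multiply the Bernoulli likelihood by the Beta prior density and collect exponents. This is a short sketch of the standard conjugacy argument, writing \(\alpha = 3\) and \(\beta = 5\) for the prior's parameters:

\[ \mathrm{P}(\theta \mid m, n) \;\propto\; \underbrace{\theta^{m}(1 - \theta)^{n - m}}_{\text{likelihood}} \times \underbrace{\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}}_{\text{prior}} \;=\; \theta^{m + \alpha - 1}(1 - \theta)^{n - m + \beta - 1} \]

The right-hand side is the kernel of a \(\text{Beta}(m + \alpha,\, n - m + \beta)\) density, which here is \(\text{Beta}(139 + 3,\, 111 + 5) = \text{Beta}(142, 116)\).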

bernoulli_posterior_plot(n = 250, m = 139, alpha = 3, beta = 5)

The uniform prior

bernoulli_posterior_plot(n = 250, m = 139, alpha = 1, beta = 1)

With 250 observations, the prior has little influence
Under the uniform prior the posterior is exactly the normalised likelihood; under the \(\text{Beta}(3, 5)\) prior it is very close to it
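
To put a number on "little influence", the two posteriors can be compared directly. The following is a minimal base-R sketch; `post_summary` is a hypothetical helper written for this illustration, not a function from the slides.

```r
# Data from the slides
n <- 250
m <- 139

# Mean and sd of the Beta(m + alpha, n - m + beta) posterior.
# post_summary is a hypothetical helper, written only for this comparison.
post_summary <- function(alpha, beta) {
  a <- m + alpha
  b <- n - m + beta
  c(mean = a / (a + b),
    sd   = sqrt(a * b / ((a + b)^2 * (a + b + 1))))
}

informative <- post_summary(alpha = 3, beta = 5)  # Beta(142, 116)
uniform     <- post_summary(alpha = 1, beta = 1)  # Beta(140, 112)

round(rbind(informative, uniform), 4)

# Shift in the posterior mean, in units of posterior sd
shift_in_sd <- unname((uniform["mean"] - informative["mean"]) / uniform["sd"])
shift_in_sd
```

The posterior means differ by about 0.005, roughly a sixth of a posterior standard deviation, so the choice between these two priors barely moves the conclusions.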

Point estimates

MAP (maximum a posteriori): mode of the posterior
Posterior mean: mean of the posterior
Both converge to the MLE \(m/n\) as data dominate

\[ \begin{align*} \langle \theta \rangle &= \frac{m + \alpha}{n + \alpha + \beta} \\[6pt] \mathrm{V}(\theta) &= \frac{(m + \alpha)(n - m + \beta)}{(n + \alpha + \beta)^2\,(n + \alpha + \beta + 1)} \\[6pt] \mathrm{mode}(\theta) &= \frac{m + \alpha - 1}{n + \alpha + \beta - 2} \end{align*} \]
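
These formulas make the convergence to the MLE concrete: \(\alpha\) and \(\beta\) enter as fixed additive terms, so their effect shrinks as \(n\) grows. The sketch below keeps the observed proportion fixed at \(139/250\) and scales the sample size up. The multipliers in `scale` are hypothetical, chosen only for illustration.

```r
alpha <- 3
beta  <- 5

# Hypothetical data sets with the same observed proportion, 139/250 = 0.556
scale <- c(1, 10, 100, 1000)
m <- 139 * scale
n <- 250 * scale

estimates <- data.frame(
  n         = n,
  mle       = m / n,
  post_mean = (m + alpha) / (n + alpha + beta),
  post_mode = (m + alpha - 1) / (n + alpha + beta - 2)
)
estimates
```

Each row's posterior mean and mode sit closer to 0.556 than the row above it.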

bernoulli_posterior_summary(n, m, alpha = 3, beta = 5)

$mean
[1] 0.5503876

$var
[1] 0.0009554482

$sd
[1] 0.03091033

$mode
[1] 0.5507812

$qi
[1] 0.4894851 0.6105500
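
The same numbers can be reproduced with base R alone. This is a sketch, not the implementation of `bernoulli_posterior_summary`, and it assumes that `$qi` is the 95% equal-tailed quantile interval.

```r
n <- 250
m <- 139
alpha <- 3
beta  <- 5

a <- m + alpha       # 142
b <- n - m + beta    # 116

post <- list(
  mean = a / (a + b),
  var  = a * b / ((a + b)^2 * (a + b + 1)),
  mode = (a - 1) / (a + b - 2),
  # Assumption: $qi above is the 2.5th and 97.5th posterior percentiles
  qi   = qbeta(c(0.025, 0.975), a, b)
)
post$sd <- sqrt(post$var)
post
```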

Credible intervals

A 95% credible interval contains 95% of the posterior probability. This is a direct statement: we assign 95% probability to \(\theta\) lying in the interval.

Compare with a frequentist confidence interval: it refers to the long-run coverage of the procedure, not to the probability that the parameter is in any given interval.
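
Because the posterior is a probability distribution over \(\theta\), the probability of any interval is just a difference of cumulative probabilities. A minimal sketch using the \(\text{Beta}(142, 116)\) posterior and the `$qi` interval reported earlier; `prob_in_interval` is a hypothetical helper.

```r
a <- 142
b <- 116   # posterior Beta(142, 116)

# Posterior probability that theta lies in [lower, upper] (hypothetical helper)
prob_in_interval <- function(lower, upper) {
  pbeta(upper, a, b) - pbeta(lower, a, b)
}

p_qi <- prob_in_interval(0.4894851, 0.6105500)  # the $qi interval from above
p_qi

# The same idea answers other questions, e.g. P(theta > 0.5 | data)
p_gt_half <- pbeta(0.5, a, b, lower.tail = FALSE)
p_gt_half
```

The first value comes out at 0.95 up to rounding of the endpoints, which is exactly what "95% credible interval" means.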

Quantile-based intervals

qbeta(0.025, m + 1, n - m + 1)

[1] 0.493961

qbeta(0.975, m + 1, n - m + 1)

[1] 0.6163147

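These are the 2.5th and 97.5th percentiles of the \(\text{Beta}(m + 1, n - m + 1) = \text{Beta}(140, 112)\) posterior, i.e. the posterior under the uniform \(\text{Beta}(1, 1)\) prior.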

Highest posterior density intervals

The shortest interval containing the specified mass
Equal-tailed and HPD intervals coincide for symmetric, unimodal posteriors
For skewed or multimodal posteriors they differ

get_beta_hpd(m + 1, n - m + 1)

$lb
[1] 0.4943

$ub
[1] 0.6166

$p_star
[1] 1.887
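
For a unimodal posterior, one way to find the HPD interval is to search over all intervals that contain 95% of the mass and keep the shortest. The sketch below does this with `optimize()`. It is not the implementation of `get_beta_hpd`, and it assumes that `p_star` above is the density height at the two endpoints.

```r
a <- 139 + 1         # 140
b <- 250 - 139 + 1   # 112
mass <- 0.95

# Width of the interval that leaves p_lower in the lower tail
# and contains `mass` of the posterior probability
width <- function(p_lower) {
  qbeta(p_lower + mass, a, b) - qbeta(p_lower, a, b)
}

# Search over the lower-tail probability for the shortest such interval
opt <- optimize(width, interval = c(0, 1 - mass), tol = 1e-8)
hpd <- qbeta(c(opt$minimum, opt$minimum + mass), a, b)
hpd

# For a unimodal density, the density is equal at both HPD endpoints
dbeta(hpd, a, b)
```

The result should match `$lb` and `$ub` above to the precision shown, and it is never wider than the equal-tailed interval from `qbeta(c(0.025, 0.975), a, b)`.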

Posterior intervals illustrated

[Diagram: bimodal posterior with tail regions shaded at 2.5%, two upward arrows marking the 2.5th and 97.5th percentile boundaries]

Monte Carlo as a preview

If we can sample from a distribution, we can approximate its means, probabilities, and quantiles by the corresponding sample quantities.

x <- rnorm(n = 1e6, mean = 100, sd = 15)
mean(x <= 130) # Monte Carlo

[1] 0.976867

pnorm(130, 100, 15) # exact

[1] 0.9772499

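With one million samples the estimate agrees with the exact value to about three decimal places: the difference is about 0.0004, against a Monte Carlo standard error of \(\sqrt{p(1 - p)/N} \approx 0.00015\).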
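
The accuracy of a Monte Carlo estimate of a probability can itself be estimated from the samples, since the estimate is a sample proportion with standard error \(\sqrt{p(1 - p)/N}\). A minimal sketch; the seed is arbitrary and is set only to make the run reproducible.

```r
set.seed(101)  # arbitrary seed, for reproducibility only

n_draws <- 1e6
x <- rnorm(n_draws, mean = 100, sd = 15)

p_hat <- mean(x <= 130)                        # Monte Carlo estimate
mc_se <- sqrt(p_hat * (1 - p_hat) / n_draws)   # its standard error
exact <- pnorm(130, 100, 15)

c(estimate = p_hat, mc_se = mc_se, error = p_hat - exact)
```

The standard error is about 0.00015 and shrinks like \(1/\sqrt{N}\), so each additional decimal place of accuracy costs roughly 100 times as many samples.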

Why this matters for MCMC

Most realistic posteriors have no closed form
MCMC generates samples from the posterior
Those samples are used exactly like the Monte Carlo samples above
Mean, intervals, probabilities: all computed from the samples
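
As a rough illustration of that workflow, the sketch below treats draws from the \(\text{Beta}(142, 116)\) posterior as if they had come from a sampler and computes each summary from the draws alone. It assumes `rbeta()` as a stand-in for MCMC output: these draws are independent, whereas real MCMC draws are autocorrelated. The exact values are shown only as a check.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

a <- 142
b <- 116
# Stand-in for MCMC output (independent draws; real MCMC draws are autocorrelated)
draws <- rbeta(1e5, a, b)

# Posterior mean
c(sampled = mean(draws), exact = a / (a + b))

# 95% equal-tailed credible interval
qi_sampled <- quantile(draws, c(0.025, 0.975))
rbind(sampled = qi_sampled, exact = qbeta(c(0.025, 0.975), a, b))

# Posterior probability that theta exceeds 0.5
p_sampled <- mean(draws > 0.5)
c(sampled = p_sampled, exact = pbeta(0.5, a, b, lower.tail = FALSE))
```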

Summary

The posterior is the complete answer to “what do the data tell us about \(\theta\)?”
Point estimates (MAP, mean) summarise its centre
Credible intervals summarise its spread as direct probability statements
Monte Carlo sampling is the bridge from analytical to numerical Bayesian inference