library(priorexposure)
library(tidyverse)Inference in the Bernoulli Model
A complete treatment of Bayesian inference using the analytically tractable Bernoulli model. We visualise how the posterior combines prior and likelihood, compute point and interval estimates, and introduce the concepts of credible intervals and highest posterior density intervals.
Visualising the posterior
Because the beta prior is conjugate to the binomial likelihood, we can compute the posterior analytically. With \(m = 139\) successes in \(n = 250\) trials and a \(\text{Beta}(3, 5)\) prior, the posterior is \(\text{Beta}(142, 116)\). The bernoulli_posterior_plot function draws all three distributions together, making it easy to see how the data update the prior.
n <- 250
m <- 139
alpha <- 3
beta <- 5
bernoulli_posterior_plot(n, m, alpha = alpha, beta = beta)
The posterior sits between the prior and the likelihood, pulled toward the likelihood by the weight of the data. With 250 observations the data dominate: the prior has relatively little influence.
Changing to a uniform prior illustrates this directly.
bernoulli_posterior_plot(n, m, alpha = 1, beta = 1)
With a flat prior the posterior is essentially the normalised likelihood. The two plots look almost identical because 250 observations are enough to swamp any weak prior.
Point estimation
The posterior distribution contains all the information the data provide about \(\theta\). Point estimates are summaries of this distribution. The most common choices are the posterior mean and the maximum a posteriori (MAP) estimate, which is the mode of the posterior.
For a \(\text{Beta}(\alpha^*, \beta^*)\) posterior, the mean is \(\alpha^*/(\alpha^* + \beta^*)\) and the mode is \((\alpha^* - 1)/(\alpha^* + \beta^* - 2)\). Both converge to the MLE \(m/n\) as the data dominate the prior.
bernoulli_posterior_summary(n, m, alpha = 1, beta = 1)$mean
[1] 0.5555556
$var
[1] 0.000975943
$sd
[1] 0.03124009
$mode
[1] 0.556
$qi
[1] 0.4939610 0.6163147
bernoulli_posterior_summary(n, m, alpha = 3, beta = 5)$mean
[1] 0.5503876
$var
[1] 0.0009554482
$sd
[1] 0.03091033
$mode
[1] 0.5507812
$qi
[1] 0.4894851 0.6105500
Credible intervals
A posterior credible interval is a range of \(\theta\) values that contains a specified fraction of the posterior probability mass. A 95% credible interval contains 95% of the posterior, meaning we assign 95% probability to \(\theta\) lying in that range. This is a direct probability statement about the parameter, which is exactly the kind of statement frequentist confidence intervals cannot make.
Quantile-based intervals
The most straightforward credible interval takes the 2.5th and 97.5th percentiles of the posterior distribution.
qbeta(0.025, m + 1, n - m + 1) # lower bound with uniform prior[1] 0.493961
qbeta(0.975, m + 1, n - m + 1) # upper bound[1] 0.6163147
Highest posterior density intervals
The highest posterior density (HPD) interval is the shortest interval containing the specified probability mass. For a unimodal symmetric posterior the quantile and HPD intervals coincide, but for skewed or multimodal posteriors they differ. The HPD interval is generally preferred because it is the most compact region consistent with the data.
get_beta_hpd(m + 1, n - m + 1)$lb
[1] 0.4943
$ub
[1] 0.6166
$p_star
[1] 1.887
Monte Carlo as a preview of MCMC
We have been computing posterior quantities analytically because the beta posterior has a known closed form. For most realistic models no such closed form exists. The standard solution is to draw random samples from the posterior and use those samples to approximate any quantity of interest.
As a preview, we can sample from a normal distribution and use the samples to estimate probabilities exactly as we will later use MCMC samples from complex posteriors.
set.seed(41)
x <- rnorm(n = 1e6, mean = 100, sd = 15)
mean(x)[1] 100.0219
sd(x)[1] 14.99414
pnorm(130, mean = 100, sd = 15) # exact probability[1] 0.9772499
mean(x <= 130) # Monte Carlo approximation[1] 0.977204
With one million samples the approximation is accurate to four decimal places. MCMC does the same thing for posterior distributions that have no closed form: it generates samples from the posterior, and we use those samples to compute means, intervals, and any other summary we need.