Introduction to Bayesian Inference

Mark Andrews

What is Bayesian data analysis?

An approach to statistical inference based on Bayes’ theorem
Not a specialised technique but an alternative general framework
Sits alongside, not above, classical frequentist statistics
The two approaches differ in how probability is defined and used
In practice, though, what distinguishes them is how they carry out statistical inference

In the frequentist framework, probability is defined as long-run relative frequency
Parameters are fixed but unknown constants, so they are not given probability distributions. Hence Fisher (1925): Bayesian methods were “founded upon an error, and must be wholly rejected” because “(i)nferences respecting populations, from which known samples have been drawn, cannot be expressed in terms of probability”.
Inference is based on sampling distributions: if the true value of \(\theta\) were \(\theta_0\), what would the probability distribution of the observed data be?
Leads to p-values, confidence intervals, rejection regions
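
As an illustration of inference from a sampling distribution, the sketch below computes a one-sided p-value for a hypothetical experiment: 15 successes in 20 Bernoulli trials, with null value \(\theta_0 = 0.5\). The data, the null value, and the function name `binom_pmf` are assumptions made for this example only.

```python
from math import comb

def binom_pmf(k, n, theta):
    """Probability of k successes in n independent trials with success probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Hypothetical data and null value, chosen only for illustration.
n_trials, observed, theta_0 = 20, 15, 0.5

# Sampling distribution of the number of successes if theta were theta_0.
sampling_dist = [binom_pmf(k, n_trials, theta_0) for k in range(n_trials + 1)]

# One-sided p-value: probability, under theta_0, of a count at least as large as observed.
p_value = sum(sampling_dist[observed:])
print(f"p-value = {p_value:.4f}")
```

The p-value here is about 0.021. Note that it is a statement about hypothetical data given \(\theta_0\), not a probability statement about \(\theta\) itself.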

In the Bayesian framework, probability is a means to quantify uncertainty
Parameters are quantities about which we are uncertain
Uncertainty is always represented by a probability distribution
Before data: unknowns have a prior distribution; after data: they have a posterior distribution, calculated using Bayes’ rule
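
A standard worked example of this prior-to-posterior update, added here for illustration and not taken from the slides: let \(\theta\) be the success probability of a coin, with a Beta prior. The hyperparameters \(a\) and \(b\), and the count \(m\) of successes in \(n\) tosses, are assumed notation.

\[\theta \sim \mathrm{Beta}(a, b), \qquad m \mid \theta \sim \mathrm{Binomial}(n, \theta)\]

\[p(\theta \mid m) \propto \theta^{m}(1-\theta)^{n-m} \times \theta^{a-1}(1-\theta)^{b-1} = \theta^{a+m-1}(1-\theta)^{b+n-m-1}\]

\[\theta \mid m \sim \mathrm{Beta}(a + m,\; b + n - m)\]

For instance, a \(\mathrm{Beta}(2, 2)\) prior combined with 15 successes in 20 tosses gives a \(\mathrm{Beta}(17, 7)\) posterior, whose mean is \(17/24 \approx 0.71\).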

Observed outcomes:

\[y_1, y_2, \ldots, y_n\]

Predictors (one vector per observation):

\[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\]

A model specifies a distribution for \(y_i\), conditional on \(\mathbf{x}_i\). The model is parameterized by fixed and unknown parameters \(\theta\).

\[y_i \sim \mathrm{N}(\mu_i, \sigma^2)\]

\[\mu_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}\]

The “sampling” part: \(y_i \sim \mathrm{N}(\mu_i, \sigma^2)\)
The “structural” part: the linear predictor \(\mu_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}\)
Bayesian inference begins by assigning or assuming prior distributions for \(\beta_0, \beta_1, \ldots, \beta_K, \sigma^2\)
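
The pieces of the normal linear model above can be sketched in code as a log joint density (log likelihood plus log prior). This is a minimal sketch, not a recommended implementation: the specific priors (independent \(\mathrm{N}(0, 10^2)\) on each \(\beta_k\), Exponential with rate 1 on \(\sigma^2\)), the function names, and the tiny data set are all assumptions made for illustration.

```python
import math

def normal_logpdf(x, mean, variance):
    """Log density of N(mean, variance) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)

def linear_predictor(beta, x_i):
    """Structural part: mu_i = beta_0 + sum_k beta_k * x_ki."""
    return beta[0] + sum(b_k * x_k for b_k, x_k in zip(beta[1:], x_i))

def log_likelihood(beta, sigma2, X, y):
    """Sampling part: y_i ~ N(mu_i, sigma2), independently over i."""
    return sum(
        normal_logpdf(y_i, linear_predictor(beta, x_i), sigma2)
        for x_i, y_i in zip(X, y)
    )

def log_prior(beta, sigma2):
    """Assumed priors: beta_k ~ N(0, 10^2) independently, sigma2 ~ Exponential(rate 1)."""
    if sigma2 <= 0:
        return -math.inf
    return sum(normal_logpdf(b_k, 0.0, 100.0) for b_k in beta) - sigma2

def log_joint(beta, sigma2, X, y):
    """Log of likelihood times prior, i.e. the unnormalised log posterior."""
    lp = log_prior(beta, sigma2)
    if lp == -math.inf:
        return lp
    return lp + log_likelihood(beta, sigma2, X, y)

# Tiny made-up data set with K = 1 predictor.
X = [[1.0], [2.0], [3.0]]
y = [1.1, 1.9, 3.2]
print(log_joint([0.0, 1.0], 1.0, X, y))
```

Parameter values that fit the data well and are plausible under the prior receive a higher log joint density than values that do neither.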

\[p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, p(\theta)}{\int p(\text{data} \mid \theta)\, p(\theta)\, d\theta}\]
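
For a one-dimensional \(\theta\), Bayes' rule can be evaluated numerically on a grid, which makes the role of the denominator visible. The sketch below assumes a hypothetical data set (15 successes in 20 Bernoulli trials), a uniform prior on \([0, 1]\), and a simple Riemann sum for the integral; none of these choices come from the slides.

```python
from math import comb

# Hypothetical data: 15 successes in 20 trials.
n_trials, successes = 20, 15

# Evenly spaced grid of candidate values for theta.
grid_size = 1001
thetas = [i / (grid_size - 1) for i in range(grid_size)]
step = thetas[1] - thetas[0]

# Assumed prior: uniform density on [0, 1].
prior = [1.0 for _ in thetas]

# Binomial likelihood p(data | theta) at each grid point.
likelihood = [
    comb(n_trials, successes) * t**successes * (1 - t) ** (n_trials - successes)
    for t in thetas
]

# Numerator of Bayes' rule at each grid point.
unnormalised = [l * p for l, p in zip(likelihood, prior)]

# Denominator: Riemann-sum approximation of the integral over theta.
evidence = sum(unnormalised) * step

# Posterior density on the grid, and its mean.
posterior = [u / evidence for u in unnormalised]
posterior_mean = sum(t * p for t, p in zip(thetas, posterior)) * step
print(f"evidence = {evidence:.4f}, posterior mean = {posterior_mean:.4f}")
```

With a uniform prior the exact answers are known (evidence \(1/21\), posterior mean \(16/22\)), so the grid result can be checked against them. Grid evaluation stops being practical beyond a few parameters.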

The posterior is a complete probability distribution over the parameters
The posterior tells you exactly what you can say about the unknowns given what you know (e.g. data) or have assumed (e.g. model, prior distribution).
Summarise the posterior however you wish: mean, median, any interval
Posterior (aka credible) intervals are direct probability statements about the parameters
No need to condition on a hypothetical null hypothesis
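
The summaries listed above are straightforward to compute once posterior draws are available. In the sketch below the draws come from a \(\mathrm{Beta}(16, 6)\) distribution, which is only a stand-in for whatever posterior a real analysis produces; the helper `quantile` and the threshold 0.5 are likewise illustrative assumptions.

```python
import random
import statistics

rng = random.Random(1)

# Stand-in posterior draws: Beta(16, 6) is assumed purely for illustration.
draws = sorted(rng.betavariate(16, 6) for _ in range(10_000))

def quantile(sorted_draws, q):
    """Empirical quantile by index lookup (adequate for a sketch)."""
    index = min(int(q * len(sorted_draws)), len(sorted_draws) - 1)
    return sorted_draws[index]

posterior_mean = statistics.fmean(draws)
posterior_median = statistics.median(draws)

# Equal-tailed 95% credible interval.
interval_95 = (quantile(draws, 0.025), quantile(draws, 0.975))

# Direct probability statement about the parameter: P(theta > 0.5 | data).
prob_above_half = sum(d > 0.5 for d in draws) / len(draws)

print(f"mean = {posterior_mean:.3f}, median = {posterior_median:.3f}")
print(f"95% interval = ({interval_95[0]:.3f}, {interval_95[1]:.3f})")
print(f"P(theta > 0.5) = {prob_above_half:.3f}")
```

The last quantity is a probability about the parameter itself, given the data and model, which is the kind of statement a p-value does not provide.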

Small samples: prior information stabilises estimation
Complex models: MCMC is a general-purpose method for inference
Quantifying uncertainty: the posterior is exactly what you need, and the full repertoire of probability is available for any inference or prediction
This includes model comparison: e.g., posterior model probabilities
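
To make the MCMC point above concrete, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms. It is a sketch under stated assumptions, not what any particular software does: the target is the posterior for a success probability \(\theta\) given a hypothetical 15 successes in 20 trials and a uniform prior, and the proposal scale, chain length, and burn-in are arbitrary choices.

```python
import math
import random

def log_unnorm_posterior(theta, successes, trials):
    """Log of likelihood times a uniform prior on (0, 1), up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)

def metropolis(log_target, start, n_iter, proposal_sd, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    samples = []
    current, current_lp = start, log_target(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, proposal_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

rng = random.Random(42)
samples = metropolis(
    lambda t: log_unnorm_posterior(t, 15, 20),
    start=0.5, n_iter=20_000, proposal_sd=0.1, rng=rng,
)
kept = samples[2_000:]  # discard an assumed burn-in period
mcmc_mean = sum(kept) / len(kept)
print(f"posterior mean estimate = {mcmc_mean:.3f}")
```

The sampler only needs the unnormalised posterior, so the integral in the denominator of Bayes' rule never has to be computed. For this target the exact posterior mean is \(16/22 \approx 0.727\), which the estimate should approximate.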

The prior is just another modelling assumption. It represents uncertainty about parameters before seeing data
Can encode genuine prior knowledge or just weak regularisation
Its influence diminishes as more data are collected
With enough data, different reasonable priors give essentially the same posterior
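
The diminishing influence of the prior can be checked numerically. The sketch below uses the standard conjugate result that a \(\mathrm{Beta}(a, b)\) prior combined with \(m\) successes in \(n\) Bernoulli trials gives a \(\mathrm{Beta}(a + m, b + n - m)\) posterior, and compares two assumed priors on made-up data with 30% successes at increasing sample sizes.

```python
def posterior_mean(a, b, successes, trials):
    """Mean of the Beta(a + successes, b + trials - successes) posterior."""
    return (a + successes) / (a + b + trials)

# Two assumed priors: one flat, one favouring high success probabilities.
priors = {"flat Beta(1, 1)": (1, 1), "informative Beta(10, 2)": (10, 2)}

gaps = []
for trials in (10, 100, 10_000):
    successes = 3 * trials // 10  # made-up data with 30% successes
    means = [posterior_mean(a, b, successes, trials) for a, b in priors.values()]
    gaps.append(abs(means[0] - means[1]))
    print(trials, [round(m, 4) for m in means])
```

The gap between the two posterior means shrinks from roughly 0.26 at \(n = 10\) to under 0.001 at \(n = 10{,}000\).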