Amortized Bayesian Inference

Introduction

Šimon Kucharský

Artificial Intelligence has two goals. First, AI is directed toward getting computers to be smart and do smart things so that human beings don’t have to do them. And second, AI […] is also directed at using computers to simulate human beings, so that we can find out how humans work.

Simon (1983, p. 27), van Rooij et al. (2024)

Parameter estimation

What are the values of the model parameters \(\theta\), given observed data \(x\)?

Bayes’ theorem

\[ \begin{aligned} p(\theta \mid x) & = \frac{p(\theta, x)}{p(x)} \\ & = \frac{p(\theta) \times p(x \mid \theta)}{\int p(\theta) \times p(x \mid \theta) d\theta} \end{aligned} \]

Marginal likelihood

\[ p(x) = \int p(\theta) \times p(x \mid \theta) d\theta \]

  • difficult to evaluate
  • often intractable (a rare tractable case is shown below)
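For intuition, one of the rare tractable cases (reused in the examples below) is a conjugate Beta-Binomial model, where the integral has a closed form:

\[ \begin{aligned} \theta & \sim \text{Beta}(1, 1), \qquad x \mid \theta \sim \text{Binomial}(10, \theta)\\ p(x) & = \int_0^1 \binom{10}{x}\, \theta^{x} (1-\theta)^{10-x}\, d\theta = \frac{1}{11}\\ p(\theta \mid x = 7) & = \text{Beta}(1 + 7, 1 + 3) = \text{Beta}(8, 4) \end{aligned} \]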

Classic alternatives

  • Approximate \(p(\theta \mid x)\)
    • Markov chain Monte Carlo (MCMC): sample from \(p(\theta \mid x) \propto p(\theta) \times p(x \mid \theta)\)
  • Obtain point estimates (a sketch follows this list):
    • Maximum likelihood: \(\hat{\theta} = \operatorname*{argmax}_{\theta} p(x \mid \theta)\)
    • Maximum a posteriori: \(\hat{\theta} = \operatorname*{argmax}_{\theta} p(\theta) \times p(x \mid \theta)\)
  • None of the methods require \(p(x)\)
  • But all require evaluating \(p(x \mid \theta)\)
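As a minimal sketch (assuming the Beta-Binomial toy model from above, with observed \(x = 7\) out of \(n = 10\)), the MLE can be found by numerical optimization; under the flat Beta(1, 1) prior, the MAP estimate coincides with it:

Python
import numpy as np
from scipy.optimize import minimize_scalar

# Observed data: 7 successes out of 10 trials
x, n = 7, 10

# Negative log-likelihood of the binomial model (constants dropped)
def nll(theta):
    return -(x * np.log(theta) + (n - x) * np.log(1 - theta))

# With a flat prior, the MAP estimate coincides with the MLE
result = minimize_scalar(nll, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)  # ~0.7 = x / n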

Simulation-based inference (SBI)

  • “Likelihood-free”
  • Cannot evaluate \(p(x \mid \theta)\) (sometimes not even \(p(\theta)\)); can only sample from the model
  • Approximate the posterior using simulations

Examples

  1. Surrogate Likelihood
  2. Rejection Sampling
  3. Approximate Bayesian Computation

Modern overview: Cranmer et al. (2020)

Surrogate likelihood

  • For a given parameter value \(\theta\), simulate many samples of \(x\)
  • Estimate the density \(p(x\mid\theta)\) from those samples (e.g., kernel density estimation), as sketched below
    • Used for approximate MLE/MAP, or within MCMC
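A minimal sketch of the idea, assuming a hypothetical simulator whose likelihood we pretend has no closed form:

Python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical simulator: a Binomial(10, theta) count blurred by Gaussian
# noise, so the likelihood density is awkward to write down
def simulate(theta, size):
    counts = np.random.binomial(n=10, p=theta, size=size)
    return counts + np.random.normal(0.0, 0.5, size=size)

# Simulate many data sets for a fixed parameter value
samples = simulate(theta=0.7, size=10_000)

# Kernel density estimate of the surrogate likelihood p(x | theta)
surrogate = gaussian_kde(samples)

# Evaluate the surrogate at the observed data, e.g., inside MCMC
print(surrogate(7.0))  # approximate likelihood of x = 7 given theta = 0.7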

Rejection sampling

\[ p(\theta, x) = p(\theta) \, p(x \mid \theta) \]

\[ \begin{aligned} \theta^{(s)} & \sim \text{Beta}(1, 1)\\ x^{(s)} & \sim \text{Binomial}(10, \theta^{(s)})\\[0.5em] p(\theta \mid x^{\text{obs}}=7) & \approx \text{distribution of } \theta^{(s)} \text{ where } x^{(s)} = 7 \end{aligned} \]

Python
import numpy as np

# Draw parameter values from the Beta(1, 1) prior and simulate one
# Binomial(10, theta) data point per draw
prior = np.random.beta(1, 1, size=5000)
x = np.random.binomial(n=10, p=prior)

# Keep only the draws whose simulated data match the observation:
# these are samples from the posterior p(theta | x = 7)
observed = 7
posterior = prior[x == observed]

Approximate Bayesian Computation (ABC)

  • Generalization of rejection sampling
  • Given a sampled parameter value \(\theta\), generate a data set \(x\)
  • Compare the simulated data set to observed \(x^{\text{obs}}\)
  • Retain the parameter if the data sets are not too dissimilar

\[ \rho(s(x), s(x^{\text{obs}})) \leq \epsilon \]

where \(s\) computes summary statistics, \(\rho\) is a distance measure, and \(\epsilon\) is a tolerance. A minimal sketch follows.
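The sketch below uses assumed choices: a Normal toy model, the sample mean as the summary statistic \(s\), and the absolute difference as the distance \(\rho\):

Python
import numpy as np

# Assumed toy model: x ~ Normal(mu, 1) with n = 50 observations
# and a Normal(0, 2) prior on mu
rng = np.random.default_rng(0)
x_obs = rng.normal(1.5, 1.0, size=50)
s_obs = x_obs.mean()                     # summary statistic s(x_obs)

mu = rng.normal(0.0, 2.0, size=100_000)  # draws from the prior
x_sim = rng.normal(mu[:, None], 1.0, size=(mu.size, 50))
s_sim = x_sim.mean(axis=1)               # summary statistic s(x)

# Retain draws whose summaries land within epsilon of the observed one:
# rho(s(x), s(x_obs)) <= epsilon
epsilon = 0.05
posterior = mu[np.abs(s_sim - s_obs) <= epsilon]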

Issues

  • Curse of dimensionality
  • Computationally expensive
  • Summary statistics must be handcrafted (and may discard information)

Neural estimation

  • Approximate the intractable distributions with generative neural networks
    • Deep learning architectures that learn to represent a probability distribution


Neural likelihood estimation (NLE)

  • Learn \(p(x \mid \theta)\)
  • Surrogate likelihood

Neural posterior estimation (NPE)

  • Learn \(p(\theta \mid x)\)
  • Obtain posterior directly

Amortized Bayesian Inference (ABI)

  • Generative neural networks
    • Produce a distribution \(q(\theta \mid x)\)
  • Train them on simulated pairs \((\theta^{(s)}, x^{(s)}) \sim p(\theta, x)\), as in the sketch after this list
    • Learn \(q(\theta \mid x) \approx p(\theta \mid x)\)
  • Once trained, can be used on observed data \(x^\text{obs}\)
    • \(q(\theta \mid x^\text{obs}) \approx p(\theta \mid x^\text{obs})\)
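To make this concrete, here is a toy NPE-style training loop in PyTorch, fitting a Beta-family \(q(\theta \mid x)\) to the Beta-Binomial example from above. This is an illustrative sketch, not how dedicated libraries (e.g., BayesFlow, sbi) implement it; those use far more expressive generative networks such as normalizing flows:

Python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Simulate training pairs from the Beta-Binomial model used earlier
S = 10_000
theta = torch.distributions.Beta(1.0, 1.0).sample((S,))
x = torch.distributions.Binomial(10, theta).sample()

# Toy posterior network: maps x to the two shape parameters of a
# Beta approximation q(theta | x)
net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Training: maximize E[log q(theta | x)] over the simulated pairs
for step in range(2_000):
    params = F.softplus(net(x.unsqueeze(-1) / 10.0)) + 1e-3
    q = torch.distributions.Beta(params[:, 0], params[:, 1])
    loss = -q.log_prob(theta.clamp(1e-5, 1 - 1e-5)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Amortized inference: a single forward pass yields q(theta | x_obs = 7),
# which should approach the analytic posterior Beta(8, 4)
with torch.no_grad():
    params = F.softplus(net(torch.tensor([[7.0]]) / 10.0)) + 1e-3
print(params)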

Amortized Bayesian Inference (ABI)

Pay the cost of inference upfront during training, receive benefits later

Training

  • Train neural networks
  • Simulated data and parameters
  • Learn the mapping between data and parameters
  • Slow, resource-intensive process

Inference

  • Apply pretrained networks
  • Observed data
  • Posterior distribution of parameters
  • Fast, cheap process

Amortized Bayesian Inference (ABI)

Using deep generative neural networks to perform Bayesian inference.


Advantages

  • Fast inference (once trained)
  • Simulation-based: handles models with intractable likelihoods

Disadvantages

  • Need for training
  • Simulation-based: weaker theoretical guarantees

References

Cranmer, K., Brehmer, J., & Louppe, G. (2020). The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48), 30055–30062.
Simon, H. A. (1983). Why should machines learn? In Machine learning (pp. 25–37). Elsevier.
van Rooij, I., Guest, O., Adolfi, F., de Haan, R., Kolokolova, A., & Rich, P. (2024). Reclaiming AI as a theoretical tool for cognitive science. Computational Brain & Behavior, 7(4), 616–636.