Amortized Bayesian Inference with BayesFlow
We want to approximate \(p(\theta \mid x)\) or \(p(x \mid \theta)\)
\[ p(\theta \mid x) = \frac{p(\theta) \times p(x \mid \theta)}{p(x)} \]
Use generative neural networks
Goal:
\[p(\theta \mid x) \approx q_\phi(\theta \mid x)\]
\(\rightarrow\) Network inputs must be of fixed dimension, while data sets can vary in size
\(\rightarrow\) Summary statistics
\(\rightarrow\) Condition the inference network on the output of the summary network
\[p(\theta \mid x) \approx q_\phi(\theta \mid h_\psi(x))\]
\[ \begin{aligned} \hat{\phi}, \hat{\psi} &= \operatorname*{argmin}_{\phi, \psi} \; \mathbb{E}_{x \sim p(x)} \, \mathbb{KL}\big[p(\theta \mid x) \,||\, q_\phi(\theta \mid h_\psi(x))\big] \\ &= \operatorname*{argmin}_{\phi, \psi} \; \mathbb{E}_{x \sim p(x)} \mathbb{E}_{\theta \sim p(\theta \mid x)} \log \frac{p(\theta \mid x)}{q_\phi(\theta \mid h_\psi(x))} \\ &\propto \operatorname*{argmin}_{\phi, \psi} \; -\mathbb{E}_{(x, \theta) \sim p(x, \theta)} \log q_\phi(\theta \mid h_\psi(x)) \\ &\approx \operatorname*{argmin}_{\phi, \psi} \; -\frac{1}{S}\sum_{s=1}^S \log q_\phi\big(\theta^{(s)} \mid h_\psi(x^{(s)})\big) \end{aligned} \]
\(\rightarrow\) must be able to generate samples \((x^{(s)}, \theta^{(s)}) \sim p(x, \theta)\)
\(\rightarrow\) Amortized inference: pay the cost of inference upfront during training, making subsequent inference fast and cheap
For more info, see Radev et al. (2020)
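To make the final Monte Carlo step above concrete, here is a small self-contained toy in plain NumPy (a sketch only, not BayesFlow code): the conjugate normal model, the sample-mean “summary network”, and the closed-form q are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
S, n = 1000, 20

# Simulate (theta, x) ~ p(theta, x): theta ~ N(0, 1), x_i | theta ~ N(theta, 1)
theta = rng.normal(0.0, 1.0, size=S)
x = rng.normal(theta[:, None], 1.0, size=(S, n))

# Hand-crafted "summary network": the sample mean of each data set
h = x.mean(axis=1)

# Candidate approximator q(theta | h) = Normal(w * h, s2); for this conjugate
# model the exact posterior has w = n / (n + 1) and s2 = 1 / (n + 1)
w, s2 = n / (n + 1), 1.0 / (n + 1)
log_q = -0.5 * (np.log(2 * np.pi * s2) + (theta - w * h) ** 2 / s2)

# Monte Carlo estimate of the loss -E[log q(theta | h(x))]
print(-log_q.mean())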
The legacy bayesflow is built on TensorFlow:
pip install git+https://github.com/bayesflow-org/bayesflow@stable-legacy
The current bayesflow is built on keras and can use TensorFlow, JAX, or PyTorch as a backend:
pip install git+https://github.com/bayesflow-org/bayesflow@main
bayesflow expects “batched” simulations
make_simulator: Convenient interface for “auto batching”
Strategies:
Python
import numpy as np
import bayesflow as bf

def context():
    # Varying number of observations per simulated data set
    return dict(n=np.random.randint(10, 101))

def prior():
    return dict(mu=np.random.normal(0, 1))

def likelihood(mu, n):
    # Pad the data to a fixed length of 100; 'observed' flags the real entries
    observed = np.zeros(100)
    observed[:n] = 1
    x = np.zeros(100)
    x[:n] = np.random.normal(mu, 1, size=n)
    return dict(observed=observed, x=x)

simulator = bf.make_simulator([context, prior, likelihood])
simulator.sample(10)
Various options available, see the Two Moons Example.
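As a quick sanity check, inspect a sampled batch; the dict-of-arrays layout with a leading batch dimension is an assumption about the return value, so verify against the documentation.
batch = simulator.sample(10)
# Expected (assumption): a dict of arrays with a leading batch dimension of 10,
# e.g. batch["mu"].shape == (10, 1) and batch["x"].shape == (10, 100)
print({key: value.shape for key, value in batch.items()})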
The summary network architecture should reflect the symmetries in the data
Main keywords:
"inference_variables": What are the variables that the inference network should learn about?
"inference_conditions": What are the variables that the inference network should be directly conditioned on?
"summary_variables": What are the variables that are supposed to be passed into the summary network?
Python
.standardize
.sqrt, .log
.constrain
.as_set, .as_time_series
.broadcast
.one_hot
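Below is a sketch of an adapter for the simulator above, mapping its outputs to the three keywords. The bf.Adapter entry point, the chaining style, and the concatenate(..., into=...) signature are assumptions modeled on BayesFlow examples; check the documentation for the exact interface.
import bayesflow as bf

adapter = (
    bf.Adapter()                                       # assumed entry point
    .as_set(["x", "observed"])                         # observations are exchangeable
    .concatenate(["mu"], into="inference_variables")   # what the inference network learns about
    .concatenate(["x", "observed"], into="summary_variables")  # input to the summary network
    .concatenate(["n"], into="inference_conditions")   # direct condition on the inference network
)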
In principle, training works as for any other keras model via approximator.fit:
Define an optimizer (e.g., keras.optimizers.Adam)
Define training budget and regime
Compile and train the model until convergence
BasicWorkflow makes fitting easier and comes with reasonable predefined settings
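A minimal sketch of putting the pieces together with BasicWorkflow; the constructor arguments, network choices, and fit_online arguments are assumptions modeled on BayesFlow examples rather than a verified recipe.
workflow = bf.BasicWorkflow(
    simulator=simulator,
    adapter=adapter,
    summary_network=bf.networks.DeepSet(),         # permutation-invariant summaries (assumed class)
    inference_network=bf.networks.CouplingFlow(),  # normalizing flow for q(theta | h(x)) (assumed class)
)
# Online training simulates fresh batches on the fly (argument names are assumptions)
history = workflow.fit_online(epochs=30, batch_size=64, num_batches_per_epoch=200)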
workflow.fit_online
For a general discussion, see Schad et al. (2021).
\[ p(\theta) = \int \int p(\theta \mid \tilde{y}) \underbrace{p(\tilde{y} \mid \tilde{\theta}) p(\tilde{\theta})}_{\text{Prior predictives}} d\tilde{\theta} d \tilde{y} \]
\[ \begin{aligned} \theta^{\text{sim}} & \sim p(\theta) \\ y^{\text{sim}} &\sim p(y \mid \theta^{\text{sim}}) \end{aligned} \]
\[ \begin{aligned} (\theta^{\text{sim}}, y^{\text{sim}}) & \sim p(\theta, y) \\ \theta^{\text{sim}} &\sim p(\theta \mid y^{\text{sim}}) \end{aligned} \]
\[ \theta_1, \dots, \theta_M \sim q(\theta \mid y^{\text{sim}}) \]
If \(q(\theta \mid y^{\text{sim}}) = p(\theta \mid y^{\text{sim}})\), then the rank of \(\theta^{\text{sim}}\) among \(\theta_1, \dots, \theta_M\) is uniformly distributed.
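A self-contained toy check of this statement, using the exact posterior of a conjugate normal model in place of q (an illustrative assumption; with a trained approximator the draws would come from \(q_\phi\) instead):
import numpy as np

rng = np.random.default_rng(2)
R, M, n = 2000, 99, 20  # replications, posterior draws per replication, data size

ranks = np.empty(R, dtype=int)
for r in range(R):
    theta_sim = rng.normal(0.0, 1.0)                 # theta_sim ~ p(theta)
    y_sim = rng.normal(theta_sim, 1.0, size=n)       # y_sim ~ p(y | theta_sim)
    # Exact posterior draws stand in for q(theta | y_sim)
    post_mean = n * y_sim.mean() / (n + 1)
    post_sd = np.sqrt(1.0 / (n + 1))
    theta_draws = rng.normal(post_mean, post_sd, size=M)
    ranks[r] = np.sum(theta_draws < theta_sim)       # rank statistic in {0, ..., M}

# With an exact q the ranks are (discretely) uniform: each bin should hold about R / 10
print(np.histogram(ranks, bins=10, range=(0, M + 1))[0])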
\[ z = \frac{\text{mean}(\theta_i) - \theta^{\text{sim}}}{\text{sd}(\theta_i)} \]
\[ \text{contraction} = 1 - \frac{\text{sd}(\theta_i)}{\text{sd}(\theta^{\text{sim}})} \]
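Continuing the same toy setup, both diagnostics can be computed over many simulated data sets; the exact conjugate posterior again stands in for q (illustrative assumption), and the standard deviation of \(\theta^{\text{sim}}\) across simulations plays the role of the prior standard deviation.
import numpy as np

rng = np.random.default_rng(3)
R, M, n = 2000, 500, 20

theta_sim = rng.normal(0.0, 1.0, size=R)              # theta_sim ~ p(theta)
y_bar = rng.normal(theta_sim, 1.0 / np.sqrt(n))       # sample means, y_bar | theta ~ N(theta, 1/n)
post_mean = n * y_bar / (n + 1)                       # exact posterior stands in for q
post_draws = rng.normal(post_mean[:, None], np.sqrt(1.0 / (n + 1)), size=(R, M))

z = (post_draws.mean(axis=1) - theta_sim) / post_draws.std(axis=1)
contraction = 1 - post_draws.std(axis=1) / theta_sim.std()

# z should look roughly standard normal; contraction should sit near 1 - sqrt(1 / (n + 1))
print(z.mean(), z.std(), contraction.mean())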
\(\rightarrow\) Approximator may not be trusted
Schmitt et al. (2023)
In addition to learning \(p(\theta \mid x)\):
We do these things not because they are easy, but because we thought they were going to be easy.
— misquote of John F. Kennedy, unknown author
Resources:

Amortized Bayesian Inference