Amortized Bayesian Inference with BayesFlow
We want to approximate \(p(\theta \mid x)\) or \(p(x \mid \theta)\)
\[ p(\theta \mid x) = \frac{p(\theta) \times p(x \mid \theta)}{p(x)} \]
Use generative neural networks
Goal:
\[p(\theta \mid x) \approx q_\phi(\theta \mid x)\]
\(\rightarrow\) Network inputs must have fixed dimensions
\(\rightarrow\) Summary statistics
\(\rightarrow\) Condition the inference network on the output of the summary network
\[p(\theta \mid x) \approx q_\phi(\theta \mid h_\psi(x))\]
\[ \begin{aligned} \hat{\phi}, \hat{\psi} = \operatorname*{argmin}_{\phi, \psi} & \mathbb{E}_{x \sim p(x)} \mathbb{KL}\big[p(\theta \mid x) || q_\phi(\theta \mid h_\psi(x))\big] = \\ = \operatorname*{argmin}_{\phi, \psi} & \mathbb{E}_{x \sim p(x)} \mathbb{E}_{\theta \sim p(\theta \mid x)} \log \frac{p(\theta \mid x)}{q_\phi(\theta \mid h_\psi(x))} \propto \\ \propto \operatorname*{argmin}_{\phi, \psi} & - \mathbb{E}_{(x, \theta) \sim p(x, \theta)} \log q_\phi(\theta \mid h_\psi(x)) \approx\\ \approx \operatorname*{argmin}_{\phi, \psi} & - \frac{1}{S}\sum_{s=1}^S \log q_\phi(\theta^{(s)} \mid h_\psi(x^{(s)})) \end{aligned} \]
\(\rightarrow\) must be able to generate samples \((x^{(s)}, \theta^{(s)}) \sim p(x, \theta)\)
\(\rightarrow\) Amortized inference: pay the cost of inference upfront during training, making subsequent inference for new data sets fast
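A minimal numerical sketch of the last line of the objective above (illustrative only; `log_q` and `sample_joint` are hypothetical stand-ins for the approximator's log-density and for drawing \((x, \theta) \sim p(x, \theta)\) via prior plus simulator, not part of the BayesFlow API):

```python
import numpy as np

# Illustrative sketch of the Monte Carlo objective; in practice log_q comes
# from the neural approximator and is minimized by gradient descent.
def monte_carlo_loss(log_q, sample_joint, num_samples=256):
    # Draw (x, theta) pairs from the joint and average the negative log-density
    pairs = [sample_joint() for _ in range(num_samples)]
    return -np.mean([log_q(theta, x) for (x, theta) in pairs])
```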
For more info, see Radev et al. (2020)
- Legacy version of `bayesflow`, based on TensorFlow:
  `pip install git+https://github.com/bayesflow-org/bayesflow@stable-legacy`
- Current version, based on `keras` with TensorFlow, JAX, or PyTorch as a backend:
  `pip install git+https://github.com/bayesflow-org/bayesflow@main`
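With the `keras`-based version, the backend is selected via the `KERAS_BACKEND` environment variable before the first import, e.g.:

```python
import os

# Choose the keras backend ("tensorflow", "jax", or "torch") before importing
os.environ["KERAS_BACKEND"] = "jax"

import bayesflow as bf
```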
- `bayesflow` expects "batched" simulations
- `make_simulator`: convenient interface for "auto batching"

Strategies:
```python
import numpy as np
import bayesflow as bf

def context():
    # Random number of observations per simulated data set
    return dict(n=np.random.randint(10, 101))

def prior():
    return dict(mu=np.random.normal(0, 1))

def likelihood(mu, n):
    # Pad the data to a fixed length of 100 and mark observed entries with a mask
    observed = np.zeros(100)
    observed[:n] = 1
    x = np.zeros(100)
    x[:n] = np.random.normal(mu, 1, size=n)
    return dict(observed=observed, x=x)

simulator = bf.make_simulator([context, prior, likelihood])
simulator.sample(10)  # a batch of 10 simulated data sets
```
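If the setup above works as intended, `simulator.sample(10)` should return a dictionary of NumPy arrays (here with keys `n`, `mu`, `observed`, and `x`) whose leading dimension is the batch size of 10.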
Various options available, see the Two Moons Example.
Reflect symmetries in the data
Main keywords:
"inference_variables"
: What are the variables that the inference network should learn about?
"inference_conditions"
: What are the variables that the inference network should be directly conditioned on?
"summary_variables"
: What are the variables that are supposed to be passed into the summary network?
Useful transforms include:
- `.standardize`
- `.sqrt`, `.log`
- `.constrain`
- `.as_set`, `.as_time_series`
- `.broadcast`
- `.one_hot`
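Putting the keywords and transforms together, a rough adapter sketch for the simulator above might look like this (assuming the BayesFlow 2.x Adapter API; exact method names and signatures may differ between versions, so check the documentation of your installed release):

```python
import bayesflow as bf

# A sketch assuming the BayesFlow 2.x Adapter; signatures of concatenate,
# as_set, and sqrt may differ between versions.
adapter = (
    bf.adapters.Adapter()
    .as_set(["x", "observed"])                                  # exchangeable observations
    .sqrt("n")                                                  # rescale the sample size
    .concatenate(["mu"], into="inference_variables")            # parameters to infer
    .concatenate(["x", "observed"], into="summary_variables")   # raw data for the summary net
    .concatenate(["n"], into="inference_conditions")            # fixed-size direct condition
)
```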
In principle, training works as for any other model in `keras` via `approximator.fit`:
- Define an optimizer (e.g., `keras.optimizers.Adam`)
- Define the training budget and regime
- Compile and train the model until convergence
`BasicWorkflow` makes fitting easier and comes with reasonable predefined settings, e.g., `workflow.fit_online` (see the sketch below).
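A sketch of this route, assuming the BayesFlow 2.x `BasicWorkflow` API and the `simulator`/`adapter` objects defined above; the network choices and training settings are illustrative, not prescriptive:

```python
import bayesflow as bf

workflow = bf.BasicWorkflow(
    simulator=simulator,                        # defined above
    adapter=adapter,                            # defined above
    summary_network=bf.networks.DeepSet(),      # respects exchangeability of observations
    inference_network=bf.networks.CouplingFlow(),
)

# Online training: fresh simulations are drawn on the fly for every batch
history = workflow.fit_online(epochs=50, num_batches_per_epoch=200, batch_size=64)
```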
For general discussion, see Schad et al. (2021).
\[ p(\theta) = \int \int p(\theta \mid \tilde{y}) \underbrace{p(\tilde{y} \mid \tilde{\theta}) p(\tilde{\theta})}_{\text{Prior predictives}} d\tilde{\theta} d \tilde{y} \]
\[ \begin{aligned} \theta^{\text{sim}} & \sim p(\theta) \\ y^{\text{sim}} &\sim p(y \mid \theta^{\text{sim}}) \end{aligned} \]
\[ \begin{aligned} (\theta^{\text{sim}}, y^{\text{sim}}) & \sim p(\theta, y) \\ \theta^{\text{sim}} &\sim p(\theta \mid y^{\text{sim}}) \end{aligned} \]
\[ \begin{aligned} \theta_1, \dots, \theta_M & \sim q(\theta \mid y^{\text{sim}}) \\ \end{aligned} \]
If \(q(\theta \mid y^{\text{sim}}) = p(\theta \mid y^{\text{sim}})\), then the rank statistic of \(\theta^{\text{sim}}\) is uniform.
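As an illustration of this check, here is a minimal simulation-based-calibration sketch for the toy Gaussian model from the simulator example (\(\mu \sim N(0,1)\), \(x_i \sim N(\mu, 1)\)); the analytic posterior stands in for the learned approximator \(q\), so the resulting ranks should be approximately uniform. All names are illustrative, not part of the BayesFlow API, which ships its own calibration diagnostics.

```python
import numpy as np

rng = np.random.default_rng(1)
num_datasets, num_draws, n = 1000, 99, 50

ranks = np.empty(num_datasets, dtype=int)
for s in range(num_datasets):
    mu_sim = rng.normal(0, 1)                        # theta^sim ~ p(theta)
    x_sim = rng.normal(mu_sim, 1, size=n)            # y^sim ~ p(y | theta^sim)
    # Exact posterior of this conjugate model, standing in for q(theta | y^sim)
    post_mean = n * x_sim.mean() / (n + 1)
    post_sd = np.sqrt(1 / (n + 1))
    mu_post = rng.normal(post_mean, post_sd, size=num_draws)
    # Rank of the simulating parameter among the posterior draws
    ranks[s] = np.sum(mu_post < mu_sim)

# If q equals the true posterior, `ranks` is uniform on {0, ..., num_draws}.
```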
\[ z = \frac{\text{mean}(\theta_i) - \theta^{\text{sim}}}{\text{sd}(\theta_i)} \]
\[ \text{contraction} = 1 - \frac{\text{sd}(\theta_i)}{\text{sd}(\theta^{\text{sim}})} \]
\(\rightarrow\) Approximator may not be trusted
Schmitt et al. (2023)
In addition to learning \(p(\theta \mid x)\):
We do these things not because they are easy, but because we thought they were going to be easy.
— misquote of John F. Kennedy, unknown author
Resources:
Amortized Bayesian Inference