Full Probabilistic Inference Schema
This note lays out a unified probabilistic schema for state and parameter
estimation, organised along two axes:
- Two model tracks: a simulator (no internal latent variable) and an
  emulator (a latent-variable model trained on simulator outputs).
- Three inference regimes: exact posteriors, per-observation
  variational inference, and amortized inference that generalises over
  observations.
The most general generative model factorises as

$$
p(\boldsymbol{y}, \boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{x}) =
p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x})
\tag{1}
$$

Notation

Table 1: Symbols used throughout this note.

| Symbol | Space | Meaning |
| --- | --- | --- |
| $\boldsymbol{y}$ | $\mathbb{R}^{D_y}$ | observations (gappy, noisy) |
| $\boldsymbol{u}$ | $\mathbb{R}^{D_u}$ | full state (e.g. SSH field) |
| $\boldsymbol{z}$ | $\mathbb{R}^{D_z}$ | emulator latent variable ($D_z \le D_u$, often $D_z \ll D_u$) |
| $\boldsymbol{\theta}$ | $\mathbb{R}^{D_\theta}$ | all generative parameters (decoder, prior, noise) |
| $\boldsymbol{x}$ | $\mathbb{R}^{D_x}$ | covariates / controls (forcing, season, geometry) |
| $\boldsymbol{\psi}$ | $\mathbb{R}^{D_\psi}$ | all inference (variational) parameters |

$D_\bullet$ always denotes the dimensionality of an object, e.g.
$\boldsymbol{y} \in \mathbb{R}^{D_y}$, $\boldsymbol{u} \in \mathbb{R}^{D_u}$.
Capital $N$ and $M$ are reserved for sample counts, e.g. a dataset of $N$
observation–covariate pairs $\{\boldsymbol{y}^{(n)}, \boldsymbol{x}^{(n)}\}_{n=1}^{N}$
that amortized inference generalises over. So $\boldsymbol{z} \in \mathbb{R}^{D_z}$
with $D_z \ll D_u$ is a dimension reduction, whereas $N$ counts how many
observations we have.

[Figure: the full probabilistic graphical model implied by (1).]
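
Read from right to left, the factorisation in (1) is also an ancestral sampling recipe: draw $\boldsymbol{\theta}$ given $\boldsymbol{x}$, then $\boldsymbol{z}$, then $\boldsymbol{u}$, then $\boldsymbol{y}$. The sketch below assumes a toy linear-Gaussian instance. The decoder matrix `W`, the two noise scales packed into `theta`, the observation mask, the additive role of `x`, and all dimensions are illustrative assumptions, not part of the schema.

```python
import numpy as np

def sample_joint(x, rng, D_z=2, D_u=5, obs_mask=None):
    """Draw (theta, z, u, y) from a toy instance of the factorisation (1).

    Assumed toy model (illustrative only):
      theta ~ LogNormal                  two positive noise scales; independent of x here
      z     ~ N(0, I)                    emulator latent
      u     ~ N(W z + x, theta[0]^2 I)   x acts as an additive forcing
      y     ~ N(u[obs], theta[1]^2 I)    gappy: only masked entries are observed
    """
    if obs_mask is None:
        obs_mask = np.array([True, False, True, True, False])
    # Fixed hypothetical decoder; in a real emulator it would be part of theta.
    W = np.linspace(-1.0, 1.0, D_u * D_z).reshape(D_u, D_z)

    theta = np.exp(rng.normal(loc=-1.0, scale=0.3, size=2))        # p(theta | x)
    z = rng.normal(size=D_z)                                       # p(z | theta, x)
    u = W @ z + x + theta[0] * rng.normal(size=D_u)                # p(u | z, theta, x)
    y = u[obs_mask] + theta[1] * rng.normal(size=obs_mask.sum())   # p(y | u, theta, x)
    return theta, z, u, y

rng = np.random.default_rng(0)
x = np.zeros(5)                        # covariate: zero forcing
theta, z, u, y = sample_joint(x, rng)
print(theta.shape, z.shape, u.shape, y.shape)
```

Dropping the `z` line and sampling `u` directly from a distribution that depends only on `theta` and `x` gives the corresponding sampler for the simulator track.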

Track 1: Simulator

Generative Model

With no internal latent $\boldsymbol{z}$, the simulator maps
$\boldsymbol{\theta}, \boldsymbol{x}$ directly to the state $\boldsymbol{u}$.
The state is the only latent object besides $\boldsymbol{\theta}$.

$$
p(\boldsymbol{y}, \boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{x}) =
p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x})
\tag{2}
$$

The marginal likelihood integrates out both the state and the parameters,

$$
p(\boldsymbol{y} \mid \boldsymbol{x}) =
\iint p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,
\mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{\theta}.
\tag{3}
$$

1A · Exact Inference

The exact target posteriors are:

$$
\begin{aligned}
\text{Params only}: && p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
&= \int p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u} \\
\text{State only}: && p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})
&= \frac{p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})}{p(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta})} \\
\text{Joint}: && p(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
&= \frac{p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x})}{p(\boldsymbol{y} \mid \boldsymbol{x})}
\end{aligned}
\tag{4}
$$

The asymmetry is deliberate: the params-only posterior must marginalise out
the still-unknown state $\boldsymbol{u}$ (hence the integral), whereas the
state-only posterior conditions on $\boldsymbol{\theta}$, leaving
$\boldsymbol{u}$ as the only unknown. It is therefore a plain Bayes posterior
with no second latent to integrate; only the evidence
$p(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) =
\int p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}$
appears in its normaliser.
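
For intuition, the state-only posterior and its evidence are available in closed form when everything is linear and Gaussian. The sketch below assumes a toy prior $\mathcal{N}(\boldsymbol{m}, \boldsymbol{P})$ standing in for $p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})$ and a gappy observation operator `H` with noise covariance `R` standing in for $p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x})$. These choices and all names are illustrative assumptions, not the model used in this note.

```python
import numpy as np

def state_posterior(y, m, P, H, R):
    """Exact p(u | y, x, theta) and log p(y | x, theta) for a linear-Gaussian toy.

    Prior:      u ~ N(m, P)      stands in for p(u | theta, x)
    Likelihood: y ~ N(H u, R)    stands in for p(y | u, theta, x)
    """
    S = H @ P @ H.T + R                      # covariance of the evidence
    K = P @ H.T @ np.linalg.inv(S)           # gain mapping residuals to state updates
    resid = y - H @ m
    mean = m + K @ resid                     # posterior mean
    cov = P - K @ H @ P                      # posterior covariance
    # Evidence: log N(y; H m, S)
    _, logdet = np.linalg.slogdet(S)
    log_evidence = -0.5 * (resid @ np.linalg.solve(S, resid)
                           + logdet + len(y) * np.log(2.0 * np.pi))
    return mean, cov, log_evidence

D_u = 4
m = np.zeros(D_u)
P = np.eye(D_u)
H = np.eye(D_u)[[0, 2]]                      # observe entries 0 and 2 only
R = 0.1 * np.eye(2)
y = np.array([1.0, -0.5])

mean, cov, log_ev = state_posterior(y, m, P, H, R)
print(mean, np.diag(cov), log_ev)
```

With this uncorrelated prior, the unobserved entries keep their prior mean and variance, while the observed entries are pulled towards the data and their variance shrinks.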

1B · Variational Inference

Introduce a variational distribution $q$ with parameters $\boldsymbol{\psi}$
optimised per observation. Here $\boldsymbol{\psi}$ does not generalise
across different $\boldsymbol{y}$ or $\boldsymbol{x}$: a fresh
$\boldsymbol{\psi}$ is optimised for each observation.
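
Every objective in this section is an evidence lower bound (ELBO). As a reminder of why maximising it pulls $q$ towards the posterior, the standard decomposition is sketched below. The symbol $\boldsymbol{w}$ is a shorthand introduced only for this sketch and stands for whichever unknowns are targeted ($\boldsymbol{\theta}$, $\boldsymbol{u}$, or both).

```latex
\begin{aligned}
\log p(\boldsymbol{y} \mid \boldsymbol{x})
&= \mathbb{E}_{q(\boldsymbol{w} \mid \boldsymbol{\psi})}\!\left[
   \log \frac{p(\boldsymbol{y}, \boldsymbol{w} \mid \boldsymbol{x})}
             {q(\boldsymbol{w} \mid \boldsymbol{\psi})} \right]
 + D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{w} \mid \boldsymbol{\psi}) \,\|\,
   p(\boldsymbol{w} \mid \boldsymbol{y}, \boldsymbol{x}) \,\right] \\
&= \underbrace{
   \mathbb{E}_{q(\boldsymbol{w} \mid \boldsymbol{\psi})}\!\left[
   \log p(\boldsymbol{y} \mid \boldsymbol{w}, \boldsymbol{x}) \right]
 - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{w} \mid \boldsymbol{\psi}) \,\|\,
   p(\boldsymbol{w} \mid \boldsymbol{x}) \,\right]
   }_{\mathcal{L}(\boldsymbol{\psi})}
 + D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{w} \mid \boldsymbol{\psi}) \,\|\,
   p(\boldsymbol{w} \mid \boldsymbol{y}, \boldsymbol{x}) \,\right].
\end{aligned}
```

The left-hand side does not depend on $\boldsymbol{\psi}$ and the last KL term is non-negative, so maximising $\mathcal{L}(\boldsymbol{\psi})$ is equivalent to minimising the KL divergence from $q$ to the exact posterior. In the state-only case every term is additionally conditioned on the fixed $\boldsymbol{\theta}$.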

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{5}
$$

The inner likelihood
$p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) =
\int p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}$
in (5) is itself intractable and may require a further inner
approximation or a Monte-Carlo estimator.
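
One simple option for such a Monte-Carlo estimator is to average the observation likelihood over prior draws of the state, $p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \approx \tfrac{1}{M} \sum_{m} p(\boldsymbol{y} \mid \boldsymbol{u}^{(m)}, \boldsymbol{\theta}, \boldsymbol{x})$ with $\boldsymbol{u}^{(m)} \sim p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})$. The sketch below assumes a toy isotropic linear-Gaussian model so the estimate can be compared with a closed-form answer. The model, the sample count, and the function names are illustrative assumptions.

```python
import numpy as np

def log_gauss(y, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I); mean may be a batch of rows."""
    d = y.shape[-1]
    sq = np.sum((y - mean) ** 2, axis=-1)
    return -0.5 * (sq / var + d * np.log(2.0 * np.pi * var))

def mc_log_likelihood(y, m, prior_var, obs_idx, noise_var, M, rng):
    """Estimate log p(y | theta, x) by averaging p(y | u) over M prior samples of u."""
    u = m + np.sqrt(prior_var) * rng.normal(size=(M, m.size))   # u ~ p(u | theta, x)
    log_w = log_gauss(y, u[:, obs_idx], noise_var)              # log p(y | u, theta, x)
    # log-mean-exp keeps the average numerically stable
    return np.max(log_w) + np.log(np.mean(np.exp(log_w - np.max(log_w))))

rng = np.random.default_rng(1)
m = np.zeros(4)
obs_idx = np.array([0, 2])               # gappy: two of four state entries observed
y = np.array([1.0, -0.5])
prior_var, noise_var = 1.0, 0.1

estimate = mc_log_likelihood(y, m, prior_var, obs_idx, noise_var, M=200_000, rng=rng)
# Closed form for this toy: y ~ N(m[obs_idx], (prior_var + noise_var) I)
exact = log_gauss(y, m[obs_idx], prior_var + noise_var)
print(estimate, exact)
```

The average itself is unbiased, but its logarithm is biased downwards for finite `M` (Jensen's inequality), and prior sampling degrades quickly as the state dimension grows or the noise shrinks. That is one reason a dedicated inner approximation may be preferable in realistic settings.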

State only: $q(\boldsymbol{u} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{6}
$$

Joint: there are two ways to factor the variational family.

(i) Fully factorised (mean-field):

$$
q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)
\approx p(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
\tag{7}
$$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\psi}_1)\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)}
\!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{8}
$$

Because the prior on $\boldsymbol{u}$ depends on $\boldsymbol{\theta}$, its KL term is averaged over $q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)$.

(ii) Structured, with $\boldsymbol{u}$ conditioned on $\boldsymbol{\theta}$:

$$
q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)
\approx p(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
\tag{9}
$$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)}
\!\left[
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1)}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{10}
$$

1C · Amortized Inference

The variational family now conditions on both $\boldsymbol{y}$ and
$\boldsymbol{x}$. A single forward pass replaces per-observation optimisation:
train once, generalise over $\boldsymbol{y}$ and $\boldsymbol{x}$.
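
To make "train once, then a single forward pass" concrete, the sketch below amortizes the state-only posterior mean of a toy linear-Gaussian model with a linear encoder $\boldsymbol{\mu}(\boldsymbol{y}) = \boldsymbol{A}\boldsymbol{y} + \boldsymbol{b}$, so that $\boldsymbol{\psi} = (\boldsymbol{A}, \boldsymbol{b})$. It is a minimal illustration under stated assumptions: the toy model, the omission of the covariate $\boldsymbol{x}$, plain gradient ascent on the dataset-averaged ELBO, and all names are choices made here rather than the method of this note.

```python
import numpy as np

rng = np.random.default_rng(2)
D_u, N = 3, 500
m = np.array([1.0, -1.0, 0.5])          # prior mean of u (prior covariance is I)
H = np.eye(D_u)[[0, 1]]                 # observe the first two state entries
r = 1.0                                 # observation noise variance

# Training set of N observations simulated from the generative model.
U = m + rng.normal(size=(N, D_u))
Y = U @ H.T + np.sqrt(r) * rng.normal(size=(N, H.shape[0]))

# Encoder mean mu(y) = A y + b with psi = (A, b). In this toy the encoder
# covariance does not enter the mean gradients, so it is not optimised here.
A = np.zeros((D_u, H.shape[0]))
b = np.zeros(D_u)
lr = 0.1
for _ in range(3000):
    Mu = Y @ A.T + b                                    # forward pass for every observation
    # Gradient of each per-observation ELBO with respect to its mean mu_n.
    G = (Y - Mu @ H.T) @ H / r - (Mu - m)
    A += lr * (G.T @ Y) / N                             # chain rule through mu = A y + b
    b += lr * G.mean(axis=0)

# Exact posterior mean for comparison: m + K (y - H m), with gain K.
K = H.T @ np.linalg.inv(H @ H.T + r * np.eye(H.shape[0]))
y_new = np.array([2.0, 0.0])                            # observation not in the training set
amortized_mean = A @ y_new + b                          # no optimisation at test time
exact_mean = m + K @ (y_new - H @ m)
print(amortized_mean, exact_mean)
```

In the linear-Gaussian case a linear encoder can represent the exact posterior mean, so amortization costs nothing here. With a nonlinear simulator the encoder is typically a neural network, and a gap between amortized and per-observation solutions is to be expected.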

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{11}
$$

State only: $q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{12}
$$

Joint

(i) Fully factorised (mean-field):

$$
q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \,
q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)
\tag{13}
$$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1)\,
q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)}
\!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{14}
$$

As in the per-observation case, the KL term on $\boldsymbol{u}$ is averaged over the variational distribution of $\boldsymbol{\theta}$ because the prior $p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})$ depends on $\boldsymbol{\theta}$.

(ii) Structured, with $\boldsymbol{u}$ conditioned on $\boldsymbol{\theta}$:

$$
q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,
q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)
\tag{15}
$$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)}
\!\left[
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1)}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{16}
$$

Track 2: Emulator

Generative Model

The emulator introduces an internal latent variable $\boldsymbol{z}$ that
compresses the full state $\boldsymbol{u}$. It is itself a generative model,
trained on simulator outputs before inference is performed.

$$
p(\boldsymbol{y}, \boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{x}) =
p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x})
\tag{17}
$$

The marginal likelihood now also integrates out $\boldsymbol{z}$,

$$
p(\boldsymbol{y} \mid \boldsymbol{x}) =
\iiint p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,
\mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{z}\, \mathrm{d}\boldsymbol{\theta}.
\tag{18}
$$

2.0 · Emulator Training

Before inference, train the emulator on simulator outputs
$\{\boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}\}$ using its own internal ELBO. This
introduces emulator inference parameters $\boldsymbol{\psi}_{\mathrm{em}}$ and
an encoder $q(\boldsymbol{z} \mid \boldsymbol{u}, \boldsymbol{x},
\boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}})$.

$$
\mathcal{L}_{\mathrm{em}}(\boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}}) =
\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}})}
\!\left[ \log p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}}) \,\|\,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{19}
$$

2.1 · Emulator Uncertainty Characterization

Characterise the discrepancy between the emulator and the true simulator before
using the emulator for inference,

$$
p(\boldsymbol{u}_{\mathrm{true}} \mid \boldsymbol{u}_{\mathrm{em}}, \boldsymbol{x}, \boldsymbol{\theta}).
\tag{20}
$$

2A · Exact Inference

The targets now involve three unknowns: $\boldsymbol{u}, \boldsymbol{z},
\boldsymbol{\theta}$.

$$
\begin{aligned}
\text{Params only}: && p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
&= \iint p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{z}, \boldsymbol{x}) \,
p(\boldsymbol{u}, \boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{z} \\
\text{State only}: && p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})
&= \int p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{z}, \boldsymbol{x}, \boldsymbol{\theta}) \,
p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{z} \\
\text{Latent only}: && p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})
&= \int p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}) \,
p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{u} \\
\text{Joint}: && p(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
&= \frac{p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{\theta} \mid \boldsymbol{x})}{p(\boldsymbol{y} \mid \boldsymbol{x})}
\end{aligned}
\tag{21}
$$

Here every conditional target sits in the chain
$\boldsymbol{z} \to \boldsymbol{u} \to \boldsymbol{y}$ and marginalises out the
other latent: conditioning on $\boldsymbol{\theta}$, the state-only posterior
integrates out the deeper latent $\boldsymbol{z}$; the latent-only posterior
(the emulator's compressed state given observations) integrates out the
intervening state $\boldsymbol{u}$; and the params-only posterior integrates out
both. Only the joint, which targets all unknowns at once, is a single Bayes
ratio with no marginalisation.
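
The two marginalisations can be checked numerically in a case where everything is Gaussian. The sketch below assumes a linear decoder $\boldsymbol{u} = \boldsymbol{W}\boldsymbol{z} + \boldsymbol{c} + \text{noise}$ with a standard-normal prior on $\boldsymbol{z}$ and a gappy linear observation operator `H`. It computes the latent-only posterior mean twice: once by integrating $\boldsymbol{u}$ out of the likelihood first, and once by conditioning the joint over $(\boldsymbol{z}, \boldsymbol{u})$ on $\boldsymbol{y}$ and then discarding $\boldsymbol{u}$. The model and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
D_z, D_u = 2, 5
W = rng.normal(size=(D_u, D_z))        # hypothetical linear decoder
c = rng.normal(size=D_u)               # decoder offset
s2, r = 0.05, 0.1                      # decoder and observation noise variances
H = np.eye(D_u)[[0, 2, 3]]             # gappy observation operator
y = rng.normal(size=H.shape[0])

# Route 1: integrate u out of the likelihood, then apply Bayes' rule in z.
A, a = H @ W, H @ c                    # y | z ~ N(A z + a, C)
C = s2 * H @ H.T + r * np.eye(H.shape[0])
prec_z = np.eye(D_z) + A.T @ np.linalg.solve(C, A)
cov_z = np.linalg.inv(prec_z)
mean_z = cov_z @ A.T @ np.linalg.solve(C, y - a)

# Route 2: condition the joint Gaussian over (z, u) on y, then drop u.
cov_uu = W @ W.T + s2 * np.eye(D_u)                    # marginal covariance of u
cov_zu_y = np.vstack([W.T @ H.T, cov_uu @ H.T])        # Cov[(z, u), y]
cov_yy = H @ cov_uu @ H.T + r * np.eye(H.shape[0])
mean_zu = np.concatenate([np.zeros(D_z), c])
post_mean_zu = mean_zu + cov_zu_y @ np.linalg.solve(cov_yy, y - H @ c)
mean_z_joint = post_mean_zu[:D_z]      # latent-only: u integrated out
mean_u_joint = post_mean_zu[D_z:]      # state-only: z integrated out

print(np.allclose(mean_z, mean_z_joint))
```

For Gaussians, integrating a variable out amounts to keeping the remaining block of the mean and covariance, which is why the second route reads both posteriors off one conditioning step.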

2B · Variational Inference

As in 1B, $q$ has parameters $\boldsymbol{\psi}$ optimised per observation. The emulator's internal
latent $\boldsymbol{z}$ is now an additional object the variational family must
handle.

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{22}
$$

with
$p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) =
\iint p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{z}$
requiring a further inner approximation over both $\boldsymbol{u}$ and
$\boldsymbol{z}$.

State only: $q(\boldsymbol{u} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{23}
$$

where
$p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) =
\int p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{z}$
is itself intractable and approximated via the emulator ELBO (19).
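
The sense in which the emulator ELBO approximates this marginal is that it bounds $\log p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})$ from below, with equality when the encoder returns the exact posterior over $\boldsymbol{z}$. The sketch below checks this in a linear-Gaussian toy where both the ELBO and the marginal have closed forms. The decoder `W`, offset `c`, noise variance `s2`, and the perturbed encoder output are illustrative assumptions.

```python
import numpy as np

def elbo(u, mu, S, W, c, s2):
    """Closed-form ELBO for q(z) = N(mu, S), prior z ~ N(0, I), decoder u | z ~ N(W z + c, s2 I)."""
    D_u, D_z = W.shape
    resid = u - W @ mu - c
    # E_q[log p(u | z)]: the trace term accounts for the spread of q.
    exp_loglik = -0.5 * (D_u * np.log(2.0 * np.pi * s2)
                         + (resid @ resid + np.trace(W @ S @ W.T)) / s2)
    # KL[q(z) || N(0, I)]
    kl = 0.5 * (np.trace(S) + mu @ mu - D_z - np.linalg.slogdet(S)[1])
    return exp_loglik - kl

rng = np.random.default_rng(4)
D_z, D_u, s2 = 2, 6, 0.2
W = rng.normal(size=(D_u, D_z))
c = np.zeros(D_u)
u = rng.normal(size=D_u)

# Exact marginal: log p(u) = log N(u; c, W W^T + s2 I).
Sigma = W @ W.T + s2 * np.eye(D_u)
log_marginal = -0.5 * ((u - c) @ np.linalg.solve(Sigma, u - c)
                       + np.linalg.slogdet(Sigma)[1] + D_u * np.log(2.0 * np.pi))

# Encoder equal to the exact posterior p(z | u): the bound is tight.
S_star = np.linalg.inv(np.eye(D_z) + W.T @ W / s2)
mu_star = S_star @ W.T @ (u - c) / s2
tight = elbo(u, mu_star, S_star, W, c, s2)

# A deliberately mismatched encoder output: the bound is strictly looser.
loose = elbo(u, mu_star + 0.5, 2.0 * S_star, W, c, s2)
print(log_marginal, tight, loose)
```

With a nonlinear decoder the exact posterior is not available, so the gap between the ELBO and the true marginal is unknown and becomes part of the approximation error carried into the state-only objective.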

Latent only: $q(\boldsymbol{z} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{24}
$$

with
$p(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) =
\int p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \,
p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}$.
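
Per-observation optimisation of the latent-only objective can be sketched as follows. The toy below assumes a linear-Gaussian emulator, for which the inner integral over $\boldsymbol{u}$ is available in closed form, a diagonal Gaussian $q(\boldsymbol{z} \mid \boldsymbol{\psi})$ with $\boldsymbol{\psi} = (\boldsymbol{\mu}, \boldsymbol{\rho})$ and standard deviations $\exp(\boldsymbol{\rho})$, and plain gradient ascent with analytic ELBO gradients. The model, step size, iteration count, and names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
D_z, D_u = 2, 5
W = rng.normal(size=(D_u, D_z))          # hypothetical decoder, u | z ~ N(W z, s2 I)
s2, r = 0.05, 0.1
H = np.eye(D_u)[[0, 1, 3]]               # gappy observations, y | u ~ N(H u, r I)
y = rng.normal(size=H.shape[0])          # the single observation being fitted

# Integrating u out gives y | z ~ N(A z, C); the prior is z ~ N(0, I).
A = H @ W
C = s2 * H @ H.T + r * np.eye(H.shape[0])
Cinv_A = np.linalg.solve(C, A)
post_prec = np.eye(D_z) + A.T @ Cinv_A   # exact posterior precision of z

# psi = (mu, rho) with q(z | psi) = N(mu, diag(exp(2 rho))), initialised afresh for this y.
mu, rho = np.zeros(D_z), np.zeros(D_z)
lr = 0.5 / np.linalg.eigvalsh(post_prec).max()   # conservative step from the largest curvature
for _ in range(20000):
    grad_mu = Cinv_A.T @ (y - A @ mu) - mu                     # d ELBO / d mu
    grad_rho = 1.0 - np.exp(2.0 * rho) * np.diag(post_prec)    # d ELBO / d rho
    mu += lr * grad_mu
    rho += lr * grad_rho

exact_mean = np.linalg.solve(post_prec, Cinv_A.T @ y)
exact_var = np.diag(np.linalg.inv(post_prec))
print(mu, exact_mean)
print(np.exp(2.0 * rho), exact_var)
```

In this toy the optimised mean coincides with the exact posterior mean, while the diagonal family settles on variances equal to the reciprocal diagonal of the posterior precision. Those are never larger than the exact marginal variances, a known tendency of mean-field Gaussian fits to understate uncertainty. A new observation requires rerunning the loop, which is the cost that amortization removes.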

Joint

(i) Fully factorised (mean-field):

$$
q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \,
q(\boldsymbol{z} \mid \boldsymbol{\psi}_2) \,
q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)
\tag{25}
$$

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\psi}) ={}&
\mathbb{E}_{q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{\psi}_2)\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)}
\!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] \\
&- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)}
\!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\end{aligned}
\tag{26}
$$

Each KL term is averaged over the variational factors of the variables its prior conditions on: $\boldsymbol{z}$ and $\boldsymbol{\theta}$ for the state, $\boldsymbol{\theta}$ for the latent.

(ii) Structured, with each factor conditioned on its parents:

$$
q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,
q(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_2) \,
q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)
\tag{27}
$$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)}
\!\left[
\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_2)}
\!\left[
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\psi}_1)}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{28}
$$

2C · Amortized Inference

The variational family conditions on both $\boldsymbol{y}$ and $\boldsymbol{x}$;
a single forward pass replaces per-observation optimisation.

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\tag{29}
$$

State only: $q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{30}
$$

Latent only: $q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\,
p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\tag{31}
$$

Joint

(i) Fully factorised (mean-field):

$$
q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \,
q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \,
q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)
\tag{32}
$$

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\psi}) ={}&
\mathbb{E}_{q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)}
\!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] \\
&- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)}
\!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\end{aligned}
\tag{33}
$$

Each KL term is averaged over the variational factors of the variables its prior conditions on.

(ii) Structured, with each factor conditioned on its parents:

$$
q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) =
q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{z}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,
q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_2) \,
q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)
\tag{34}
$$

\mathcal{L}(\boldsymbol{\psi}) =
\mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)}
\!\left[
\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_2)}
\!\left[
\mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{z}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1)}
\!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{z}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
\right]
- D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right] L ( ψ ) = E q ( θ ∣ y , x , ψ 3 ) [ E q ( z ∣ y , x , θ , ψ 2 ) [ E q ( u ∣ y , z , x , θ , ψ 1 ) [ log p ( y ∣ u , θ , x ) ] − D KL [ q ( u ∣ y , z , x , θ , ψ 1 ) ∥ p ( u ∣ z , θ , x ) ] ] − D KL [ q ( z ∣ y , x , θ , ψ 2 ) ∥ p ( z ∣ θ , x ) ] ] − D KL [ q ( θ ∣ y , x , ψ 3 ) ∥ p ( θ ∣ x ) ] Summary Table ¶ Table 2: Every (track, regime, target) combination with its variational family and ELBO terms.
| Track | Regime | Target | Variational family | ELBO terms |
| --- | --- | --- | --- | --- |
| Simulator | Exact | $\boldsymbol{\theta}$ | $p(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Simulator | Exact | $\boldsymbol{u}$ | $p(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta})$ | intractable |
| Simulator | Exact | $\boldsymbol{u},\boldsymbol{\theta}$ | $p(\boldsymbol{u},\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Simulator | VI | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Simulator | VI | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Simulator | VI | $\boldsymbol{u},\boldsymbol{\theta}$ | factored $q(\boldsymbol{u}\mid\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_\theta$ |
| Simulator | VI | $\boldsymbol{u},\boldsymbol{\theta}$ | hierarchical $q(\boldsymbol{u}\mid\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_{u\mid\theta}+\mathrm{KL}_\theta$ |
| Simulator | Amortized | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Simulator | Amortized | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Simulator | Amortized | $\boldsymbol{u},\boldsymbol{\theta}$ | factored $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_\theta$ |
| Simulator | Amortized | $\boldsymbol{u},\boldsymbol{\theta}$ | hierarchical $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_{u\mid\theta}+\mathrm{KL}_\theta$ |
| Emulator | Training | $\boldsymbol{z}$ | $q(\boldsymbol{z}\mid\boldsymbol{u},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_{\mathrm{em}})$ | recon $+\;\mathrm{KL}_z$ |
| Emulator | Exact | $\boldsymbol{\theta}$ | $p(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Emulator | Exact | $\boldsymbol{u}$ | $p(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta})$ | intractable |
| Emulator | Exact | $\boldsymbol{z}$ | $p(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta})$ | intractable |
| Emulator | Exact | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ | $p(\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Emulator | VI | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Emulator | VI | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Emulator | VI | $\boldsymbol{z}$ | $q(\boldsymbol{z}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_z$ |
| Emulator | VI | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ | factored $q(\boldsymbol{u}\mid\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_z+\mathrm{KL}_\theta$ |
| Emulator | VI | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ | hierarchical $q(\boldsymbol{u}\mid\boldsymbol{z},\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{\theta},\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_{u\mid z,\theta}+\mathrm{KL}_{z\mid\theta}+\mathrm{KL}_\theta$ |
| Emulator | Amortized | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Emulator | Amortized | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Emulator | Amortized | $\boldsymbol{z}$ | $q(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_z$ |
| Emulator | Amortized | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ | factored $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_z+\mathrm{KL}_\theta$ |
| Emulator | Amortized | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ | hierarchical $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{z},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_{u\mid z,\theta}+\mathrm{KL}_{z\mid\theta}+\mathrm{KL}_\theta$ |
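
To make the "recon + KL" entries concrete, the following is a minimal sketch of the latent-only ELBO on a toy problem. Everything specific here is an assumption made for illustration and does not come from this note: a linear decoder `W` with the state already marginalised out so that $p(\boldsymbol{y} \mid \boldsymbol{z})$ is Gaussian with noise scale `sigma`, a standard normal prior on $\boldsymbol{z}$, a diagonal Gaussian $q(\boldsymbol{z})$ with mean `m` and standard deviations `s`, and all function names. In this linear-Gaussian setting both ELBO terms and the log evidence have closed forms, so the bound can be checked directly; a nonlinear decoder would typically need a Monte Carlo estimate of the reconstruction term instead.

```python
import numpy as np


def gaussian_kl_diag_vs_standard(m, s):
    """KL[ N(m, diag(s^2)) || N(0, I) ] in closed form."""
    return 0.5 * np.sum(s**2 + m**2 - 1.0 - 2.0 * np.log(s))


def expected_log_lik(y, W, sigma, m, s):
    """E_q[ log N(y; W z, sigma^2 I) ] for q(z) = N(m, diag(s^2))."""
    d_y = y.shape[0]
    resid = y - W @ m
    # E_q ||y - W z||^2 = ||y - W m||^2 + sum_j s_j^2 ||W[:, j]||^2
    quad = resid @ resid + np.sum(s**2 * np.sum(W**2, axis=0))
    return -0.5 * d_y * np.log(2.0 * np.pi * sigma**2) - 0.5 * quad / sigma**2


def elbo_latent_only(y, W, sigma, m, s):
    """Return (ELBO, recon, KL_z) with ELBO = recon - KL_z."""
    recon = expected_log_lik(y, W, sigma, m, s)
    kl_z = gaussian_kl_diag_vs_standard(m, s)
    return recon - kl_z, recon, kl_z


def log_evidence(y, W, sigma):
    """Exact log p(y) = log N(y; 0, W W^T + sigma^2 I)."""
    d_y = y.shape[0]
    cov = W @ W.T + sigma**2 * np.eye(d_y)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (d_y * np.log(2.0 * np.pi) + logdet + quad)


def exact_posterior(y, W, sigma):
    """Mean and precision matrix of the exact Gaussian posterior p(z | y)."""
    d_z = W.shape[1]
    precision = np.eye(d_z) + W.T @ W / sigma**2
    mean = np.linalg.solve(precision, W.T @ y) / sigma**2
    return mean, precision


# Toy data (hypothetical sizes and values).
rng = np.random.default_rng(0)
D_y, D_z, sigma = 5, 2, 0.5
W = rng.normal(size=(D_y, D_z))
y = W @ rng.normal(size=D_z) + sigma * rng.normal(size=D_y)

# Best diagonal Gaussian for a Gaussian posterior: match the posterior mean
# and set each variance to the inverse of the matching precision diagonal.
post_mean, post_prec = exact_posterior(y, W, sigma)
m_good = post_mean
s_good = 1.0 / np.sqrt(np.diag(post_prec))

elbo_good, recon_good, kl_good = elbo_latent_only(y, W, sigma, m_good, s_good)
# Using the prior itself as q gives KL_z = 0 but a poor reconstruction term.
elbo_bad, recon_bad, kl_bad = elbo_latent_only(
    y, W, sigma, np.zeros(D_z), np.ones(D_z)
)
log_ev = log_evidence(y, W, sigma)

print(f"log evidence : {log_ev:.4f}")
print(f"ELBO (good q): {elbo_good:.4f} = {recon_good:.4f} - {kl_good:.4f}")
print(f"ELBO (prior) : {elbo_bad:.4f} = {recon_bad:.4f} - {kl_bad:.4f}")
```

The gap between the log evidence and the ELBO is the KL divergence from $q$ to the exact posterior, so it vanishes only when the variational family contains that posterior. With $D_z = 1$ the diagonal Gaussian is the exact posterior in this toy model and the two quantities coincide.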