
Full Probabilistic Inference Schema

CSIC
UCM
IGEO

This note lays out a unified probabilistic schema for state and parameter estimation, organised along two axes:

  1. Two model tracks: a simulator (no internal latent variable) and an emulator (a latent-variable model trained on simulator outputs).

  2. Three inference regimes: exact posteriors, per-observation variational inference, and amortized inference that generalises over observations.

The most general generative model factorises as

$$
p(\boldsymbol{y}, \boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{x}) = p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \tag{1}
$$

Notation

Table 1: Symbols used throughout this note.

| Symbol | Space | Meaning |
|---|---|---|
| $\boldsymbol{y}$ | $\mathbb{R}^{D_y}$ | observations (gappy, noisy) |
| $\boldsymbol{u}$ | $\mathbb{R}^{D_u}$ | full state (e.g. SSH field) |
| $\boldsymbol{z}$ | $\mathbb{R}^{D_z}$ | emulator latent variable ($D_z \le D_u$, often $D_z \ll D_u$) |
| $\boldsymbol{\theta}$ | $\mathbb{R}^{D_\theta}$ | all generative parameters (decoder, prior, noise) |
| $\boldsymbol{x}$ | $\mathbb{R}^{D_x}$ | covariates / controls (forcing, season, geometry) |
| $\boldsymbol{\psi}$ | $\mathbb{R}^{D_\psi}$ | all inference (variational) parameters |

The full probabilistic graphical model implied by (1):


Track 1 — Simulator

Generative Model

With no internal latent $\boldsymbol{z}$, the simulator maps $\boldsymbol{\theta}, \boldsymbol{x}$ directly to the state $\boldsymbol{u}$. The state is the only latent object besides $\boldsymbol{\theta}$.

$$
p(\boldsymbol{y}, \boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{x}) = p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x})
$$

The marginal likelihood integrates out both the state and the parameters,

$$
p(\boldsymbol{y} \mid \boldsymbol{x}) = \iint p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{\theta}.
$$
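To make the double integral concrete, here is a minimal sketch that estimates the marginal likelihood by Monte Carlo in a toy scalar model. The Gaussian choices ($\theta \sim \mathcal{N}(m_0, s_0^2)$, $u \mid \theta \sim \mathcal{N}(\theta, s_u^2)$, $y \mid u \sim \mathcal{N}(u, s_y^2)$), the function names and the numerical values are illustrative assumptions and not part of the note; covariates $\boldsymbol{x}$ are dropped. The model is chosen only because it admits a closed form to compare against.

```python
import numpy as np

def normal_pdf(v, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def mc_evidence(y, m0, s0, su, sy, n_samples=200_000, seed=0):
    """Monte Carlo estimate of p(y): sample (theta, u) from the prior, average p(y | u)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(m0, s0, size=n_samples)   # theta ~ p(theta)
    u = rng.normal(theta, su)                    # u ~ p(u | theta)
    return normal_pdf(y, u, sy).mean()           # average of p(y | u)

def exact_evidence(y, m0, s0, su, sy):
    """Closed form for this toy model: y ~ N(m0, s0^2 + su^2 + sy^2)."""
    return normal_pdf(y, m0, np.sqrt(s0**2 + su**2 + sy**2))

y_obs = 0.7  # hypothetical observation
approx = mc_evidence(y_obs, m0=0.0, s0=1.0, su=0.5, sy=0.3)
exact = exact_evidence(y_obs, m0=0.0, s0=1.0, su=0.5, sy=0.3)
print(f"Monte Carlo: {approx:.4f}, closed form: {exact:.4f}")
```

Sampling from the prior is only viable in low dimensions; for a realistic state dimension $D_u$ the estimator's variance grows quickly, which is one reason the variational regimes are needed.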

1A · Exact Inference

The exact target posteriors are:

$$
\begin{aligned}
\text{Params only}: && p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}) &= \int p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u} \\
\text{State only}: && p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) &= \frac{p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})}{p(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta})} \\
\text{Joint}: && p(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}) &= \frac{p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x})}{p(\boldsymbol{y} \mid \boldsymbol{x})}
\end{aligned}
$$

The asymmetry is deliberate: the params-only posterior must marginalise out the still-unknown state $\boldsymbol{u}$ (hence the integral), whereas the state-only posterior conditions on $\boldsymbol{\theta}$, leaving $\boldsymbol{u}$ as the only unknown. It is therefore a plain Bayes posterior with no second latent to integrate; only the evidence $p(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) = \int p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}$ appears in its normaliser.
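The state-only posterior can be checked numerically in a case where it is available in closed form. The sketch below assumes a scalar linear-Gaussian model ($u \mid \theta \sim \mathcal{N}(\theta, s_u^2)$, $y \mid u \sim \mathcal{N}(u, s_y^2)$) with $\boldsymbol{\theta}$ held fixed; the model, the function names and the values are assumptions made for illustration. It compares the conjugate formula with a brute-force normalisation of likelihood times prior on a grid.

```python
import numpy as np

def state_posterior(y, theta, su, sy):
    """Closed-form p(u | y, theta) for u ~ N(theta, su^2), y | u ~ N(u, sy^2)."""
    precision = 1.0 / su**2 + 1.0 / sy**2
    mean = (theta / su**2 + y / sy**2) / precision
    return mean, np.sqrt(1.0 / precision)

def grid_posterior(y, theta, su, sy, n_grid=20001):
    """Bayes' rule on a uniform grid: normalise likelihood x prior numerically."""
    u = np.linspace(theta - 10.0 * su, theta + 10.0 * su, n_grid)
    log_joint = -0.5 * ((y - u) / sy) ** 2 - 0.5 * ((u - theta) / su) ** 2
    weights = np.exp(log_joint - log_joint.max())
    weights /= weights.sum()            # the sum plays the role of the evidence
    mean = np.sum(weights * u)
    std = np.sqrt(np.sum(weights * (u - mean) ** 2))
    return mean, std

mean_cf, std_cf = state_posterior(0.7, theta=0.0, su=0.5, sy=0.3)
mean_grid, std_grid = grid_posterior(0.7, theta=0.0, su=0.5, sy=0.3)
print(mean_cf, mean_grid, std_cf, std_grid)
```

The grid version only works because $u$ is one-dimensional here; it is meant as a check on the formula, not as an inference method for a full state field.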

1B · Variational Inference

Introduce a variational distribution $q$ with parameters $\boldsymbol{\psi}$ optimised per observation: $\boldsymbol{\psi}$ does not generalise across different $\boldsymbol{y}$ or $\boldsymbol{x}$, so a fresh $\boldsymbol{\psi}$ is fitted for each observation.

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
$$

State only: $q(\boldsymbol{u} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
$$
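As an illustration of per-observation optimisation, the sketch below maximises the state-only ELBO for a single observation in a scalar linear-Gaussian model ($u \mid \theta \sim \mathcal{N}(\theta, s_u^2)$, $y \mid u \sim \mathcal{N}(u, s_y^2)$) with a Gaussian $q(u \mid \boldsymbol{\psi}) = \mathcal{N}(m, s^2)$, $\boldsymbol{\psi} = (m, \log s)$. The model, the plain gradient-ascent loop, the step size and all names are assumptions chosen so that both ELBO terms have closed forms. Because the exact posterior lies in the variational family here, the fitted ELBO should reach the log evidence.

```python
import numpy as np

def elbo_state(m, log_s, y, theta, su, sy):
    """ELBO = E_q[log p(y | u)] - KL[q(u) || p(u | theta)] for q = N(m, s^2)."""
    s2 = np.exp(2.0 * log_s)
    recon = -0.5 * np.log(2.0 * np.pi * sy**2) - ((y - m) ** 2 + s2) / (2.0 * sy**2)
    kl = np.log(su) - log_s + (s2 + (m - theta) ** 2) / (2.0 * su**2) - 0.5
    return recon - kl

def fit_state_vi(y, theta, su, sy, lr=0.01, n_steps=2000):
    """Gradient ascent on the ELBO with analytic gradients, for ONE observation y."""
    m, log_s = 0.0, 0.0
    precision = 1.0 / su**2 + 1.0 / sy**2
    for _ in range(n_steps):
        grad_m = (y - m) / sy**2 - (m - theta) / su**2
        grad_log_s = 1.0 - np.exp(2.0 * log_s) * precision
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, log_s

def log_evidence(y, theta, su, sy):
    """log p(y | theta) for this toy model: y ~ N(theta, su^2 + sy^2)."""
    var = su**2 + sy**2
    return -0.5 * np.log(2.0 * np.pi * var) - (y - theta) ** 2 / (2.0 * var)

y_obs, theta, su, sy = 0.7, 0.0, 0.5, 0.3  # hypothetical values
m_fit, log_s_fit = fit_state_vi(y_obs, theta, su, sy)
elbo_init = elbo_state(0.0, 0.0, y_obs, theta, su, sy)
elbo_fit = elbo_state(m_fit, log_s_fit, y_obs, theta, su, sy)
log_ev = log_evidence(y_obs, theta, su, sy)
print(f"ELBO at init: {elbo_init:.4f}, after fit: {elbo_fit:.4f}, log evidence: {log_ev:.4f}")
```

A new observation would require rerunning `fit_state_vi` from scratch, which is exactly the cost that amortization removes.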

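Joint: two ways to factor the variational family.

Factored:

$$
q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2) \approx p(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})
$$

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\psi}) ={}& \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\psi}_1)\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] \\
&- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)} \!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\end{aligned}
$$

Because the state prior $p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})$ depends on $\boldsymbol{\theta}$, the state KL term is averaged over $q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)$.

Hierarchical: $q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)$, with ELBO terms recon $+\;\mathrm{KL}_{u\mid\theta}+\mathrm{KL}_\theta$ (Table 2).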
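The ELBO for the hierarchical option is not written out in this section. As a sketch, assuming the family $q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1)\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)$ listed in Table 2, the generic ELBO decomposition for the simulator track would read:

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2)} \!\left[ \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1)} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
$$

The state term is now a conditional KL evaluated inside the expectation over $\boldsymbol{\theta}$, which is what Table 2 abbreviates as $\mathrm{KL}_{u\mid\theta}$. Unlike the factored family, this one can represent posterior dependence between $\boldsymbol{u}$ and $\boldsymbol{\theta}$.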

1C · Amortized Inference

The variational family now conditions on both $\boldsymbol{y}$ and $\boldsymbol{x}$. A single forward pass replaces per-observation optimisation: train once, generalise over $\boldsymbol{y}$ and $\boldsymbol{x}$.

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
$$

State only: $q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
$$
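The difference from the per-observation regime can be sketched with an encoder that is trained once and then reused. The example assumes the same scalar linear-Gaussian model as a stand-in for the simulator and an affine Gaussian encoder $q(u \mid y, \boldsymbol{\psi}) = \mathcal{N}(a y + b, s^2)$ with $\boldsymbol{\psi} = (a, b, \log s)$; covariates are dropped, and the encoder form, training loop and values are illustrative assumptions. The ELBO is averaged over simulated observations, and in this toy case the exact posterior mean is itself affine in $y$, so the trained encoder should recover it.

```python
import numpy as np

def train_amortized_state(y_train, theta, su, sy, lr=0.02, n_steps=2000):
    """Full-batch gradient ascent on the ELBO averaged over all training observations."""
    a, b, log_s = 0.0, 0.0, 0.0
    precision = 1.0 / su**2 + 1.0 / sy**2
    for _ in range(n_steps):
        m = a * y_train + b                                    # encoder mean for every y
        grad_m = (y_train - m) / sy**2 - (m - theta) / su**2   # dELBO/dm per observation
        a += lr * np.mean(grad_m * y_train)                    # chain rule: dm/da = y
        b += lr * np.mean(grad_m)                              # chain rule: dm/db = 1
        log_s += lr * (1.0 - np.exp(2.0 * log_s) * precision)  # shared posterior spread
    return a, b, log_s

# Simulate training observations from the (assumed) generative model.
rng = np.random.default_rng(0)
theta, su, sy = 0.5, 0.5, 0.3
u_sim = rng.normal(theta, su, size=1000)
y_sim = rng.normal(u_sim, sy)

a, b, log_s = train_amortized_state(y_sim, theta, su, sy)

# Inference for a new observation is one forward pass, with no optimisation.
y_new = 1.2
posterior_mean = a * y_new + b
posterior_std = np.exp(log_s)
print(posterior_mean, posterior_std)
```

With a nonlinear simulator the affine map would be replaced by a more flexible encoder, and the encoder would in general only approximate the per-observation optimum (the amortization gap).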

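Joint

Factored:

$$
q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)
$$

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\psi}) ={}& \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1)\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] \\
&- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)} \!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\end{aligned}
$$

As in the per-observation case, the state KL term is averaged over the $\boldsymbol{\theta}$ factor because $p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x})$ depends on $\boldsymbol{\theta}$.

Hierarchical: $q(\boldsymbol{u}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)$, with ELBO terms recon $+\;\mathrm{KL}_{u\mid\theta}+\mathrm{KL}_\theta$ (Table 2).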

Track 2 — Emulator

Generative Model

The emulator introduces an internal latent variable $\boldsymbol{z}$ that compresses the full state $\boldsymbol{u}$. It is itself a generative model, trained on simulator outputs before inference is performed.

$$
p(\boldsymbol{y}, \boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{x}) = p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x})
$$

The marginal likelihood now also integrates out $\boldsymbol{z}$,

$$
p(\boldsymbol{y} \mid \boldsymbol{x}) = \iiint p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{z}\, \mathrm{d}\boldsymbol{\theta}.
$$

2.0 · Emulator Training

Before inference, train the emulator on simulator outputs $\{\boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}\}$ using its own internal ELBO. This introduces emulator inference parameters $\boldsymbol{\psi}_{\mathrm{em}}$ and an encoder $q(\boldsymbol{z} \mid \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}})$.

$$
\mathcal{L}_{\mathrm{em}}(\boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}}) = \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}})} \!\left[ \log p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}}) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \tag{19}
$$
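The two terms of the emulator ELBO can be evaluated in closed form for a linear-Gaussian emulator, which gives a simple sanity check. The sketch below assumes a decoder $p(\boldsymbol{u} \mid \boldsymbol{z}) = \mathcal{N}(W\boldsymbol{z}, \sigma^2 I)$, a standard-normal prior on $\boldsymbol{z}$ and a Gaussian encoder $\mathcal{N}(\boldsymbol{\mu}, S)$; the dependence on $\boldsymbol{x}$ is dropped and $W$, $\sigma$ stand in for $\boldsymbol{\theta}$. This is a probabilistic-PCA-style stand-in chosen for tractability, not the emulator architecture of the note, and all names are hypothetical. The bound is tight when the encoder equals the exact posterior and strictly lower otherwise.

```python
import numpy as np

def emulator_elbo(u, W, sigma, mu, S):
    """Closed-form ELBO: E_q[log p(u | z)] - KL[q(z | u) || p(z)] with q = N(mu, S)."""
    Du, Dz = W.shape
    resid = u - W @ mu
    recon = (-0.5 * Du * np.log(2.0 * np.pi * sigma**2)
             - (resid @ resid + np.trace(W @ S @ W.T)) / (2.0 * sigma**2))
    _, logdet_S = np.linalg.slogdet(S)
    kl = 0.5 * (np.trace(S) + mu @ mu - Dz - logdet_S)
    return recon - kl

def exact_encoder(u, W, sigma):
    """Exact posterior p(z | u) of the linear-Gaussian model (mean and covariance)."""
    Dz = W.shape[1]
    S = np.linalg.inv(np.eye(Dz) + W.T @ W / sigma**2)
    mu = S @ W.T @ u / sigma**2
    return mu, S

def log_marginal(u, W, sigma):
    """log p(u) = log N(u; 0, W W^T + sigma^2 I)."""
    Du = W.shape[0]
    C = W @ W.T + sigma**2 * np.eye(Du)
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (Du * np.log(2.0 * np.pi) + logdet_C + u @ np.linalg.solve(C, u))

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))          # hypothetical decoder weights, D_u = 5, D_z = 2
sigma = 0.3
u = W @ rng.normal(size=2) + sigma * rng.normal(size=5)   # one synthetic "simulator" state

mu, S = exact_encoder(u, W, sigma)
elbo_exact = emulator_elbo(u, W, sigma, mu, S)
elbo_off = emulator_elbo(u, W, sigma, mu + 0.1, S)        # deliberately mis-specified encoder
print(elbo_exact, elbo_off, log_marginal(u, W, sigma))
```

In practice the encoder and decoder would be trained jointly by stochastic gradient ascent on this quantity averaged over the simulator outputs.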

2.1 · Emulator Uncertainty Characterization

Characterise the discrepancy between the emulator and the true simulator before using the emulator for inference,

$$
p(\boldsymbol{u}_{\mathrm{true}} \mid \boldsymbol{u}_{\mathrm{em}}, \boldsymbol{x}, \boldsymbol{\theta}).
$$
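One simple way to give this conditional a concrete form is a Gaussian discrepancy model fitted to paired emulator and simulator outputs. The sketch below assumes a scalar affine form, $u_{\mathrm{true}} \mid u_{\mathrm{em}} \sim \mathcal{N}(a\, u_{\mathrm{em}} + b, \sigma^2)$, fitted by least squares; the functional form, the function name and the synthetic pairs are assumptions for illustration only, and the dependence on $\boldsymbol{x}$ and $\boldsymbol{\theta}$ is ignored.

```python
import numpy as np

def fit_discrepancy(u_em, u_true):
    """Least-squares fit of u_true ~ N(slope * u_em + intercept, sigma^2)."""
    slope, intercept = np.polyfit(u_em, u_true, deg=1)   # highest power first
    resid = u_true - (slope * u_em + intercept)
    sigma = resid.std(ddof=2)                            # two fitted coefficients
    return slope, intercept, sigma

# Synthetic paired outputs (made up for this example): a slightly biased,
# slightly damped emulator with additive noise.
rng = np.random.default_rng(0)
u_em = rng.normal(0.0, 1.0, size=5000)
u_true = 0.9 * u_em + 0.05 + rng.normal(0.0, 0.1, size=5000)

slope, intercept, sigma = fit_discrepancy(u_em, u_true)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, sigma={sigma:.3f}")
```

The fitted $\sigma$ could then be added to the observation noise when the emulator replaces the simulator, so that emulator error is not mistaken for information in the data.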

2A · Exact Inference

The targets now involve three unknowns: $\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta}$.

$$
\begin{aligned}
\text{Params only}: && p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}) &= \iint p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{z}, \boldsymbol{x}) \, p(\boldsymbol{u}, \boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{z} \\
\text{State only}: && p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) &= \int p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{z}, \boldsymbol{x}, \boldsymbol{\theta}) \, p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{z} \\
\text{Latent only}: && p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) &= \int p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{x}, \boldsymbol{\theta}) \, p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{u} \\
\text{Joint}: && p(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}) &= \frac{p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{\theta} \mid \boldsymbol{x})}{p(\boldsymbol{y} \mid \boldsymbol{x})}
\end{aligned}
$$

Here every conditional target sits in the chain $\boldsymbol{z} \to \boldsymbol{u} \to \boldsymbol{y}$ and marginalises out the other latent: conditioning on $\boldsymbol{\theta}$, the state-only posterior integrates out the deeper latent $\boldsymbol{z}$; the latent-only posterior (the emulator’s compressed state given observations) integrates out the intervening state $\boldsymbol{u}$; and the params-only posterior integrates out both. Only the joint, which targets all unknowns at once, is a single Bayes ratio with no marginalisation.

2B · Variational Inference

Per-observation $q$ with parameters $\boldsymbol{\psi}$. The emulator’s internal latent $\boldsymbol{z}$ is now an additional object the variational family must handle.

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
$$

with $p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) = \iint p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}\, \mathrm{d}\boldsymbol{z}$ requiring a further inner approximation over both $\boldsymbol{u}$ and $\boldsymbol{z}$.

State only: $q(\boldsymbol{u} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
$$

where $p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) = \int p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{z}$ is itself intractable and approximated via the emulator ELBO (19).

Latent only: $q(\boldsymbol{z} \mid \boldsymbol{\psi}) \approx p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{\psi}) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
$$

with $p(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) = \int p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \, \mathrm{d}\boldsymbol{u}$.

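Joint

Factored:

$$
q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \, q(\boldsymbol{z} \mid \boldsymbol{\psi}_2) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)
$$

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\psi}) ={}& \mathbb{E}_{q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{\psi}_2)\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)} \!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] \\
&- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)} \!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\end{aligned}
$$

Each KL term is averaged over the variational factors of the variables its prior conditions on.

Hierarchical: $q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \, q(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_2) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_3)$, with ELBO terms recon $+\;\mathrm{KL}_{u\mid z,\theta}+\mathrm{KL}_{z\mid\theta}+\mathrm{KL}_\theta$ (Table 2).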

2C · Amortized Inference

The variational family conditions on both $\boldsymbol{y}$ and $\boldsymbol{x}$; a single forward pass replaces per-observation optimisation.

Params only: $q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
$$

State only: $q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
$$

Latent only: $q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \approx p(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta})$

$$
\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right]
$$

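Joint

Factored:

$$
q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)
$$

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\psi}) ={}& \mathbb{E}_{q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})} \!\left[ \log p(\boldsymbol{y} \mid \boldsymbol{u}, \boldsymbol{\theta}, \boldsymbol{x}) \right] \\
&- \mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2)\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)} \!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_1) \,\|\, p(\boldsymbol{u} \mid \boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] \\
&- \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)} \!\left[ D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_2) \,\|\, p(\boldsymbol{z} \mid \boldsymbol{\theta}, \boldsymbol{x}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}) \,\right]
\end{aligned}
$$

Each KL term is averaged over the variational factors of the variables its prior conditions on.

Hierarchical: $q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}) = q(\boldsymbol{u} \mid \boldsymbol{y}, \boldsymbol{z}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_1) \, q(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_2) \, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_3)$, with ELBO terms recon $+\;\mathrm{KL}_{u\mid z,\theta}+\mathrm{KL}_{z\mid\theta}+\mathrm{KL}_\theta$ (Table 2).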

Summary Table

Table 2: Every (track, regime, target) combination with its variational family and ELBO terms.

| Track | Regime | Target | Variational family | ELBO terms |
|---|---|---|---|---|
| Simulator | Exact | $\boldsymbol{\theta}$ | $p(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Simulator | Exact | $\boldsymbol{u}$ | $p(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta})$ | intractable |
| Simulator | Exact | $\boldsymbol{u},\boldsymbol{\theta}$ | $p(\boldsymbol{u},\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Simulator | VI | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Simulator | VI | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Simulator | VI | $\boldsymbol{u},\boldsymbol{\theta}$ factored | $q(\boldsymbol{u}\mid\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_\theta$ |
| Simulator | VI | $\boldsymbol{u},\boldsymbol{\theta}$ hier. | $q(\boldsymbol{u}\mid\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_{u\mid\theta}+\mathrm{KL}_\theta$ |
| Simulator | Amortized | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Simulator | Amortized | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Simulator | Amortized | $\boldsymbol{u},\boldsymbol{\theta}$ factored | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_\theta$ |
| Simulator | Amortized | $\boldsymbol{u},\boldsymbol{\theta}$ hier. | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_2)$ | recon $+\;\mathrm{KL}_{u\mid\theta}+\mathrm{KL}_\theta$ |
| Emulator | Training | $\boldsymbol{z}$ | $q(\boldsymbol{z}\mid\boldsymbol{u},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_{\mathrm{em}})$ | recon $+\;\mathrm{KL}_z$ |
| Emulator | Exact | $\boldsymbol{\theta}$ | $p(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Emulator | Exact | $\boldsymbol{u}$ | $p(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta})$ | intractable |
| Emulator | Exact | $\boldsymbol{z}$ | $p(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta})$ | intractable |
| Emulator | Exact | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ | $p(\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x})$ | intractable |
| Emulator | VI | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Emulator | VI | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Emulator | VI | $\boldsymbol{z}$ | $q(\boldsymbol{z}\mid\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_z$ |
| Emulator | VI | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ factored | $q(\boldsymbol{u}\mid\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_z+\mathrm{KL}_\theta$ |
| Emulator | VI | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ hier. | $q(\boldsymbol{u}\mid\boldsymbol{z},\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{\theta},\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_{u\mid z,\theta}+\mathrm{KL}_{z\mid\theta}+\mathrm{KL}_\theta$ |
| Emulator | Amortized | $\boldsymbol{\theta}$ | $q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_\theta$ |
| Emulator | Amortized | $\boldsymbol{u}$ | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_u$ |
| Emulator | Amortized | $\boldsymbol{z}$ | $q(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi})$ | recon $+\;\mathrm{KL}_z$ |
| Emulator | Amortized | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ factored | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_u+\mathrm{KL}_z+\mathrm{KL}_\theta$ |
| Emulator | Amortized | $\boldsymbol{u},\boldsymbol{z},\boldsymbol{\theta}$ hier. | $q(\boldsymbol{u}\mid\boldsymbol{y},\boldsymbol{z},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_1)\,q(\boldsymbol{z}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\theta},\boldsymbol{\psi}_2)\,q(\boldsymbol{\theta}\mid\boldsymbol{y},\boldsymbol{x},\boldsymbol{\psi}_3)$ | recon $+\;\mathrm{KL}_{u\mid z,\theta}+\mathrm{KL}_{z\mid\theta}+\mathrm{KL}_\theta$ |