Sequential Probabilistic Inference Schema

CSIC
UCM
IGEO

This is the sequential (state-space) companion to the Full Probabilistic Inference Schema. We now have a sequence of time steps $t = 1, \ldots, T$. The state $\boldsymbol{u}_t$ evolves according to a dynamical model (the transition distribution), and observations $\boldsymbol{y}_t$ are generated from the current state $\boldsymbol{u}_t$ via the observation operator $\mathbf{H}$.

The same two model tracks carry over: a simulator (the state evolves directly) and an emulator (a latent state $\boldsymbol{z}_t$ evolves in a compressed space). The same three inference regimes (exact, variational, amortized) carry over too, now joined by the classical recursive filtering and smoothing algorithms.

Notation

Table 1: Symbols used throughout this note.

| Symbol | Space | Meaning |
|---|---|---|
| $\boldsymbol{y}_t$ | $\mathbb{R}^{D_y}$ | observation at time $t$ (gappy, noisy) |
| $\boldsymbol{u}_t$ | $\mathbb{R}^{D_u}$ | full state at time $t$ |
| $\boldsymbol{z}_t$ | $\mathbb{R}^{D_z}$ | emulator latent state at time $t$ (emulator track) |
| $\boldsymbol{\theta}$ | $\mathbb{R}^{D_\theta}$ | parameters (static, do not evolve in time) |
| $\boldsymbol{x}_t$ | $\mathbb{R}^{D_x}$ | covariates / controls at time $t$ |
| $\boldsymbol{\psi}$ | $\mathbb{R}^{D_\psi}$ | inference (variational) parameters |
| $\boldsymbol{u}_{1:T}$ | | the full trajectory $(\boldsymbol{u}_1, \ldots, \boldsymbol{u}_T)$ |

Sequential Generative Model

The joint distribution over the full sequence factorises as

$$p(\boldsymbol{y}_{1:T}, \boldsymbol{u}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) = \underbrace{p(\boldsymbol{\theta} \mid \boldsymbol{x}_{1:T})}_{\text{param prior}} \, \underbrace{p(\boldsymbol{u}_0 \mid \boldsymbol{\theta}, \boldsymbol{x}_0)}_{\text{initial prior}} \prod_{t=1}^{T} \underbrace{p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t)}_{\text{observation}} \, \underbrace{p(\boldsymbol{u}_t \mid \boldsymbol{u}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t)}_{\text{transition}} \tag{1}$$
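To make the factorisation concrete, here is a minimal sketch that draws one sequence by ancestral sampling: first the initial state, then alternating transition and observation draws. It assumes the linear-Gaussian transition and observation used in the Kalman filter section, with $\boldsymbol{\theta}$ held fixed; the function name `simulate_lgssm` and all numerical values are illustrative assumptions, not part of the schema.

```python
import numpy as np

def simulate_lgssm(A, B, H, Q, R, m0, P0, x, rng):
    """Draw (u_{1:T}, y_{1:T}) from a linear-Gaussian state-space model.

    x has shape (T, D_x); returns u with shape (T, D_u) and y with shape (T, D_y).
    Row t of each array corresponds to time step t + 1.
    """
    T = x.shape[0]
    D_u, D_y = A.shape[0], H.shape[0]
    u = np.empty((T, D_u))
    y = np.empty((T, D_y))
    u_prev = rng.multivariate_normal(m0, P0)  # u_0 ~ initial prior
    for t in range(T):
        # transition: u_t ~ N(A u_{t-1} + B x_t, Q)
        u[t] = rng.multivariate_normal(A @ u_prev + B @ x[t], Q)
        # observation: y_t ~ N(H u_t, R)
        y[t] = rng.multivariate_normal(H @ u[t], R)
        u_prev = u[t]
    return u, y

# Illustrative (assumed) model: 2-D state, 1-D control, first component observed.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x = np.ones((50, 1))
u, y = simulate_lgssm(A, B, H, Q, R, np.zeros(2), np.eye(2), x, rng)
```

A nonlinear simulator would replace the two `multivariate_normal` draws with its own transition and observation samplers; the sampling order stays the same.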

Components

Graphical Structure


Target Posteriors

Filtering posterior (online, causal). The state at $t$ given observations up to and including $t$; it uses no future observations and is updated recursively as each $\boldsymbol{y}_t$ arrives:

$$p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta})$$

Smoothing posterior (offline, non-causal). The state at $t$ given all observations, including future ones; it requires the full sequence $\boldsymbol{y}_{1:T}$:

$$p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\theta})$$

Prediction posterior. The state $k$ steps ahead given observations up to $t$ (none from $t+1$ onward):

$$p(\boldsymbol{u}_{t+k} \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t+k}, \boldsymbol{\theta})$$

Parameter posterior. Static parameters inferred from the full sequence, marginalising the state trajectory:

$$p(\boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}) = \int p(\boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{u}_{1:T}, \boldsymbol{x}_{1:T}) \, p(\boldsymbol{u}_{1:T} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}) \, \mathrm{d}\boldsymbol{u}_{1:T}$$
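When the model is linear-Gaussian, the state trajectory can be marginalised exactly, and the parameter posterior follows from Bayes' rule as $p(\boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}) \propto p(\boldsymbol{y}_{1:T} \mid \boldsymbol{\theta}) \, p(\boldsymbol{\theta})$. The sketch below does this for a scalar toy model with a single unknown transition coefficient `a`, evaluating the marginal likelihood with a Kalman filter on a grid. The model, the flat prior on the grid, and the names `kalman_loglik` and `grid_posterior` are assumptions made for illustration.

```python
import numpy as np

def kalman_loglik(y, a, q, r, m0=0.0, p0=1.0):
    """log p(y_{1:T} | a) for u_t = a u_{t-1} + N(0, q), y_t = u_t + N(0, r)."""
    m, p, ll = m0, p0, 0.0
    for y_t in y:
        m, p = a * m, a * a * p + q            # predict
        s = p + r                              # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * s) + (y_t - m) ** 2 / s)
        k = p / s                              # Kalman gain
        m, p = m + k * (y_t - m), (1 - k) * p  # update
    return ll

def grid_posterior(y, a_grid, q, r):
    """Normalised posterior weights over a_grid under a flat prior on the grid."""
    log_post = np.array([kalman_loglik(y, a, q, r) for a in a_grid])
    log_post -= log_post.max()                 # stabilise before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

# Synthetic data from the same toy model (all values are assumptions).
rng = np.random.default_rng(1)
a_true, q, r, T = 0.8, 0.1, 0.1, 500
u, y = 0.0, np.empty(T)
for t in range(T):
    u = a_true * u + rng.normal(scale=np.sqrt(q))
    y[t] = u + rng.normal(scale=np.sqrt(r))

a_grid = np.linspace(0.0, 0.99, 100)
post = grid_posterior(y, a_grid, q, r)
a_map = a_grid[np.argmax(post)]                # grid point with highest posterior mass
```

A grid is only practical for one or two parameters; in higher dimensions the same marginal likelihood would typically be fed to an MCMC or optimisation routine instead.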

Joint smoothing posterior. The full joint over all states and parameters:

$$p(\boldsymbol{u}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T})$$

Track 1 — Simulator

The simulator generative model is (1) above: the state $\boldsymbol{u}_t$ evolves directly, with no internal latent variable.

1A · Exact Posteriors

1B · Filtering Algorithms

Recursive algorithms that process observations one at a time and maintain a running approximation to the filtering posterior.

Kalman (exact)
Extended KF
Ensemble KF
Particle

Linear-Gaussian case (exact). Requires

$$\begin{aligned} p(\boldsymbol{u}_t \mid \boldsymbol{u}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t) &= \mathcal{N}\!\left(\boldsymbol{u}_t \mid \mathbf{A}_{\boldsymbol{\theta}} \boldsymbol{u}_{t-1} + \mathbf{B}_{\boldsymbol{\theta}} \boldsymbol{x}_t, \; \mathbf{Q}_{\boldsymbol{\theta}}\right) \\ p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) &= \mathcal{N}\!\left(\boldsymbol{y}_t \mid \mathbf{H} \boldsymbol{u}_t, \; \mathbf{R}_{\boldsymbol{\theta}}\right) \end{aligned}$$

Predict: push the previous filtering distribution through the dynamics.

$$p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t-1}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) = \int p(\boldsymbol{u}_t \mid \boldsymbol{u}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{u}_{t-1} \mid \boldsymbol{y}_{1:t-1}, \boldsymbol{x}_{1:t-1}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{u}_{t-1}$$

Update: correct the prediction with the new observation.

$$p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) \propto p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t-1}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta})$$

Both steps are exact and closed-form for linear-Gaussian models.
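In the linear-Gaussian case both distributions stay Gaussian, so the recursion reduces to updating a mean `m` and covariance `P`. The following is a minimal sketch of the standard Kalman predict and update equations in that notation; the function names are illustrative, and $\boldsymbol{\theta}$ is assumed fixed so that `A`, `B`, `Q`, `H`, `R` are plain arrays.

```python
import numpy as np

def kalman_predict(m, P, A, B, Q, x_t):
    """p(u_t | y_{1:t-1}): push N(m, P) through u_t = A u_{t-1} + B x_t + N(0, Q)."""
    m_pred = A @ m + B @ x_t
    P_pred = A @ P @ A.T + Q
    return m_pred, P_pred

def kalman_update(m_pred, P_pred, H, R, y_t):
    """p(u_t | y_{1:t}): condition N(m_pred, P_pred) on y_t = H u_t + N(0, R)."""
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T       # Kalman gain, P_pred H^T S^{-1}
    m = m_pred + K @ (y_t - H @ m_pred)
    P = P_pred - K @ S @ K.T
    return m, P

# Scalar example: unit prior variance, static state, one observation y = 2.
one, zero = np.array([[1.0]]), np.array([[0.0]])
m, P = kalman_predict(np.array([0.0]), one, one, zero, zero, np.array([0.0]))
m, P = kalman_update(m, P, one, one, np.array([2.0]))
```

Because the observations are gappy, one common option is to skip `kalman_update` at steps where $\boldsymbol{y}_t$ is missing and carry the prediction forward unchanged.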

1C · Smoothing Algorithms

Offline algorithms that use the full sequence.
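For the linear-Gaussian case, the Rauch-Tung-Striebel (RTS) smoother listed in the summary table is the standard choice: it runs one backward pass over the stored filtering means and covariances. The sketch below assumes those have already been computed by a Kalman filter; the name `rts_smoother` and the argument `b` (the control term $\mathbf{B}\boldsymbol{x}_t$ at each step) are illustrative assumptions.

```python
import numpy as np

def rts_smoother(m_f, P_f, A, Q, b=None):
    """Backward RTS pass.

    m_f: (T, D) filtering means, P_f: (T, D, D) filtering covariances.
    b:   (T, D) control term B x_t entering the transition into step t (zeros if None).
    Returns smoothed means and covariances with the same shapes.
    """
    T, D = m_f.shape
    if b is None:
        b = np.zeros((T, D))
    m_s, P_s = m_f.copy(), P_f.copy()          # last step: smoothed == filtered
    for t in range(T - 2, -1, -1):
        m_pred = A @ m_f[t] + b[t + 1]         # one-step prediction from step t
        P_pred = A @ P_f[t] @ A.T + Q
        G = np.linalg.solve(P_pred, A @ P_f[t]).T   # smoother gain P_f A^T P_pred^{-1}
        m_s[t] = m_f[t] + G @ (m_s[t + 1] - m_pred)
        P_s[t] = P_f[t] + G @ (P_s[t + 1] - P_pred) @ G.T
    return m_s, P_s

# Two-step scalar example with a static state (A = 1, Q = 0).
m_f = np.array([[0.0], [1.0]])
P_f = np.array([[[1.0]], [[0.5]]])
m_s, P_s = rts_smoother(m_f, P_f, np.array([[1.0]]), np.array([[0.0]]))
```

With a static state and no process noise, the smoothed estimate at every step equals the final filtered estimate, which is what the example produces.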

1D · Variational Inference

For nonlinear / non-Gaussian models where filtering and smoothing are too expensive or unavailable, introduce a variational distribution over the full state sequence (and parameters). Here $\boldsymbol{\psi}$ is optimised once per observed sequence, so the fitted $\boldsymbol{\psi}$ does not generalise to new sequences.

Filtering variational posterior. Maintained recursively, with $\boldsymbol{\psi}_t$ updated at each step as new $\boldsymbol{y}_t$ arrives:

$$q(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\psi}_t) \approx p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta})$$

Smoothing variational posterior. There are two ways to structure the family: factored and structured (Markov). The factored family and its ELBO are

$$q(\boldsymbol{u}_{1:T} \mid \boldsymbol{\psi}) = \prod_{t=1}^{T} q(\boldsymbol{u}_t \mid \boldsymbol{\psi}_t) \approx p(\boldsymbol{u}_{1:T} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\theta})$$

$$\mathcal{L}(\boldsymbol{\psi}) = \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{u}_t \mid \boldsymbol{\psi}_t)}\!\left[ \log p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - \sum_{t=1}^{T} D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u}_t \mid \boldsymbol{\psi}_t) \,\|\, p(\boldsymbol{u}_t \mid \boldsymbol{u}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t) \,\right]$$

The factored family breaks the temporal dependencies of the true smoothing posterior — a strong approximation.
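For a scalar linear-Gaussian model with Gaussian factors $q(u_t) = \mathcal{N}(m_t, s_t^2)$, every expectation in the factored bound has a closed form, which makes it a convenient test case. The sketch below writes the bound as expected log-joint plus entropy, a rearrangement of the reconstruction-minus-KL form, with the transition term averaged over both $u_t$ and $u_{t-1}$. It assumes $u_0$ has been marginalised so that $u_1 \sim \mathcal{N}(\mu_1, v_1)$; the function name and all values are illustrative.

```python
import numpy as np

def factored_elbo(y, m, s2, a, q, r, mu1, v1):
    """ELBO for u_t = a u_{t-1} + N(0, q), y_t = u_t + N(0, r), u_1 ~ N(mu1, v1),
    under the factored family q(u_t) = N(m[t], s2[t])."""
    y, m, s2 = (np.asarray(v, dtype=float) for v in (y, m, s2))
    # E_q[log p(y_t | u_t)]
    loglik = -0.5 * np.log(2 * np.pi * r) - ((y - m) ** 2 + s2) / (2 * r)
    # E_q[log p(u_1)]
    logp1 = -0.5 * np.log(2 * np.pi * v1) - ((m[0] - mu1) ** 2 + s2[0]) / (2 * v1)
    # E_q[log p(u_t | u_{t-1})] for t >= 2, averaged over q(u_t) q(u_{t-1})
    resid = (m[1:] - a * m[:-1]) ** 2 + s2[1:] + a ** 2 * s2[:-1]
    logtrans = -0.5 * np.log(2 * np.pi * q) - resid / (2 * q)
    # Entropy of each Gaussian factor
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return loglik.sum() + logp1 + logtrans.sum() + entropy.sum()

# Three-step example with hand-picked variational parameters.
elbo = factored_elbo(y=[1.0, 0.8, 0.9], m=[0.9, 0.8, 0.8], s2=[0.05, 0.05, 0.05],
                     a=0.9, q=0.1, r=0.1, mu1=0.0, v1=1.0)
```

The `a ** 2 * s2[:-1]` term shows the cost of the factorisation: uncertainty about $u_{t-1}$ enters each transition term as a penalty, because the factored family cannot represent the correlation between neighbouring states.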

Joint smoothing + parameter inference (hierarchical):

$$q(\boldsymbol{u}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) = q(\boldsymbol{u}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_u) \, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_\theta)$$

$$\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_\theta)}\!\left[ \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{u}_t \mid \boldsymbol{u}_{t-1}, \boldsymbol{\theta}, \boldsymbol{\psi}_u)}\!\left[ \log p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_u) \,\|\, p(\boldsymbol{u}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{x}_{1:T}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_\theta) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) \,\right]$$

1E · Amortized Inference

Train a network once on many sequences; at test time a single forward pass over a new sequence gives the posterior — no per-sequence optimisation.

Amortized filtering. A recurrent network (RNN, LSTM, S4, Mamba) processes $\boldsymbol{y}_{1:t}$ sequentially and emits a distribution over $\boldsymbol{u}_t$; it is causal and generalises across sequences:

$$q(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\psi})$$

Amortized smoothing. An encoder that reads the full sequence (transformer, bidirectional RNN) emits a distribution over $\boldsymbol{u}_t$ at each step; it is non-causal and uses past and future observations:

$$q(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\psi})$$

Amortized joint (hierarchical):

$$q(\boldsymbol{u}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\psi}) = q(\boldsymbol{u}_{1:T} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\theta}, \boldsymbol{\psi}_u) \, q(\boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\psi}_\theta)$$

$$\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_\theta)}\!\left[ \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{u}_t \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_u)}\!\left[ \log p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u}_{1:T} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_u) \,\|\, p(\boldsymbol{u}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{x}_{1:T}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_\theta) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) \,\right]$$

Track 2 — Emulator

Generative Model

The emulator introduces an internal latent $\boldsymbol{z}_t$ at each time step. The transition now operates in latent space and decodes to $\boldsymbol{u}_t$.

$$p(\boldsymbol{y}_{1:T}, \boldsymbol{u}_{1:T}, \boldsymbol{z}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) = p(\boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) \, p(\boldsymbol{z}_0 \mid \boldsymbol{\theta}, \boldsymbol{x}_0) \, p(\boldsymbol{u}_0 \mid \boldsymbol{z}_0, \boldsymbol{\theta}, \boldsymbol{x}_0) \prod_{t=1}^{T} p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{z}_t \mid \boldsymbol{z}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t)$$

The latent dynamics $p(\boldsymbol{z}_t \mid \boldsymbol{z}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t)$ operate in the compressed space $\mathbb{R}^{D_z}$ ($D_z \ll D_u$); the decoder $p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t)$ maps back to the full field at each step.

2.0 · Emulator Training

Train the emulator on simulator output sequences $\{\boldsymbol{u}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\theta}\}$; it learns latent dynamics in $\boldsymbol{z}$-space. This introduces an encoder $q(\boldsymbol{z}_t \mid \boldsymbol{u}_t, \boldsymbol{x}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}})$, a decoder $p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t)$, and a transition $p(\boldsymbol{z}_t \mid \boldsymbol{z}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t)$.

$$\mathcal{L}_{\mathrm{em}}(\boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}}) = \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{z}_t \mid \boldsymbol{u}_t, \boldsymbol{x}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}})}\!\left[ \log p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - \sum_{t=1}^{T} D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z}_t \mid \boldsymbol{u}_t, \boldsymbol{x}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_{\mathrm{em}}) \,\|\, p(\boldsymbol{z}_t \mid \boldsymbol{z}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t) \,\right]$$

2.1 · Emulator Uncertainty Characterization

Characterise the per-step simulator–emulator discrepancy,

$$p(\boldsymbol{u}_{\mathrm{true},t} \mid \boldsymbol{u}_{\mathrm{em},t}, \boldsymbol{x}_t, \boldsymbol{\theta}).$$

2A · Exact Posteriors

2B · Filtering Algorithms (latent space)

Run the recursion in $\boldsymbol{z}$-space, then decode to $\boldsymbol{u}$-space.

Kalman in z
Ensemble KF in z
Particle in z

If the latent transition is linear-Gaussian, run the Kalman filter in $\boldsymbol{z}$-space and decode via $p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t)$.

$$\begin{aligned} \text{Predict:} &\quad p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t-1}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) = \int p(\boldsymbol{z}_t \mid \boldsymbol{z}_{t-1}, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{z}_{t-1} \mid \boldsymbol{y}_{1:t-1}, \boldsymbol{x}_{1:t-1}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{z}_{t-1} \\ \text{Update:} &\quad p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) \propto p(\boldsymbol{y}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t-1}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) \\ \text{Decode:} &\quad p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) = \int p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \, p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{z}_t \end{aligned}$$
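The update step needs $p(\boldsymbol{y}_t \mid \boldsymbol{z}_t)$, which is the observation model with $\boldsymbol{u}_t$ integrated out through the decoder. As a minimal sketch, assume a linear-Gaussian decoder $\boldsymbol{u}_t = \mathbf{D}\boldsymbol{z}_t + \boldsymbol{c} + \mathcal{N}(\mathbf{0}, \mathbf{S})$; then both that marginalisation and the decode integral are Gaussian and closed-form. The symbols `D`, `c`, `S` and the function names are assumptions for illustration; a trained neural decoder would need linearisation or sampling instead.

```python
import numpy as np

def latent_obs_model(H, R, D, c, S):
    """Marginalise u_t: p(y_t | z_t) = N(H D z_t + H c, H S H^T + R).

    Returns the effective observation matrix, offset, and noise covariance in z-space.
    """
    return H @ D, H @ c, H @ S @ H.T + R

def decode_gaussian(m_z, P_z, D, c, S):
    """Push the latent filter N(m_z, P_z) through u = D z + c + N(0, S)."""
    return D @ m_z + c, D @ P_z @ D.T + S

# Example: 1-D latent decoded to a 2-D state, first component observed.
D = np.array([[1.0], [2.0]])
c = np.zeros(2)
S = 0.1 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.2]])
H_z, offset, R_z = latent_obs_model(H, R, D, c, S)
m_u, P_u = decode_gaussian(np.array([1.0]), np.array([[0.5]]), D, c, S)
```

The effective noise `R_z` is larger than `R` because decoder uncertainty is folded into the observation model, so a less accurate emulator automatically puts less weight on each observation in the latent update.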

2C · Smoothing Algorithms (latent space)

2D · Variational Inference

Filtering variational posterior. Updated recursively as new $\boldsymbol{y}_t$ arrives:

$$q(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\psi}_t) \approx p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\theta})$$

Smoothing variational posterior (structured, Markov):

$$q(\boldsymbol{z}_{1:T} \mid \boldsymbol{\psi}) = q(\boldsymbol{z}_0 \mid \boldsymbol{\psi}_0) \prod_{t=1}^{T} q(\boldsymbol{z}_t \mid \boldsymbol{z}_{t-1}, \boldsymbol{\psi}_t)$$

$$\mathcal{L}(\boldsymbol{\psi}) = \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{z}_t \mid \boldsymbol{\psi})}\!\left[ \log p(\boldsymbol{y}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z}_{1:T} \mid \boldsymbol{\psi}) \,\|\, p(\boldsymbol{z}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{x}_{1:T}) \,\right]$$

Joint smoothing over $\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta}$ (hierarchical), with the per-step state factor taken as a product over $t$:

$$q(\boldsymbol{u}_{1:T}, \boldsymbol{z}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{\psi}) = \underbrace{\prod_{t=1}^{T} q(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_u)}_{\text{per-step state}} \, \underbrace{q(\boldsymbol{z}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_z)}_{\text{latent trajectory}} \, \underbrace{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_\theta)}_{\text{parameters}}$$

$$\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_\theta)}\!\left[ \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{z}_t \mid \boldsymbol{\theta}, \boldsymbol{\psi}_z)}\!\left[ \mathbb{E}_{q(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_u)}\!\left[ \log p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_u) \,\|\, p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{\psi}_z) \,\|\, p(\boldsymbol{z}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{x}_{1:T}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{\psi}_\theta) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) \,\right]$$

2E · Amortized Inference

Amortized filtering in latent space. A recurrent network processes $\boldsymbol{y}_{1:t}$ and emits a distribution over $\boldsymbol{z}_t$; causal:

$$q(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{x}_{1:t}, \boldsymbol{\psi})$$

Amortized smoothing in latent space. A bidirectional encoder reads the full sequence; non-causal:

$$q(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\psi})$$

Amortized joint (hierarchical), with the per-step state factor taken as a product over $t$:

$$q(\boldsymbol{u}_{1:T}, \boldsymbol{z}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\psi}) = \prod_{t=1}^{T} q(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T}, \boldsymbol{z}_t, \boldsymbol{x}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_u) \, q(\boldsymbol{z}_{1:T} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\theta}, \boldsymbol{\psi}_z) \, q(\boldsymbol{\theta} \mid \boldsymbol{y}_{1:T}, \boldsymbol{x}_{1:T}, \boldsymbol{\psi}_\theta)$$

$$\mathcal{L}(\boldsymbol{\psi}) = \mathbb{E}_{q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_\theta)}\!\left[ \sum_{t=1}^{T} \mathbb{E}_{q(\boldsymbol{z}_t \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_z)}\!\left[ \mathbb{E}_{q(\boldsymbol{u}_t \mid \boldsymbol{y}, \boldsymbol{z}_t, \boldsymbol{x}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_u)}\!\left[ \log p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{u}_t \mid \boldsymbol{y}, \boldsymbol{z}_t, \boldsymbol{x}_t, \boldsymbol{\theta}, \boldsymbol{\psi}_u) \,\|\, p(\boldsymbol{u}_t \mid \boldsymbol{z}_t, \boldsymbol{\theta}, \boldsymbol{x}_t) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{z}_{1:T} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\psi}_z) \,\|\, p(\boldsymbol{z}_{1:T} \mid \boldsymbol{\theta}, \boldsymbol{x}_{1:T}) \,\right] \right] - D_{\mathrm{KL}}\!\left[\, q(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi}_\theta) \,\|\, p(\boldsymbol{\theta} \mid \boldsymbol{x}_{1:T}) \,\right]$$

Summary Table

Table 2: Targets and methods across both tracks. Conditioning on $\boldsymbol{y}, \boldsymbol{x}$ is abbreviated.

| Track | Step | Target | Method |
|---|---|---|---|
| Simulator | Exact | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t})$ | intractable (nonlinear) |
| Simulator | Exact | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T})$ | intractable (nonlinear) |
| Simulator | Filtering | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t})$ | Kalman / EnKF / Particle |
| Simulator | Smoothing | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T})$ | RTS / Particle smoother |
| Simulator | VI | $q(\boldsymbol{u}_{1:T} \mid \boldsymbol{\psi})$ | factored or structured ELBO |
| Simulator | VI + params | $q(\boldsymbol{u}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{\psi})$ | hierarchical ELBO |
| Simulator | Amortized filter | $q(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{\psi})$ | recurrent network |
| Simulator | Amortized smooth | $q(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T}, \boldsymbol{\psi})$ | bidirectional encoder |
| Simulator | Amortized joint | $q(\boldsymbol{u}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{\psi})$ | hierarchical amortized ELBO |
| Emulator | Training | $q(\boldsymbol{z}_t \mid \boldsymbol{u}_t, \boldsymbol{\psi}_{\mathrm{em}})$ | sequential VAE ELBO |
| Emulator | Exact | $p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t})$ | intractable |
| Emulator | Exact | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T})$ | intractable |
| Emulator | Filtering $\boldsymbol{z}$ | $p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t})$ | Kalman / EnKF / Particle in $\boldsymbol{z}$ |
| Emulator | Filtering $\boldsymbol{u}$ | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:t})$ | decode from $\boldsymbol{z}$ filter |
| Emulator | Smoothing $\boldsymbol{z}$ | $p(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:T})$ | RTS / Particle smoother in $\boldsymbol{z}$ |
| Emulator | Smoothing $\boldsymbol{u}$ | $p(\boldsymbol{u}_t \mid \boldsymbol{y}_{1:T})$ | decode from $\boldsymbol{z}$ smoother |
| Emulator | VI | $q(\boldsymbol{z}_{1:T} \mid \boldsymbol{\psi})$ | structured ELBO in $\boldsymbol{z}$ |
| Emulator | VI + state | $q(\boldsymbol{u}_{1:T}, \boldsymbol{z}_{1:T} \mid \boldsymbol{\psi})$ | hierarchical ELBO |
| Emulator | VI + params | $q(\boldsymbol{u}_{1:T}, \boldsymbol{z}_{1:T}, \boldsymbol{\theta} \mid \boldsymbol{\psi})$ | full hierarchical ELBO |
| Emulator | Amortized filter | $q(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:t}, \boldsymbol{\psi})$ | recurrent network in $\boldsymbol{z}$ |
| Emulator | Amortized smooth | $q(\boldsymbol{z}_t \mid \boldsymbol{y}_{1:T}, \boldsymbol{\psi})$ | bidirectional encoder in $\boldsymbol{z}$ |
| Emulator | Amortized joint | $q(\boldsymbol{u}, \boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{x}, \boldsymbol{\psi})$ | hierarchical amortized ELBO |