18 · Latent Variational Data Assimilation¶
This chapter formalises variational data assimilation in a learned low-dimensional latent space. It complements chapters 5–8 (3DVar, strong-/weak-4DVar, incremental 4DVar) and chapter 9 (4DVarNet) by specifying three latent-space peers: latent 3DVar, latent strong-4DVar, and latent hybrid 4DVar.
Design anchor: design/latent_da.md, Decision D17.
18.1 Setting and notation¶
Augment the notation of chapter 1 with an autoencoder \((\varphi, \psi)\):
| Symbol | Space | Meaning |
|---|---|---|
| \(x\) | \(\mathbb{R}^{N_x}\) | physical state |
| \(z\) | \(\mathbb{R}^{N_z}\), \(N_z \ll N_x\) | latent code |
| \(\varphi: x \mapsto z\) | encoder | satisfies pipekit_cycle.Encoder |
| \(\psi: z \mapsto \hat{x}\) | decoder | satisfies pipekit_cycle.Decoder |
| \(M_x: x \mapsto x\) | physical forward model | as in chapter 3 |
| \(M_z: z \mapsto z\) | latent forward model | learned or \(\varphi \circ M_x \circ \psi\) |
| \(H: x \mapsto y\) | observation operator | as in chapter 2 |
| \(\tilde{H} := H \circ \psi\) | lifted obs operator | \(z \mapsto y\) |
| \(\mathbf{B}_z\) | \(\mathbb{R}^{N_z \times N_z}\) | background-error covariance in \(z\) |
| \(\mathbf{R}\) | \(\mathbb{R}^{N_y \times N_y}\) | observation-error covariance |
The pair \((\varphi, \psi)\) is consistent on the data manifold if \(\psi(\varphi(x)) \approx x\) for \(x\) drawn from the climatology. The formalism does not require exact identity — it requires only that \(\psi\) be differentiable, which is automatic for neural decoders.
18.2 Latent 3DVar¶
18.2.1 Cost¶
The latent counterpart of chapter 5's 3DVar replaces the control vector \(x\) with \(z\) and the observation operator \(H\) with \(\tilde{H}\):
18.2.2 Gradient and Hessian¶
Let \(\tilde{H}'(z) := H'(\psi(z))\,\psi'(z)\) be the lifted tangent linear (chain rule). Then
and the Gauss–Newton Hessian is
When \(\tilde{H}'\) is treated as constant near the linearisation point (the incremental form, cf. chapter 8), the minimisation reduces to a linear solve and the Sherman–Morrison– Woodbury identity gives the closed form
The \((N_y \times N_y)\) inverse matches the OI form (chapter 4); the difference is that the gain is now multiplied by \(\mathbf{B}_z\) (size \(N_z \times N_z\)) rather than \(\mathbf{B}\) (size \(N_x \times N_x\)). This is the structural source of the order-of-magnitude wall-clock win when \(N_z \ll N_x\).
18.2.3 Analysis output¶
The variational solution is in \(z\); the analysis in physical space is \(x^* = \psi(z^*)\). The posterior covariance, when needed, comes from the Laplace approximation at \(z^*\):
The pushforward \(\mathrm{cov}_x\) has rank at most \(N_z\) — a low-rank
object naturally represented by lineax.LowRankUpdate.
18.3 Latent strong-constraint 4DVar¶
18.3.1 Cost¶
Control vector \(z_0\); rollout \(z_{k+1} = M_z(z_k)\) in the latent space; observation residuals via the lifted operator:
18.3.2 Gradient¶
Backpropagation through the latent rollout yields, for each observation window \(k\),
For learned \(M_z\) implemented as an eqx.Module, the tangent-linear
operators come from JAX autodiff; no hand-coded adjoint is required.
18.3.3 Memory bound¶
The full unrolled adjoint stores \(O(K \cdot N_z)\) values rather than \(O(K \cdot N_x)\) — another order-of-magnitude saving for large \(N_x\). With \(K = 20\), \(N_z = 8\) versus \(N_x = 10^4\), the backward tape shrinks from megabytes to kilobytes.
18.4 Latent hybrid 4DVar¶
18.4.1 Motivation¶
The strong-latent form (§18.3) commits the forecast to the latent manifold, which can accumulate reconstruction error over long windows. The hybrid form keeps the forecast in physical space — where the trusted physics model lives — and uses \(z\) only as the control variable and as the natural background space:
- Control: \(z_0 \in \mathcal{Z}\).
- Initial physical state: \(x_0 = \psi(z_0)\).
- Rollout: \(x_{k+1} = M_x(x_k)\) in \(\mathcal{X}\) (physics, not AE).
- Observation residual: \(y_k - H(x_k)\), no \(\psi\) inside the sum.
18.4.2 Cost¶
18.4.3 Gradient¶
The decoder Jacobian \(\psi'(z_0)\) appears only once, at the start of the window. The expensive piece is the physics adjoint \(\prod M_x'\), identical to chapter 6.
18.4.4 When to prefer hybrid over strong-latent¶
| Scenario | Recommended |
|---|---|
| AE reconstructs the manifold well over the assimilation window | strong-latent (§18.3) — cheapest |
| Physics model is trusted; AE only approximate | hybrid (§18.4) |
No learned \(M_z\) available, and EncodedForwardModel is too lossy |
hybrid |
| \(N_x\) huge (\(10^6\)+) and physics adjoint is the bottleneck | strong-latent (forces \(M_z\) training) |
18.5 Latent incremental 4DVar (sketch)¶
The incremental algorithm of chapter 8 extends mutatis mutandis: outer Gauss–Newton loop produces a sequence of inner problems
with \(d_k = y_k - \tilde{H}(z_k^{(n)})\) the innovation at outer iteration \(n\) and primed quantities evaluated at the current outer guess. The inner problem is linear-quadratic in \(\delta z\) and CG-able in \(\mathcal{Z}\) — typically two orders of magnitude fewer CG iterations than in \(\mathcal{X}\) because the conditioning improves with dimensionality reduction.
A control-variable transform (CVT) in \(z\) is rarely needed: if the AE was trained with a unit-Gaussian prior on \(z\), then \(\mathbf{B}_z = I\) to leading order and the cost is already well-conditioned.
18.6 Identity-AE consistency¶
If \((\varphi, \psi) = (\mathrm{id}, \mathrm{id})\) then \(z = x\),
\(\tilde{H} = H\), \(M_z = M_x\), \(\mathbf{B}_z = \mathbf{B}\), and the
three latent costs reduce exactly to the corresponding x-space costs.
This is the canonical regression test for any implementation
(vardax.identity_latent_map(N)).
18.7 Connection to chapter 9 (4DVarNet)¶
FourDVarNet minimises
This is prior-only latent: the AE appears as a reconstruction
penalty but the control remains \(x\). The latent methods of this
chapter promote \(z\) itself to the control vector. The two are not
incompatible — FourDVarNet retains its place when the autoencoder is
a regulariser rather than a parameterisation of the state space.
18.8 Worked example — Lorenz-96, \(N_x=40\), \(N_z=8\)¶
The notebook docs/notebooks/18_latent_lorenz.ipynb (to be added with
the implementation) trains a bilinear AE on L96 trajectories and runs
all three latent methods against the x-space baselines on the same
fixture. Expected pattern:
LatentThreeDVarmatchesThreeDVaranalysis RMSE to within \(0.05\) standard deviations; cost-per-cycle drops by \(\sim 25\times\).LatentStrongFourDVarwithEncodedForwardModel(no learned \(M_z\)) matchesStrongFourDVar; cost-per-cycle drops by \(\sim 10\times\) (the AE round-trip eats some of the win).LatentStrongFourDVarwith a trained residual-MLP \(M_z\) wins \(\sim 50\times\) but introduces a small RMSE bias controlled by AE reconstruction quality.LatentHybridFourDVarmatchesStrongFourDVarRMSE almost exactly; cost-per-cycle drops by \(\sim 4\times\) (dominated by physics rollout which is unchanged).
18.9 References¶
- Peyron, M., Fillion, A., Gürol, S., Marchais, V., Gratton, S., Boudier, P. & Goret, G. (2021). Latent space data assimilation by using deep learning. QJRMS, 147(740), 3759–3777.
- Cheng, S., Quilodrán-Casas, C., Ouala, S. et al. (2023). Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogates. J. Sci. Comput., 94, 11.
- Fablet, R., Chapron, B., Drumetz, L., Mémin, E., Pannekoucke, O. & Rousseau, F. (2021). Learning variational data assimilation models and solvers. JAMES, 13(10).
- Brajard, J., Carrassi, A., Bocquet, M. & Bertino, L. (2021). Combining data assimilation and machine learning to infer unresolved scale parametrization. Phil. Trans. R. Soc. A, 379(2194).
- Lguensat, R., Tandeo, P., Ailliot, P., Pulido, M. & Fablet, R. (2017). The analog data assimilation. MWR, 145(10).