
# Variational SSH reconstruction — 3D-Var, 4D-Var, DYMOST, BFN-QG, 4DVarNet

## Variational reconstruction with dynamical priors

The pipeline in 0003 is a pathwise Gaussian process with a static, isotropic prior covariance encoded by a kernel. That works because SSH variability has a known length scale and the inverse problem is mostly one of interpolation — fill in the gaps between satellite tracks under a smoothness assumption. It stops working when the dominant signal is advection by mesoscale eddies: the field at $t+\tau$ is not a smoothed version of the field at $t-\tau$, it is a transported version of it. A static kernel cannot represent that.

This is where data assimilation (DA) takes over. The DA literature has spent fifty years on exactly this problem — combining sparse observations with a dynamical model to estimate an evolving state — and the SSH community has built a tower of methods on it: classical 3D-Var (≈ what we just did), 4D-Var (the time-evolving version), DYMOST (4D-Var with a quasi-geostrophic propagator), BFN-QG (a forward/backward nudging shortcut around the adjoint), and 4DVarNet (a learned variational scheme). All of these replace the GP prior with something that knows about ocean physics, so they live outside the gaussx pathwise stack — but they are the right comparison targets, and one of them is likely the operational successor.

This note frames the math, situates each method in the same picture, and gives concrete pointers to the local code that implements them ([jej_vc_snippets/quasigeostrophic_model/massh/](file:///home/azureuser/localfiles/jej_vc_snippets/quasigeostrophic_model/massh/), [jej_vc_snippets/4dvar/](file:///home/azureuser/localfiles/jej_vc_snippets/4dvar/)) and to the existing 3D-Var derivation in the plume-simulation project[1]. The aim is navigational — you should be able to read this side by side with the SSH-mapping data-challenge papers and see exactly which equation each algorithm is solving.


## 1. Variational data assimilation in one paragraph

The analysis state $\hat x \in \mathbb{R}^N$ is the MAP estimate under a Gaussian prior $x \sim \mathcal{N}(x_b, B)$ and a Gaussian likelihood $y \mid x \sim \mathcal{N}(\mathcal{H}(x), R)$:

$$
\hat x \;=\; \arg\min_{x}\; \underbrace{\tfrac{1}{2}\,(x - x_b)^\top B^{-1} (x - x_b)}_{\mathcal{J}_{b}(x)\ \text{— background term}} \;+\; \underbrace{\tfrac{1}{2}\,\bigl(y - \mathcal{H}(x)\bigr)^\top R^{-1} \bigl(y - \mathcal{H}(x)\bigr)}_{\mathcal{J}_{o}(x)\ \text{— observation term}}.
$$ (eq-3dvar)

The only dependence on the analysis problem at hand is in the choice of $(x_b,\, B,\, \mathcal{H},\, R)$. Everything below is a pattern of choices.

| Quantity | Static-GP / 3D-Var | 4D-Var | DYMOST | BFN-QG | 4DVarNet |
|---|---|---|---|---|---|
| State $x$ | SSH field at $t$ | SSH field at $t_0$, propagated by a model | SSH field along Lagrangian particles | trajectory across window | SSH field across window |
| Prior mean $x_b$ | 0 (anomaly) or DUACS | 0 or previous analysis | model trajectory | model trajectory | 0 or previous analysis |
| Prior cov $B$ | Matérn × OU kernel | $B$ at $t_0$, propagated by model TLM | static, in propagated frame | implicit (nudging coefficient) | learned $\Phi_\theta$ |
| Forward $\mathcal{H}$ | linear (sample at obs locations) | $\mathcal{H} \circ M^t$ — model-then-sample | $\mathcal{H} \circ M_{\text{QG}}$ | $\mathcal{H} \circ M_{\text{QG}}$ | $\mathcal{H}$ (sample) |
| Solver | dense / CG / Woodbury | adjoint + LBFGS | QG propagation + one static OI solve (no adjoint) | forward/backward nudging — no adjoint | learned ConvLSTM grad. modulator |
| Code | gaussx 00/01/03 | massh inversion.py | massh model.py + inversion.py | massh model.py (BFN loop) | fourdvarnet/ |

The rest of the note works through each row.
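To fix the pattern in code, here is a minimal generic sketch of (eq-3dvar) in JAX. The names are illustrative, not from the local repos; the point is that `jax.grad` supplies the exact gradient, so the same template serves every method below:

```python
import jax
import jax.numpy as jnp

def make_cost(x_b, B_inv, H, R_inv, y):
    """Generic variational cost J(x) = J_b(x) + J_o(x) from (eq-3dvar)."""
    def J(x):
        dx = x - x_b
        innov = y - H(x)                 # H may be nonlinear
        return 0.5 * dx @ B_inv @ dx + 0.5 * innov @ R_inv @ innov
    return J

# toy: 5-dimensional state, observe components 0 and 3
x_b, B_inv = jnp.zeros(5), jnp.eye(5)
H = lambda x: x[jnp.array([0, 3])]
R_inv = jnp.eye(2) / 0.1**2
y = jnp.array([1.0, -0.5])

J = make_cost(x_b, B_inv, H, R_inv, y)
grad_J = jax.grad(J)                     # exact gradient, no hand-coded adjoint
```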


## 2. 3D-Var ≡ static-GP optimal interpolation (sanity check)

The pathwise-GP pipeline from 00–03 is 3D-Var. Setting $x_b = 0$, $B = K_{\mathcal{X}^*\mathcal{X}^*}$ (the kernel evaluated on the target grid), $\mathcal{H}$ the linear sampling operator at the observation locations, and $R = \sigma_{\text{obs}}^2 I$, the 3D-Var cost (eq-3dvar) becomes

$$
\mathcal{J}(\eta) \;=\; \tfrac{1}{2}\,\eta^\top K_{\mathcal{X}^*\mathcal{X}^*}^{-1}\,\eta \;+\; \tfrac{1}{2\sigma_{\text{obs}}^2}\,\bigl(y - \mathcal{H}\eta\bigr)^\top \bigl(y - \mathcal{H}\eta\bigr).
$$

Setting $\nabla \mathcal{J} = 0$ yields, after the matrix-inversion lemma and a little reorganisation, the GP posterior mean from 00:

$$
\hat\eta \;=\; K_{\mathcal{X}^*\mathcal{X}}\,(K_{\mathcal{X}\mathcal{X}} + \sigma_{\text{obs}}^2 I)^{-1}\, y \;=\; \mu_{f|y}.
$$

This is the dual form of 3D-Var (PSAS) and matches the GP analytical posterior exactly — one of those “same algorithm, two names” results that recurs in DA every five years[2] [3]. The static-GP pipeline you already have is 3D-Var; calling it “GP” or “OI” or “3D-Var” only changes which jargon you use to describe its components. The equations are identical.

The reason the GP framing wins for the static case is that the kernel encodes $B$ implicitly via its closed-form covariance function — you never have to store $B \in \mathbb{R}^{n \times n}$, you just call gx.ImplicitKernelOperator. The DA literature spent decades wrestling with structured representations of $B$ (digital filters[4], NMC method[5], wavelet $B$[6]); GP kernels solve it for free.
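A ten-line numerical check of the primal/dual identity — a sketch with a squared-exponential kernel standing in for the Matérn × OU prior; it does not touch the gaussx stack:

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

# 1-D toy: 50 grid points, 8 of them observed
xs = jnp.linspace(0.0, 1.0, 50)
idx = jnp.array([2, 7, 15, 22, 30, 37, 43, 48])
K = jnp.exp(-0.5 * (xs[:, None] - xs[None, :])**2 / 0.1**2) + 1e-8 * jnp.eye(50)
H = jnp.eye(50)[idx]                       # linear sampling operator
s2 = 0.05**2                               # obs-noise variance
y = jnp.sin(2 * jnp.pi * xs[idx])

# primal 3D-Var minimiser: (K^-1 + H^T H / s2)^-1 H^T y / s2
eta_var = jnp.linalg.solve(jnp.linalg.inv(K) + H.T @ H / s2, H.T @ y / s2)

# dual form (PSAS) == GP posterior mean: K H^T (H K H^T + s2 I)^-1 y
eta_gp = K @ H.T @ jnp.linalg.solve(H @ K @ H.T + s2 * jnp.eye(8), y)

assert jnp.allclose(eta_var, eta_gp, atol=1e-6)
```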


## 3. 4D-Var — propagating the analysis through time

3D-Var assumes the field is approximately stationary over the analysis window. When $\tau \cdot u_{\text{eddy}}$ approaches a deformation radius — i.e. an eddy moves by its own diameter during the temporal window — that assumption fails. 4D-Var fixes it by adding a deterministic dynamical model $M$ to the cost:

$$
\hat x_0 \;=\; \arg\min_{x_0}\; \tfrac{1}{2}\,(x_0 - x_b)^\top B^{-1} (x_0 - x_b) \;+\; \tfrac{1}{2}\sum_{i=0}^{T} \bigl(y_i - \mathcal{H}(M^i x_0)\bigr)^\top R_i^{-1} \bigl(y_i - \mathcal{H}(M^i x_0)\bigr).
$$ (eq-4dvar)

The control variable is the initial state $x_0$. The model $M$ propagates it through the window; observations at each time $t_i$ are compared to $\mathcal{H}(M^i x_0)$. Two things change vs. 3D-Var:

1. The gradient of the observation term now involves the adjoint model $M^\top$ — every evaluation of $\nabla\mathcal{J}$ costs a forward integration of the window plus a backward (adjoint) one.
2. The minimisation is iterative — typically incremental 4D-Var: linearise $M$ and $\mathcal{H}$ around the current trajectory (the “outer loop”), solve the resulting quadratic problem (the “inner loop”) with PCG-LBFGS, update the trajectory, repeat[2]. The control-variable transform from `plume_simulation/.../00_3dvar_derivation.md` §2 carries over directly: pre-conditioning by $U$ with $B = U U^\top$ collapses the prior to a sphere and makes LBFGS converge in ~10 iterations.

In code. [jej_vc_snippets/4dvar/src/fourdvarnet/costs.py](file:///home/azureuser/localfiles/jej_vc_snippets/4dvar/src/fourdvarnet/costs.py) builds (eq-4dvar) for the (B, T, H, W) 2-D layout; the gradient is taken via jax.grad, so the adjoint is implicit. solver.py runs the inner-loop LBFGS.
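For orientation, a self-contained toy version of (eq-4dvar) — a generic step function stands in for the QG propagator, and reverse-mode AD through the rollout plays the role of the hand-coded adjoint (illustrative names, not the fourdvarnet API):

```python
import jax
import jax.numpy as jnp

def fourdvar_cost(x0, ys, x_b, B_inv, R_inv, M_step, H, obs_times):
    """J(x0) from (eq-4dvar); obs_times is a static Python list of ints."""
    def step(x, _):
        x = M_step(x)
        return x, x
    _, rollout = jax.lax.scan(step, x0, None, length=max(obs_times))
    traj = jnp.concatenate([x0[None], rollout])          # states at t = 0 … T
    dx = x0 - x_b
    Jb = 0.5 * dx @ B_inv @ dx
    innov = ys - jax.vmap(H)(traj[jnp.array(obs_times)]) # obs-space residuals
    Jo = 0.5 * jnp.sum(jax.vmap(lambda d: d @ R_inv @ d)(innov))
    return Jb + Jo

grad_J = jax.grad(fourdvar_cost)   # reverse-mode pass through the scan = adjoint run
```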

The catch with 4D-Var is the model $M$. For SSH, the relevant model is some flavour of quasi-geostrophic dynamics — the natural mid-latitude approximation when the Rossby number is small.


## 4. The 1.5-layer QG model — the workhorse propagator for SSH

The simplest model that captures mesoscale eddy advection is the 1.5-layer quasi-geostrophic equation: an active surface layer over a deep, motionless layer. The SSH $\eta$ acts as a streamfunction (up to the Coriolis factor $g/f$); potential vorticity $q$ is materially conserved:

$$
q \;=\; \nabla^2 \psi \;-\; \frac{1}{L_R^2}\,\psi, \qquad \psi = g \eta / f,
$$ (eq-pv)

$$
\partial_t q \;+\; J(\psi, q) \;=\; 0, \qquad J(\psi, q) = \partial_x \psi\,\partial_y q - \partial_y \psi\,\partial_x q.
$$ (eq-qg)

$L_R$ is the first baroclinic Rossby radius (~50 km mid-latitude, ~10 km at high latitudes). The Jacobian $J(\psi, q)$ is the eddy advection operator — this is what the static GP cannot express. Numerically: solve the elliptic (eq-pv) with a Helmholtz solver to get $\psi$ from $q$, then advect $q$ along the velocity $(-\partial_y \psi,\ \partial_x \psi)$ with an upwind or Arakawa scheme.
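A minimal one-step sketch of that recipe, assuming a doubly periodic domain so the Helmholtz inversion is a pointwise division in Fourier space (the real solvers below handle boundaries and use an Arakawa Jacobian):

```python
import jax.numpy as jnp

def qg_step(q, dt, L_R, dx):
    """One explicit Euler step of d_t q + J(psi, q) = 0 with q = Lap(psi) - psi/L_R^2.

    Doubly periodic sketch: FFT Helmholtz inversion + spectral derivatives."""
    n = q.shape[0]
    k = 2 * jnp.pi * jnp.fft.fftfreq(n, d=dx)
    kx, ky = jnp.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2

    q_hat = jnp.fft.fft2(q)
    psi_hat = -q_hat / (k2 + 1.0 / L_R**2)        # invert (Lap - 1/L_R^2) psi = q

    d = lambda f_hat, ik: jnp.real(jnp.fft.ifft2(ik * f_hat))
    u = -d(psi_hat, 1j * ky)                      # u = -d_y psi
    v = d(psi_hat, 1j * kx)                       # v =  d_x psi
    dqdx, dqdy = d(q_hat, 1j * kx), d(q_hat, 1j * ky)

    return q - dt * (u * dqdx + v * dqdy)         # d_t q = -J(psi, q)
```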

In code. The local [jej_vc_snippets/quasigeostrophic_model/mqg/qgm.py](file:///home/azureuser/localfiles/jej_vc_snippets/quasigeostrophic_model/mqg/qgm.py) and [mqg/helmholtz.py](file:///home/azureuser/localfiles/jej_vc_snippets/quasigeostrophic_model/mqg/helmholtz.py) implement the elliptic solve and Arakawa-Jacobian advection in JAX (differentiable). [mqg/example_natl.py](file:///home/azureuser/localfiles/jej_vc_snippets/quasigeostrophic_model/mqg/example_natl.py) runs it on a North Atlantic configuration. This is the same numerical core as the MASSH propagator, just packaged separately.


## 5. DYMOST — 4D-Var in particle-following coordinates

DYMOST (Ubelmann, Klein, Fu)[7] [8] is 4D-Var with the 1.5-layer QG model as $M$, but with a clever simplification: instead of running a full incremental-4D-Var loop, propagate the analysed SSH backward and forward along (eq-qg) and perform OI in a Lagrangian, particle-following frame. In that frame the field looks closer to stationary — the QG flow has done most of the work of “straightening out” the eddies — so a single 3D-Var step on the propagated residual gives a near-optimal answer.

Schematically:

```text
for each obs time t_i in the window:
    η_propagated_i = M^{t_i - t_target} η_target   # propagate analysis to obs time
    innovation_i   = y_i - H(η_propagated_i)       # residual in observation space
combine the innovations with a static OI to update η_target
```

The whole window thus collapses to one 3D-Var solve (the static-GP machinery from 00/01) on the QG-propagated residuals. Cost: ~1× a QG forward + 1× backward integration over the window, plus one GP solve. No adjoint, because the OI step is linear and the QG propagation only enters as a forward operator.
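A toy rendering of that collapse — illustrative only (the massh code path differs), and it makes the Lagrangian-frame approximation of writing the sampling operator directly in the target frame:

```python
import jax.numpy as jnp

def dymost_update(eta_b, obs, propagate, K, s2):
    """One DYMOST-style analysis of the target-time field eta_b.

    obs: list of (t_offset, idx, y) — obs-time offset, sampled grid indices,
         and values. propagate(eta, t) integrates (eq-qg) by t (t may be < 0).
    K: prior covariance on the grid; s2: obs-noise variance."""
    H_rows, innovs = [], []
    for t_off, idx, y in obs:
        eta_t = propagate(eta_b, t_off)            # forward/backward QG run
        innovs.append(y - eta_t[idx])              # innovation at obs points
        H_rows.append(jnp.eye(eta_b.size)[idx])    # sampling in target frame
    H, d = jnp.concatenate(H_rows), jnp.concatenate(innovs)
    S = H @ K @ H.T + s2 * jnp.eye(d.size)         # one static OI solve
    return eta_b + K @ H.T @ jnp.linalg.solve(S, d)
```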

Performance: 30–60% more SSH variance recovered than static OI in the western boundary currents (Gulf Stream, Kuroshio)[8] — exactly where eddy advection dominates.


## 6. BFN-QG — the adjoint-free 4D-Var

Even DYMOST only sidesteps the adjoint by linearising around the propagated trajectory; full 4D-Var still needs $M^\top$. Le Guillou et al.[9] [10] propose back-and-forth nudging: integrate the QG model forward over the window with a Newtonian relaxation toward observations, then integrate backward with the same relaxation, and iterate until both passes agree. The relaxation replaces the gradient descent on the cost — no adjoint, no covariance, no LBFGS.

Forward pass:

$$
\partial_t \psi \;=\; -J(\psi, q) \;-\; \kappa\, \mathbb{1}_{\text{obs}}\, (\mathcal{H}\psi - y),
$$

where $\kappa$ is the nudging coefficient (the only hyperparameter) and $\mathbb{1}_{\text{obs}}$ is the indicator at observation locations and times. Backward pass: the same equation integrated from $t_0 + T$ down to $t_0$ with the sign of the time derivative reversed.

Iterate the forward/backward pair ~5–10 times. The final analysis is the forward trajectory at the iteration where forward and backward states agree to within a tolerance.
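The whole scheme fits in a dozen lines. A sketch under illustrative names (massh/model.py is the reference implementation; here $\mathcal{H}$ is taken as a mask on grid points):

```python
import jax.numpy as jnp

def bfn(psi0, ys, masks, step, kappa, n_sweeps=10, tol=1e-4):
    """Back-and-forth nudging. step(psi, direction): one QG time step,
    direction = +1 forward, -1 backward. ys[t]/masks[t]: observed values and
    obs-location indicator at time step t (zero where unobserved)."""
    T, psi = len(ys), psi0
    for _ in range(n_sweeps):
        start = psi
        for t in range(T):                        # forward, relaxing to obs
            psi = step(psi, +1) - kappa * masks[t] * (psi - ys[t])
        for t in reversed(range(T)):              # backward, same relaxation
            psi = step(psi, -1) - kappa * masks[t] * (psi - ys[t])
        if jnp.max(jnp.abs(psi - start)) < tol:   # sweeps agree -> converged
            break
    return psi
```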

Pros: no adjoint, no $B^{-1}$, embarrassingly simple to code (three loops). Cons: the implicit error covariance is whatever fixed point the nudging settles to — no posterior uncertainty out of the box, and no principled way to tune $\kappa$ except by cross-validation. Despite that, BFN-QG matches incremental 4D-Var on the SWOT data challenges[10] in energetic regions, making it the highest-bang-per-buck dynamical method on the menu.

In code. [jej_vc_snippets/quasigeostrophic_model/massh/](file:///home/azureuser/localfiles/jej_vc_snippets/quasigeostrophic_model/massh/) is the reference open implementation by Le Guillou. massh/model.py runs the QG dynamics; massh/4dvar.py and massh/inversion.py provide both the BFN loop and a true incremental 4D-Var path. Upstream is github.com/leguillf/MASSH.


## 7. 4DVarNet — learn the prior, learn the solver

The catch with all of §3–§6 is that the QG model is only a model — it captures geostrophic eddy advection, but misses internal tides, ageostrophic motions, submesoscale frontogenesis, and a long tail of finer physics. Hand-extending QG with each missing piece is open-ended. 4DVarNet (Fablet, Beauchamp et al.)[11] [12] takes a different route: keep the variational form of the cost, but replace the hand-crafted prior with a learned regulariser, and replace the gradient solver itself with a learned one.

Cost:

$$
\hat x \;=\; \arg\min_{x}\; \underbrace{\bigl\|\mathcal{H} x - y\bigr\|^2_{R^{-1}}}_{\mathcal{J}_o\ \text{— observation}} \;+\; \underbrace{\bigl\|x - \Phi_\theta(x)\bigr\|^2}_{\mathcal{J}_p\ \text{— learned prior}}.
$$ (eq-4dvarnet-cost)

$\Phi_\theta$ is a neural autoencoder (typically a bilinear convolutional AE on the spatiotemporal patch) trained as a fixed point on plausible SSH fields: $\Phi_\theta(x_{\text{plausible}}) \approx x_{\text{plausible}}$. The prior cost $\|x - \Phi_\theta(x)\|^2$ is then small whenever $x$ is in-distribution and large otherwise — a learned characterisation of “plausible SSH.”

Learned solver:

$$
x_{k+1} \;=\; x_k \;-\; \alpha_k\, \Psi_\omega\bigl(\nabla \mathcal{J}(x_k)\bigr) \;-\; \beta_k\, \nabla \mathcal{J}(x_k),
$$ (eq-4dvarnet-solver)

with $\Psi_\omega$ a ConvLSTM “gradient modulator” that learns a state-dependent preconditioner from training data. Training is end-to-end: backpropagate through $K$ unrolled solver iterations to update $(\theta, \omega)$, with implicit differentiation for memory efficiency.
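In shape, the inner loop is only a few lines. A sketch with stand-ins `phi` and `psi` for the trained $\Phi_\theta$ and $\Psi_\omega$ — illustrative, not the fourdvarnet API:

```python
import jax
import jax.numpy as jnp

def unrolled_solve(x0, y, mask, phi, psi, carry0, alphas, betas):
    """K unrolled iterations of (eq-4dvarnet-solver) on (eq-4dvarnet-cost).
    mask encodes a sampling H; phi(x) ~ x on plausible fields; psi is the
    ConvLSTM gradient modulator with recurrent state `carry`."""
    def cost(x):
        Jo = jnp.sum(mask * (x - y) ** 2)     # ||Hx - y||^2 with masked obs
        Jp = jnp.sum((x - phi(x)) ** 2)       # ||x - Phi_theta(x)||^2
        return Jo + Jp

    x, carry = x0, carry0
    for a, b in zip(alphas, betas):           # K unrolled iterations
        g = jax.grad(cost)(x)
        mod, carry = psi(g, carry)            # learned, state-dependent step
        x = x - a * mod - b * g               # eq-4dvarnet-solver
    return x
```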

In code. [jej_vc_snippets/4dvar/src/fourdvarnet/](file:///home/azureuser/localfiles/jej_vc_snippets/4dvar/src/fourdvarnet/): priors.py (autoencoder $\Phi_\theta$), grad_mod.py (ConvLSTM $\Psi_\omega$), solver.py (the unrolled solver, eq-4dvarnet-solver), model.py (FourDVarNet1D/2D putting it all together), training.py (end-to-end fit). The full math reference is in jej_vc_snippets/4dvar/docs/.

Performance: 4DVarNet trained on simulation (NATL60, eNATL60) and applied to real SWOT observations beats DUACS / MIOST / BFN-QG by another 5–15% RMSE on the open ocean[12], with no further tuning. The price is a training stage that requires a high-resolution simulation ground truth — which is fine for SSH (multiple GCMs available) but expensive to repeat for each new region.


## 8. Where this leaves the GP pathwise pipeline

These methods don’t replace the GP pipeline so much as sit alongside it at different points on the cost-vs-physics curve:

| Method | Captures eddy advection? | Cost vs static GP | Posterior uncertainty? | Code |
|---|---|---|---|---|
| Static-GP 3D-Var (00/01/03) | No | 1× (baseline) | Yes (pathwise samples) | gaussx pathwise stack |
| DYMOST | Yes (linearised) | 3–5× | Yes (around the QG trajectory) | mqg/, massh/ |
| BFN-QG | Yes | 5–10× | No (nudging fixed point) | massh/ |
| Incremental 4D-Var (full QG) | Yes | 10–20× | Yes (Hessian-based) | massh/, 4dvar/ |
| 4DVarNet | Yes (learned) | 1–2× after training (training is one-shot) | No (point estimate) | 4dvar/ |

Two practical hybrid recipes:

For the SSH-mapping benchmarks (Ocean Data Challenges, MEOM/IGE), the operational hierarchy as of 2025 is roughly: DUACS < static GP ≈ MIOST < DYMOST < BFN-QG ≈ incremental-4D-Var < 4DVarNet, with each step ~5–15% RMSE improvement and ~3–5× compute. The right choice for your project depends on which end of that curve you are willing to live on.


## Footnotes
  1. A complementary 3D-Var derivation aimed at hyperspectral methane retrieval lives at projects/plume_simulation/notebooks/assimilation/00_3dvar_derivation.md — same math, different application. We borrow its preconditioning and dual-form discussion when relevant.

  2. Bouttier, F., Courtier, P. (1999). Data assimilation concepts and methods. Meteorological Training Course Lecture Series, ECMWF.

  3. Lorenc, A. C., Ballard, S. P., Bell, R. S., Ingleby, N. B., Andrews, P. L. F., Barker, D. M., Bray, J. R., Clayton, A. M., Dalby, T., Li, D., Payne, T. J., Saunders, F. W. (2000). “The Met. Office global three-dimensional variational data assimilation scheme.” Q. J. R. Meteorol. Soc. 126, 2991–3012.

  4. Purser, R. J., Wu, W.-S., Parrish, D. F., Roberts, N. M. (2003). “Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances.” Mon. Weather Rev. 131, 1524–1535.

  5. Parrish, D. F., Derber, J. C. (1992). “The National Meteorological Center’s spectral statistical-interpolation analysis system.” Mon. Weather Rev. 120, 1747–1763.

  6. Berre, L., Pannekoucke, O., Desroziers, G., Stefanescu, S. E., Chapnik, B., Raynaud, L. (2010). “A variational assimilation ensemble and the spatial filtering of its error covariances: increase of sample size by local spatial averaging.” ECMWF Workshop on Diagnostics of Data Assimilation System Performance.

  7. Ubelmann, C., Klein, P., Fu, L.-L. (2015). “Dynamic interpolation of sea surface height and potential applications for future high-resolution altimetry mapping.” J. Atmos. Ocean. Technol. 32, 177–184.

  8. Ubelmann, C., Cornuelle, B., Fu, L.-L. (2020). “Dynamic mapping of along-track ocean altimetry: performance from real observations.” J. Atmos. Ocean. Technol. 37, 1691–1707.

  9. Le Guillou, F., Metref, S., Cosme, E., Ubelmann, C., Ballarotta, M., Le Sommer, J., Verron, J. (2021). “Mapping altimetry in the forthcoming SWOT era by back-and-forth nudging a one-layer quasi-geostrophic model.” J. Atmos. Ocean. Technol. 38, 697–710.

  10. Le Guillou, F., Metref, S., Cosme, E., Le Sommer, J., Ubelmann, C., Verron, J., Ballarotta, M. (2023). “Regional mapping of energetic short mesoscale ocean dynamics from altimetry.” Ocean Science 19, 1517–1530.

  11. Beauchamp, M., Febvre, Q., Georgenthum, H., Fablet, R. (2023). “4DVarNet-SSH: end-to-end learning of variational interpolation schemes for nadir and wide-swath satellite altimetry.” Geosci. Model Dev. 16, 2119–2147.

  12. Febvre, Q., Le Sommer, J., Ubelmann, C., Fablet, R. (2024). “Training neural mapping schemes for satellite altimetry with simulation data.” J. Adv. Model. Earth Syst. 16, e2023MS003959.