Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Strong-Constrained 4DVar

CSIC
UCM
IGEO

Strong-constraint 4DVar extends 3DVar from a single snapshot to a time window: observations yt\boldsymbol{y}_t at times t=0,1,,Tt = 0, 1, \ldots, T are related to a single initial state u0\boldsymbol{u}_0 through known dynamics Le Dimet & Talagrand, 1986Talagrand & Courtier, 1987. The label strong-constraint means the dynamics are treated as exact — any u0\boldsymbol{u}_0 deterministically produces the whole trajectory, and model error is excluded. Systems that need a model-error term use weak-constraint 4DVar instead.

The control variable is the initial state u0\boldsymbol{u}_0 (and optionally the parameters θ\boldsymbol{\theta}); the trajectory follows from the dynamical model.

Problem Formulation

Under the strong constraint the trajectory is a deterministic function of the initial state, ut=Mt(u0)\boldsymbol{u}_t = M_t(\boldsymbol{u}_0), where MtM_t propagates u0\boldsymbol{u}_0 from t0t_0 to tt. The dynamics enter the cost through the composition HtMtH_t \circ M_t — the observation operator applied to the propagated state. The generative model is

p(y0:T,u0,θ)=p(θ)p(u0θ)t=0Tp(ytMt(u0),θ),p(\boldsymbol{y}_{0:T}, \boldsymbol{u}_0, \boldsymbol{\theta}) = p(\boldsymbol{\theta}) \, p(\boldsymbol{u}_0 \mid \boldsymbol{\theta}) \prod_{t=0}^{T} p\big(\boldsymbol{y}_t \mid M_t(\boldsymbol{u}_0), \boldsymbol{\theta}\big),

with Gaussian components

Data likelihood:p(ytut,θ)=N(ytHt(ut),Rt)Prior state:p(u0θ)=N(u0ub,B)Prior parameters:p(θ)=N(θθb,Σθ).\begin{aligned} \text{Data likelihood}: && p(\boldsymbol{y}_t \mid \boldsymbol{u}_t, \boldsymbol{\theta}) &= \mathcal{N}(\boldsymbol{y}_t \mid H_t(\boldsymbol{u}_t), \mathbf{R}_t) \\ \text{Prior state}: && p(\boldsymbol{u}_0 \mid \boldsymbol{\theta}) &= \mathcal{N}(\boldsymbol{u}_0 \mid \boldsymbol{u}_b, \mathbf{B}) \\ \text{Prior parameters}: && p(\boldsymbol{\theta}) &= \mathcal{N}(\boldsymbol{\theta} \mid \boldsymbol{\theta}_b, \boldsymbol{\Sigma}_{\boldsymbol{\theta}}). \end{aligned}

The 4DVar Cost

The MAP estimate of u0\boldsymbol{u}_0 minimises the negative log-posterior — the 4DVar cost — a background term plus an observation term summed over the window:

J(u0)=12u0ubB12+12t=0TytHt(Mt(u0))Rt12.J(\boldsymbol{u}_0) = \tfrac{1}{2} \| \boldsymbol{u}_0 - \boldsymbol{u}_b \|^2_{\mathbf{B}^{-1}} + \tfrac{1}{2} \sum_{t=0}^{T} \| \boldsymbol{y}_t - H_t(M_t(\boldsymbol{u}_0)) \|^2_{\mathbf{R}_t^{-1}}.

Minimising requires the gradient, which by the chain rule pulls back through every MtM_t:

u0J(u0)=B1(u0ub)+t=0T(Mt)(Ht)Rt1(Ht(Mt(u0))yt).\nabla_{\boldsymbol{u}_0} J(\boldsymbol{u}_0) = \mathbf{B}^{-1}(\boldsymbol{u}_0 - \boldsymbol{u}_b) + \sum_{t=0}^{T} (M'_t)^\top (H'_t)^\top \mathbf{R}_t^{-1} \big( H_t(M_t(\boldsymbol{u}_0)) - \boldsymbol{y}_t \big).

The Adjoint Problem

Assembling (4) directly would require the dense Jacobians (Mt)(M'_t)^\top. Instead the gradient is computed by a single backward sweep of an adjoint variable λt\boldsymbol{\lambda}_t, initialised at the final time and accumulated as it propagates backward:

λT=(HT)RT1(HT(uT)yT),λt1=(Mt)λt+(Ht1)Rt11(Ht1(ut1)yt1),u0J=B1(u0ub)+λ0.\begin{aligned} \boldsymbol{\lambda}_T &= (H'_T)^\top \mathbf{R}_T^{-1} \big( H_T(\boldsymbol{u}_T) - \boldsymbol{y}_T \big), \\ \boldsymbol{\lambda}_{t-1} &= (M'_t)^\top \boldsymbol{\lambda}_t + (H'_{t-1})^\top \mathbf{R}_{t-1}^{-1} \big( H_{t-1}(\boldsymbol{u}_{t-1}) - \boldsymbol{y}_{t-1} \big), \\ \nabla_{\boldsymbol{u}_0} J &= \mathbf{B}^{-1}(\boldsymbol{u}_0 - \boldsymbol{u}_b) + \boldsymbol{\lambda}_0. \end{aligned}

Algorithm — Continuous-Time Adjoint

When the dynamics are a continuous flow tu=f(u,t,θ)\partial_t \boldsymbol{u} = \boldsymbol{f}(\boldsymbol{u}, t, \boldsymbol{\theta}), the flow map is an ODE solve, Mt(u0)=ODESolve(f,u0,θ,[t0,t])M_t(\boldsymbol{u}_0) = \mathrm{ODESolve}(\boldsymbol{f}, \boldsymbol{u}_0, \boldsymbol{\theta}, [t_0, t]), and the discrete adjoint sweep (5) becomes a backward ODE.

Data

D={yt}t=0T,ytRDy.\mathcal{D} = \{ \boldsymbol{y}_t \}_{t=0}^{T}, \qquad \boldsymbol{y}_t \in \mathbb{R}^{D_y}.

Dynamical and Adjoint Models

The forward dynamics and its adjoint (the transpose-Jacobian action), obtained for free from a VJP:

tu=f(u,t,θ),tλ=f(λ,t,u,y)=Jf(ut)λJH(ut)Rt1(H(ut)yt).\partial_t \boldsymbol{u} = \boldsymbol{f}(\boldsymbol{u}, t, \boldsymbol{\theta}), \qquad \partial_t \boldsymbol{\lambda} = \boldsymbol{f}^*(\boldsymbol{\lambda}, t, \boldsymbol{u}, \boldsymbol{y}) = \mathbf{J}_{\boldsymbol{f}}^\top(\boldsymbol{u}_t)\, \boldsymbol{\lambda} - \mathbf{J}_{H}^\top(\boldsymbol{u}_t)\, \mathbf{R}_t^{-1} \big( H(\boldsymbol{u}_t) - \boldsymbol{y}_t \big).
import jax
from typing import Callable

def init_f_adjoint(f: Callable) -> Callable:
    """Adjoint of the dynamics via VJP:  λ ↦ (∂f/∂u)ᵀ λ — no hand-coding."""
    def f_adj(u, lam):
        _, vjp = jax.vjp(f, u)
        return vjp(lam)[0]
    return f_adj

Initial Condition

We need an initial guess for the state and (optionally) the parameters:

u0RDu,θ0RDθ.\boldsymbol{u}_0 \in \mathbb{R}^{D_u}, \qquad \boldsymbol{\theta}_0 \in \mathbb{R}^{D_\theta}.

Integrate the Forward Model

Build the time grid that matches the observation times and roll the state forward:

import jax.numpy as jnp
from jaxtyping import Array, Float

dt: float = 0.01
time_steps: Float[Array, "Nt"] = jnp.arange(0, num_steps) * dt

state_sol = ode_solve(f, u0, theta, time_steps)     # forward rollout
U: Float[Array, "Nt Du"] = state_sol.u              # state trajectory

so that

U=ODESolve(f,u0,θ,T)=[u1,u2,,uT]RT×Du.\mathbf{U} = \mathrm{ODESolve}(\boldsymbol{f}, \boldsymbol{u}_0, \boldsymbol{\theta}, \mathbf{T}) = \left[ \boldsymbol{u}_1, \boldsymbol{u}_2, \ldots, \boldsymbol{u}_T \right] \in \mathbb{R}^{T \times D_u}.

Integrate the Backward Adjoint Model

Run the solver in reverse over the flipped time grid, accumulating the adjoint (7):

time_steps_reverse: Float[Array, "Nt"] = time_steps[::-1]
# λ_{t+1} = f*(u_t, λ_t, y, θ)  — backward sweep accumulating the obs misfit
λt+1=f(ut,λt,y,θ).\boldsymbol{\lambda}_{t+1} = \boldsymbol{f}^*(\boldsymbol{u}_t, \boldsymbol{\lambda}_t, \boldsymbol{y}, \boldsymbol{\theta}).

Gradient Steps

State
Parameters

The gradient w.r.t. the initial state, and a single B\mathbf{B}-preconditioned descent step:

u0J=B1(u0ub)+λ0,u0k+1=u0kαBu0J.\nabla_{\boldsymbol{u}_0} J = \mathbf{B}^{-1}(\boldsymbol{u}_0 - \boldsymbol{u}_b) + \boldsymbol{\lambda}_0, \qquad \boldsymbol{u}_0^{k+1} = \boldsymbol{u}_0^{k} - \alpha\, \mathbf{B}\, \nabla_{\boldsymbol{u}_0} J.

In modern code you rarely write the sweep by hand: define the cost (3), wrap the rollout in a differentiable ODE solve, and call jax.grad(J)(u0) — the dynamical-model adjoints compose the pullback automatically.

Computational Considerations

Long Assimilation Windows

For T100T \gtrsim 100 cycles, memory dominates. Pick a diffrax adjoint to trade memory against recompute:

Nonlinear Dynamics

When MtM_t is nonlinear the cost (3) is non-convex with multiple local minima. Mitigations: multi-start optimisation, warm-starting from the previous cycle, or switching to incremental 4DVar — which solves a sequence of linearised inner problems (each an OI/BLUE, exactly as in 3DVar’s Gauss–Newton loop) and is the more robust choice for production.

Posterior

A Laplace approximation gives a Gaussian posterior whose covariance is the inverse Gauss–Newton Hessian of (3), evaluated through the trajectory. Forming it exactly needs many Krylov iterations (tens of matrix–vector products, each a full forward+adjoint pass), so for production uncertainty quantification, incremental 4DVar or an ensemble / amortized posterior is cheaper.

When Strong-Constraint 4DVar Is Appropriate

Use it when…
Reach for something else when…
  • Observations span multiple times with known dynamics.

  • Model error sits within the observation noise floor (the strong constraint is reasonable).

  • You want a general-purpose 4DVar baseline.

References
  1. Le Dimet, F.-X., & Talagrand, O. (1986). Variational Algorithms for Analysis and Assimilation of Meteorological Observations: Theoretical Aspects. Tellus A, 38(2), 97–110. 10.3402/tellusa.v38i2.11706
  2. Talagrand, O., & Courtier, P. (1987). Variational Assimilation of Meteorological Observations with the Adjoint Vorticity Equation. I: Theory. Quarterly Journal of the Royal Meteorological Society, 113(478), 1311–1328. 10.1002/qj.49711347812
  3. Errico, R. M. (1997). What Is an Adjoint Model? Bulletin of the American Meteorological Society, 78(11), 2577–2591. https://doi.org/10.1175/1520-0477(1997)078<;2577:WIAAM>2.0.CO;2