Incremental 4DVar with Control-Variable Transform

Incremental 4DVar is the operational fast path for strong-constraint 4DVar. Rather than minimising the full nonlinear cost directly, it solves a sequence of linearised quadratic subproblems — a Gauss–Newton outer loop — each preconditioned by a control-variable transform Courtier et al., 1994Lorenc, 1997. It returns the same analysis as strong-constraint 4DVar, far faster.

The pattern is exactly 3DVar’s Gauss–Newton loop generalised to a time window: each inner problem is a linear-Gaussian (BLUE) analysis through the linearised dynamics.

The Outer / Inner Decomposition¶

Start from the strong-constraint cost. Linearising the dynamics and observations around the current outer iterate $\boldsymbol{u}^{(k)}$ ,

M_t(\boldsymbol{u}^{(k)} + \delta\boldsymbol{u}) \approx M_t(\boldsymbol{u}^{(k)}) + M'_t\, \delta\boldsymbol{u}, \qquad H_t(\boldsymbol{z} + \delta\boldsymbol{z}) \approx H_t(\boldsymbol{z}) + H'_t\, \delta\boldsymbol{z},

(1)

gives a quadratic increment cost in $\delta\boldsymbol{u}$ ,

\delta J(\delta\boldsymbol{u}) = \tfrac{1}{2} \| \delta\boldsymbol{u} \|^2_{\mathbf{B}^{-1}} + \tfrac{1}{2} \sum_{t=0}^{T} \| \boldsymbol{d}_t - H'_t M'_t\, \delta\boldsymbol{u} \|^2_{\mathbf{R}_t^{-1}}, \qquad \boldsymbol{d}_t = \boldsymbol{y}_t - H_t\!\big(M_t(\boldsymbol{u}^{(k)})\big),

(2)

where $\boldsymbol{d}_t$ is the innovation at the current iterate. The minimiser solves the Gauss–Newton normal equations

\mathbf{H}_{\text{GN}}\, \delta\boldsymbol{u}^\star = \sum_{t=0}^{T} (H'_t M'_t)^\top \mathbf{R}_t^{-1} \boldsymbol{d}_t, \qquad \mathbf{H}_{\text{GN}} = \mathbf{B}^{-1} + \sum_{t=0}^{T} (H'_t M'_t)^\top \mathbf{R}_t^{-1} (H'_t M'_t),

(3)

after which the outer update $\boldsymbol{u}^{(k+1)} = \boldsymbol{u}^{(k)} + \delta\boldsymbol{u}^\star$ relinearises and repeats.

Convergence is typically 3–5 outer iterations, largely independent of the strength of the nonlinearity.

Control-Variable Transform (CVT)¶

The condition number of $\mathbf{H}_{\text{GN}}$ inherits the eigenvalue spread of the prior $\mathbf{B}$ , which for a Matérn covariance on a 2D grid can span hundreds of orders of magnitude — crippling CG without preconditioning. The CVT changes variables to a whitened control,

\boldsymbol{\chi} = \mathbf{B}^{-1/2}\, \delta\boldsymbol{u}, \qquad \delta\boldsymbol{u} = \mathbf{B}^{1/2}\, \boldsymbol{\chi},

(4)

so the prior term becomes $\tfrac{1}{2}\|\delta\boldsymbol{u}\|^2_{\mathbf{B}^{-1}} = \tfrac{1}{2}\|\boldsymbol{\chi}\|^2$ and the Hessian becomes

\widetilde{\mathbf{H}}_{\text{GN}} = \mathbf{I} + \mathbf{B}^{1/2} \left( \sum_{t=0}^{T} (H'_t M'_t)^\top \mathbf{R}_t^{-1} (H'_t M'_t) \right) \mathbf{B}^{1/2}.

(5)

The square root $\mathbf{B}^{1/2}$ comes from a structured covariance operator (e.g. a Matérn .half()); for non-structured priors, factorise via Cholesky (small grids) or Lanczos (large grids).

Implementation¶

A generic outer/inner loop: linearise the windowed observation map with jax.linearize, then solve the CVT-preconditioned normal equations with CG.

import jax
from jax.scipy.sparse.linalg import cg
from jaxtyping import Array, Float

def incremental_4dvar(
    ub:  Float[Array, "N"],     # background / initial outer iterate
    ys:  Float[Array, "T M"],   # observations over the window
    G,                          # windowed obs map  u0 -> {H_t(M_t(u0))}  : (N,) -> (T, M)
    apply_R_inv,                # (T, M) -> (T, M), block-diagonal R⁻¹ over time
    B_half,                     # χ -> B^{1/2} χ   (structured square root, symmetric)
    *, n_outer: int = 3, n_inner: int = 20,
) -> Float[Array, "N"]:
    u = ub
    for _ in range(n_outer):
        # 1. linearise the windowed observation map at the current iterate
        Gu, G_lin = jax.linearize(G, u)             # Gu = G(u) (T,M);  G_lin(v) = G'(u) v
        G_adj     = jax.linear_transpose(G_lin, u)  # (G')ᵀ
        d         = ys - Gu                         # innovations (T, M)

        # 2. CVT-preconditioned Gauss–Newton solve in χ-space:
        #    (I + B^{1/2} G'ᵀ R⁻¹ G' B^{1/2}) χ = B^{1/2} G'ᵀ R⁻¹ d
        def hessian(chi: Float[Array, "N"]) -> Float[Array, "N"]:
            v = B_half(chi)
            return chi + B_half(G_adj(apply_R_inv(G_lin(v)))[0])
        rhs = B_half(G_adj(apply_R_inv(d))[0])
        chi, _ = cg(hessian, rhs, maxiter=n_inner)

        # 3. outer update:  δu = B^{1/2} χ
        u = u + B_half(chi)
    return u

Typical knobs: n_outer ≈ 3 (Gauss–Newton outer iterations), n_inner ≈ 20–50 (CG steps per outer), plus CG tolerances and a cvt on/off switch.

Why It Is the Operational Fast Path¶

Three multiplicative speedups over running a generic nonlinear minimiser on the full strong-constraint cost:

CVT preconditioning

Linearisation reuse

Free posterior

CG converges in $O(10)$ iterations instead of $O(\sqrt{\kappa(\mathbf{B})})$ — the prior condition number $\kappa(\mathbf{B})$ no longer drives the cost. See (5).

For a representative operational problem ( $T = 12$ h, $D_u = 10^7$ grid points, $D_y = 10^5$ observations), incremental 4DVar converges in minutes where a generic nonlinear solver on the same cost takes hours.

Limiting Case and Scope¶

Strong vs. weak constraint. Incremental 4DVar is strong-constraint by default. The strong/weak choice is structural (does the control include $\boldsymbol{\eta}_t$ ?); the inner-solver choice (generic vs. incremental) is performance. For weak-constraint problems the incremental machinery over the augmented control is heavier and not always worth it.

When to prefer strong-constraint 4DVar instead — almost never operationally, but:

the prior is exotic (no structured $\mathbf{B}^{1/2}$ ), so CVT preconditioning is unavailable;
you want a custom minimiser (e.g. Levenberg–Marquardt for very nonlinear cases) rather than the fixed Gauss–Newton + CG.

For most uses, incremental 4DVar is the right default Bannister, 2017.

References¶

Courtier, P., Thépaut, J.-N., & Hollingsworth, A. (1994). A Strategy for Operational Implementation of 4D-Var, Using an Incremental Approach. Quarterly Journal of the Royal Meteorological Society, 120(519), 1367–1387. 10.1002/qj.49712051912
Lorenc, A. C. (1997). Development of an Operational Variational Assimilation Scheme. Journal of the Meteorological Society of Japan, 75(1B), 339–346. 10.2151/jmsj1965.75.1B_339
Bannister, R. N. (2017). A Review of Operational Methods of Variational and Ensemble-Variational Data Assimilation. Quarterly Journal of the Royal Meteorological Society, 143(703), 607–633. 10.1002/qj.2982