Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

3DVar — Three-Dimensional Variational Assimilation

CSIC
UCM
IGEO

3DVar extends Optimal Interpolation to nonlinear observation operators. When HH is linear the posterior is Gaussian and the analysis is a single matrix expression (the BLUE). When HH is nonlinear that closed form is lost, so instead of solving for the posterior exactly we compute its mode — the maximum a posteriori (MAP) estimate — by iteratively minimising a variational cost Lorenc, 1986Talagrand, 1997.

The “3D” denotes three spatial dimensions at a single time (T=0T = 0, no temporal evolution); the dynamical extension is 4DVar. As before, the code is library-agnostic — jaxtyping for shapes, einx for array ops, JAX autodiff for the tangent-linear/adjoint, and a general optimiser (optimistix) for the minimisation.

From the MAP to the Variational Cost

The MAP estimate maximises the posterior, equivalently minimising its negative log:

x=arg maxxp(xy)=arg minxJ(x),J(x)=12xxbB12+12yH(x)R12.\boldsymbol{x}^\star = \operatorname*{arg\,max}_{\boldsymbol{x}} \, p(\boldsymbol{x} \mid \boldsymbol{y}) = \operatorname*{arg\,min}_{\boldsymbol{x}} \, J(\boldsymbol{x}), \qquad J(\boldsymbol{x}) = \tfrac{1}{2} \| \boldsymbol{x} - \boldsymbol{x}_b \|^2_{\mathbf{B}^{-1}} + \tfrac{1}{2} \| \boldsymbol{y} - H(\boldsymbol{x}) \|^2_{\mathbf{R}^{-1}}.

The two terms trade off the background (prior) against the observations (via the observation model), each weighted by its inverse covariance.

Gradient and Hessian

Minimisation needs the gradient, and Gauss–Newton needs (an approximation of) the Hessian:

J(x)=B1(xxb)H(x)R1(yH(x)),\nabla J(\boldsymbol{x}) = \mathbf{B}^{-1}(\boldsymbol{x} - \boldsymbol{x}_b) - H'(\boldsymbol{x})^\top \mathbf{R}^{-1} (\boldsymbol{y} - H(\boldsymbol{x})),
J(x)B1+H(x)R1H(x),J''(\boldsymbol{x}) \approx \mathbf{B}^{-1} + H'(\boldsymbol{x})^\top \mathbf{R}^{-1} H'(\boldsymbol{x}),

where H(x)H'(\boldsymbol{x}) is the tangent-linear of the observation operator (its adjoint H(x)H'(\boldsymbol{x})^\top comes from autodiff — see the observation-model note).

The Gauss–Newton Inner Loop = Iterated BLUE

Each Gauss–Newton iteration solves the normal equations for an increment δx\delta\boldsymbol{x} and updates the iterate:

(B1+HkR1Hk)δx=J(xk),xk+1=xk+δx,\big( \mathbf{B}^{-1} + H_k'^\top \mathbf{R}^{-1} H_k' \big) \, \delta\boldsymbol{x} = -\nabla J(\boldsymbol{x}_k), \qquad \boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \delta\boldsymbol{x},

where Hk=H(xk)H_k' = H'(\boldsymbol{x}_k). The increment is exactly the minimiser of the linearised cost about xk\boldsymbol{x}_k — a linear-Gaussian problem with innovation dk=yH(xk)\boldsymbol{d}_k = \boldsymbol{y} - H(\boldsymbol{x}_k). In other words, 3DVar is a sequence of BLUE analyses, each one re-linearising HH about the current iterate.

Each inner solve uses the same matrix-free machinery as OI — never forming B\mathbf{B} or the Hessian, only their action on vectors:

import jax
from jax.scipy.sparse.linalg import cg
from jaxtyping import Array, Float

def gauss_newton_step(
    xk: Float[Array, "N"],
    xb: Float[Array, "N"],
    y:  Float[Array, "M"],
    H,             # nonlinear observation operator  x -> H(x)
    apply_B_inv,   # v -> B⁻¹ v
    apply_R_inv,   # v -> R⁻¹ v
) -> Float[Array, "N"]:
    Hk, Hk_lin = jax.linearize(H, xk)              # Hk = H(xk);  Hk_lin(v) = H'_k v
    Hk_adj     = jax.linear_transpose(Hk_lin, xk)  # (H'_k)ᵀ

    grad = apply_B_inv(xk - xb) - Hk_adj(apply_R_inv(y - Hk))[0]   # ∇J(xk)
    def hess(v: Float[Array, "N"]) -> Float[Array, "N"]:          # B⁻¹ + H'_kᵀ R⁻¹ H'_k
        return apply_B_inv(v) + Hk_adj(apply_R_inv(Hk_lin(v)))[0]

    dx, _ = cg(hess, -grad)                                       # solve normal equations
    return xk + dx

Preconditioning: the control-variable transform

The Hessian (4) contains B1\mathbf{B}^{-1}, which is badly conditioned for smooth priors. Operational 3DVar minimises in a whitened control variable χ\boldsymbol{\chi} defined by x=xb+B1/2χ\boldsymbol{x} = \boldsymbol{x}_b + \mathbf{B}^{1/2}\boldsymbol{\chi} Bannister, 2008, which turns the background term into an identity and preconditions the problem:

J(χ)=12χ2+12yH(xb+B1/2χ)R12.J(\boldsymbol{\chi}) = \tfrac{1}{2}\|\boldsymbol{\chi}\|^2 + \tfrac{1}{2}\|\boldsymbol{y} - H(\boldsymbol{x}_b + \mathbf{B}^{1/2}\boldsymbol{\chi})\|^2_{\mathbf{R}^{-1}}.

The background Hessian contribution becomes I\mathbf{I}, so the conjugate- gradient inner loop converges in far fewer iterations.

Implementation

The Gauss–Newton form is a nonlinear least-squares problem: stack the whitened residuals so that 12r(x)2=J(x)\tfrac{1}{2}\|\boldsymbol{r}(\boldsymbol{x})\|^2 = J(\boldsymbol{x}), and hand it to a least-squares solver.

import optimistix as optx
import jax.numpy as jnp
from jaxtyping import Array, Float

def threedvar(
    xb, y, H, whiten_B, whiten_R,            # whiten_B: v -> B^{-1/2} v ;  whiten_R: v -> R^{-1/2} v
    *, solver=optx.GaussNewton(rtol=1e-8, atol=1e-8),
):
    def residual(x: Float[Array, "N"], args) -> Float[Array, "N+M"]:
        return jnp.concatenate([whiten_B(x - xb), whiten_R(y - H(x))])
    sol = optx.least_squares(residual, solver, xb)    # ½‖r‖² = J(x); warm-start at the background
    return sol.value                                  # MAP estimate x*

Table 1:Choosing a minimiser by cost-function geometry.

MinimiserUse when
GaussNewtoncost is approximately quadratic (least-squares) — the default
LevenbergMarquardthighly nonlinear HH / Gauss–Newton overshoots
BFGSgeneral-purpose, no least-squares structure assumed
NonlinearCGmemory-constrained problems

Convergence is typically tens of iterations for well-conditioned problems. To differentiate through the optimum (e.g. to learn B\mathbf{B} or R\mathbf{R}), use an implicit adjoint (optx.ImplicitAdjoint): the implicit function theorem gives the gradient at the solution in O(1)O(1) memory, without unrolling the optimiser.

Posterior via the Laplace Approximation

For a Gaussian likelihood near the MAP, a second-order (Laplace) expansion of JJ gives a Gaussian posterior whose covariance is the inverse Gauss–Newton Hessian:

p(xy)N(x;x,P),P=(B1+H(x)R1H(x))1=(J(x))1.p(\boldsymbol{x} \mid \boldsymbol{y}) \approx \mathcal{N}(\boldsymbol{x}; \boldsymbol{x}^\star, \mathbf{P}^\star), \qquad \mathbf{P}^\star = \big( \mathbf{B}^{-1} + H'(\boldsymbol{x}^\star)^\top \mathbf{R}^{-1} H'(\boldsymbol{x}^\star) \big)^{-1} = \big( J''(\boldsymbol{x}^\star) \big)^{-1}.

This is exactly the OI posterior covariance with HH linearised at the MAP. Sampling and marginal variances are matvecs with (P)1/2(\mathbf{P}^\star)^{1/2} — nothing is materialised. For strongly nonlinear operators the Laplace approximation under-reports uncertainty; reach for an ensemble covariance or an amortized posterior for better calibration.

When 3DVar Is Appropriate

Use 3DVar when…
Reach for something else when…
  • HH is nonlinear (the reason to go beyond OI).

  • A single timestep — no dynamics.

  • The posterior is unimodal, well-approximated as Gaussian around the MAP.

  • B\mathbf{B} and R\mathbf{R} are Gaussian.

Typical cases: snapshot inversions (a single satellite overpass), static-field estimation, multi-instrument fusion at one time, single-overpass methane source attribution.

References
  1. Lorenc, A. C. (1986). Analysis Methods for Numerical Weather Prediction. Quarterly Journal of the Royal Meteorological Society, 112(474), 1177–1194. 10.1002/qj.49711247414
  2. Talagrand, O. (1997). Assimilation of Observations, an Introduction. Journal of the Meteorological Society of Japan, 75(1B), 191–209. 10.2151/jmsj1965.75.1B_191
  3. Bannister, R. N. (2008). A Review of Forecast Error Covariance Statistics in Atmospheric Variational Data Assimilation. I: Characteristics and Measurements of Forecast Error Covariances. Quarterly Journal of the Royal Meteorological Society, 134(637), 1951–1970. 10.1002/qj.339