3DVar extends Optimal Interpolation to nonlinear observation operators. When is linear the posterior is Gaussian and the analysis is a single matrix expression (the BLUE). When is nonlinear that closed form is lost, so instead of solving for the posterior exactly we compute its mode — the maximum a posteriori (MAP) estimate — by iteratively minimising a variational cost Lorenc, 1986Talagrand, 1997.
The “3D” denotes three spatial dimensions at a single time (, no temporal evolution); the dynamical extension is 4DVar. As before, the code is library-agnostic — jaxtyping for shapes, einx for array ops, JAX autodiff for the tangent-linear/adjoint, and a general optimiser (optimistix) for the minimisation.
From the MAP to the Variational Cost¶
The MAP estimate maximises the posterior, equivalently minimising its negative log:
The two terms trade off the background (prior) against the observations (via the observation model), each weighted by its inverse covariance.
Derivation — the cost is the negative log-posterior
With a Gaussian prior and a Gaussian likelihood , Bayes’ rule gives , so
The normalising constants do not depend on and drop out of the .
Gradient and Hessian¶
Minimisation needs the gradient, and Gauss–Newton needs (an approximation of) the Hessian:
where is the tangent-linear of the observation operator (its adjoint comes from autodiff — see the observation-model note).
Derivation — gradient and the Gauss–Newton Hessian
Differentiating the background term gives . For the observation term, the chain rule through (with Jacobian ) gives , which sum to (3). Differentiating once more,
The dropped term contracts the second derivative of against the residual . That residual is small near the MAP (and zero in the noise-free linear-Gaussian case), so Gauss–Newton discards it, leaving the positive-definite approximation (4).
The Gauss–Newton Inner Loop = Iterated BLUE¶
Each Gauss–Newton iteration solves the normal equations for an increment and updates the iterate:
where . The increment is exactly the minimiser of the linearised cost about — a linear-Gaussian problem with innovation . In other words, 3DVar is a sequence of BLUE analyses, each one re-linearising about the current iterate.
Each inner solve uses the same matrix-free machinery as OI — never forming or the Hessian, only their action on vectors:
import jax
from jax.scipy.sparse.linalg import cg
from jaxtyping import Array, Float
def gauss_newton_step(
xk: Float[Array, "N"],
xb: Float[Array, "N"],
y: Float[Array, "M"],
H, # nonlinear observation operator x -> H(x)
apply_B_inv, # v -> B⁻¹ v
apply_R_inv, # v -> R⁻¹ v
) -> Float[Array, "N"]:
Hk, Hk_lin = jax.linearize(H, xk) # Hk = H(xk); Hk_lin(v) = H'_k v
Hk_adj = jax.linear_transpose(Hk_lin, xk) # (H'_k)ᵀ
grad = apply_B_inv(xk - xb) - Hk_adj(apply_R_inv(y - Hk))[0] # ∇J(xk)
def hess(v: Float[Array, "N"]) -> Float[Array, "N"]: # B⁻¹ + H'_kᵀ R⁻¹ H'_k
return apply_B_inv(v) + Hk_adj(apply_R_inv(Hk_lin(v)))[0]
dx, _ = cg(hess, -grad) # solve normal equations
return xk + dxPreconditioning: the control-variable transform¶
The Hessian (4) contains , which is badly conditioned for smooth priors. Operational 3DVar minimises in a whitened control variable defined by Bannister, 2008, which turns the background term into an identity and preconditions the problem:
The background Hessian contribution becomes , so the conjugate- gradient inner loop converges in far fewer iterations.
Implementation¶
The Gauss–Newton form is a nonlinear least-squares problem: stack the whitened residuals so that , and hand it to a least-squares solver.
import optimistix as optx
import jax.numpy as jnp
from jaxtyping import Array, Float
def threedvar(
xb, y, H, whiten_B, whiten_R, # whiten_B: v -> B^{-1/2} v ; whiten_R: v -> R^{-1/2} v
*, solver=optx.GaussNewton(rtol=1e-8, atol=1e-8),
):
def residual(x: Float[Array, "N"], args) -> Float[Array, "N+M"]:
return jnp.concatenate([whiten_B(x - xb), whiten_R(y - H(x))])
sol = optx.least_squares(residual, solver, xb) # ½‖r‖² = J(x); warm-start at the background
return sol.value # MAP estimate x*Table 1:Choosing a minimiser by cost-function geometry.
| Minimiser | Use when |
|---|---|
GaussNewton | cost is approximately quadratic (least-squares) — the default |
LevenbergMarquardt | highly nonlinear / Gauss–Newton overshoots |
BFGS | general-purpose, no least-squares structure assumed |
NonlinearCG | memory-constrained problems |
Convergence is typically tens of iterations for well-conditioned problems. To
differentiate through the optimum (e.g. to learn or
), use an implicit adjoint (optx.ImplicitAdjoint): the implicit
function theorem gives the gradient at the solution in memory, without
unrolling the optimiser.
Posterior via the Laplace Approximation¶
For a Gaussian likelihood near the MAP, a second-order (Laplace) expansion of gives a Gaussian posterior whose covariance is the inverse Gauss–Newton Hessian:
This is exactly the OI posterior covariance with linearised at the MAP. Sampling and marginal variances are matvecs with — nothing is materialised. For strongly nonlinear operators the Laplace approximation under-reports uncertainty; reach for an ensemble covariance or an amortized posterior for better calibration.
When 3DVar Is Appropriate¶
is nonlinear (the reason to go beyond OI).
A single timestep — no dynamics.
The posterior is unimodal, well-approximated as Gaussian around the MAP.
and are Gaussian.
Typical cases: snapshot inversions (a single satellite overpass), static-field estimation, multi-instrument fusion at one time, single-overpass methane source attribution.
Linear → Optimal Interpolation / BLUE (closed form, one step).
Dynamics over a window → strong / incremental 4DVar.
Multimodal or strongly non-Gaussian posterior → ensemble methods (e.g. the Ensemble Kalman Filter) or an amortized posterior with a flow/score head.
- Lorenc, A. C. (1986). Analysis Methods for Numerical Weather Prediction. Quarterly Journal of the Royal Meteorological Society, 112(474), 1177–1194. 10.1002/qj.49711247414
- Talagrand, O. (1997). Assimilation of Observations, an Introduction. Journal of the Meteorological Society of Japan, 75(1B), 191–209. 10.2151/jmsj1965.75.1B_191
- Bannister, R. N. (2008). A Review of Forecast Error Covariance Statistics in Atmospheric Variational Data Assimilation. I: Characteristics and Measurements of Forecast Error Covariances. Quarterly Journal of the Royal Meteorological Society, 134(637), 1951–1970. 10.1002/qj.339