For linear-Gaussian state estimation, the posterior is Gaussian in closed form and the analysis is a single matrix expression. This is the Best Linear Unbiased Estimator (BLUE), known in the geosciences as Optimal Interpolation (OI) Lorenc, 1981Daley, 1991. It is the canonical baseline every more sophisticated method must reduce to in the linear-Gaussian limit.
This note complements the existing Optimal Interpolation derivation;
the code is library-agnostic — shapes via
jaxtyping, array ops via
einx, linear solves via jax.scipy.
Setup¶
A Gaussian prior on the state, a linear-Gaussian likelihood, and the resulting Gaussian posterior:
The posterior mean is the background corrected by the innovation , weighted by the Kalman gain :
with posterior covariance expressible in two equivalent forms,
Two Expressions, Two Computations¶
The gain can be applied in observation space ( inverse) or state space ( inverse). Pick whichever inverse is smaller:
Table 1:Regime selection by problem dimensions.
| Regime | Form to use | Inversion size |
|---|---|---|
| (sparse observations) | ||
| (dense observations) |
For SSH altimetry with sparse along-track samples on a 2D grid, — use the observation-space form. For a satellite imager with one observation per state cell, — either works. A good implementation picks automatically from the dimensions unless overridden.
The Matrices Are Never Materialised¶
At geophysical scale, and are far too large to form. Everything runs through structured linear operators exposing only matrix–vector products:
— a structured covariance (e.g. a Matérn kernel on a regular grid) with matvecs via the FFT. See Gaussian Processes for Machine Learning Rasmussen & Williams, 2006, Ch. 2.7.
— typically diagonal (per-pixel variances), so is trivial.
— never materialised; only its action on a vector is computed.
The system is solved with conjugate gradients in time, where is the CG iteration count (typically tens). A full OI on a 2D Matérn prior with grid cells and observations takes seconds on a single GPU — no materialised matrices, no factorisations.
Implementation¶
A dense reference makes the gain explicit; the matrix-free form scales.
Clear but — fine for small problems and for testing the matrix-free version against ground truth.
import einx
import jax.numpy as jnp
from jaxtyping import Array, Float
def oi_dense(
xb: Float[Array, "N"], # background / prior mean
y: Float[Array, "M"], # observations
H: Float[Array, "M N"], # linear observation operator
B: Float[Array, "N N"], # prior covariance
R: Float[Array, "M M"], # observation-error covariance
) -> Float[Array, "N"]:
BHt = einx.dot("N K, M K -> N M", B, H) # B Hᵀ
S = einx.dot("M N, N L -> M L", H, BHt) + R # H B Hᵀ + R
K = einx.dot("N M, M L -> N L", BHt, jnp.linalg.inv(S))
return xb + einx.dot("N M, M -> N", K, y - einx.dot("M N, N -> M", H, xb))Pass as operators (callables) and solve with CG — nothing dense is ever formed.
from jax.scipy.sparse.linalg import cg
from jaxtyping import Array, Float
def oi_matrix_free(
xb: Float[Array, "N"],
y: Float[Array, "M"],
H, # x -> H x (linear)
Ht, # y -> Hᵀ y (adjoint, e.g. jax.linear_transpose)
apply_B, # x -> B x (structured, FFT-based matvec)
apply_R, # y -> R y (diagonal)
) -> Float[Array, "N"]:
"""Observation-space BLUE (D_y ≪ D_x form)."""
innovation = y - H(xb) # d = y - H x_b
def HBHt_plus_R(v: Float[Array, "M"]) -> Float[Array, "M"]:
return H(apply_B(Ht(v))) + apply_R(v)
w, _ = cg(HBHt_plus_R, innovation) # (H B Hᵀ + R) w = d
return xb + apply_B(Ht(w)) # x* = x_b + B Hᵀ wThe posterior covariance is itself a linear operator: drawing a sample is a matvec with , and the marginal variances are one more matvec family. Nothing is materialised.
OI as the Canonical Baseline¶
Connection to the Kalman Filter¶
The Kalman-filter analysis step is OI, with the forecast playing the role of the prior:
So a Kalman filter composes cleanly: forecast the mean and covariance through the dynamical model, then run OI as the analysis with as the prior. See the Kalman Filter and Sequential Inference notes for the full recursion.
When OI Is the Right Answer¶
All of the following hold:
is linear — or the linearisation around is good enough that the residual nonlinearity is dominated by observation noise.
and are Gaussian.
The posterior is unimodal — no symmetry-breaking, no bistable regimes.
A single timestep or linear dynamics.
Nonlinear → 3DVar (Gauss–Newton on the nonlinear cost).
Dynamics over a window → strong / incremental 4DVar.
Non-Gaussian likelihood → amortized posterior with a flow/score head.
Multimodal posterior → ensemble methods (e.g. the Ensemble Kalman Filter).
- Lorenc, A. C. (1981). A Global Three-Dimensional Multivariate Statistical Interpolation Scheme. Monthly Weather Review, 109(4), 701–721. https://doi.org/10.1175/1520-0493(1981)109<0701:AGTDMS>2.0.CO;2
- Daley, R. (1991). Atmospheric Data Analysis. Cambridge University Press.
- Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.