Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Optimal Interpolation / BLUE

CSIC
UCM
IGEO

For linear-Gaussian state estimation, the posterior is Gaussian in closed form and the analysis is a single matrix expression. This is the Best Linear Unbiased Estimator (BLUE), known in the geosciences as Optimal Interpolation (OI) Lorenc, 1981Daley, 1991. It is the canonical baseline every more sophisticated method must reduce to in the linear-Gaussian limit.

This note complements the existing Optimal Interpolation derivation; the code is library-agnostic — shapes via jaxtyping, array ops via einx, linear solves via jax.scipy.

Setup

A Gaussian prior on the state, a linear-Gaussian likelihood, and the resulting Gaussian posterior:

Prior:p(x)=N(x;xb,B),xRDxLikelihood:p(yx)=N(y;Hx,R),HRDy×DxPosterior:p(xy)=N(x;x,P).\begin{aligned} \text{Prior}: && p(\boldsymbol{x}) &= \mathcal{N}(\boldsymbol{x}; \boldsymbol{x}_b, \mathbf{B}), & \boldsymbol{x} &\in \mathbb{R}^{D_x} \\ \text{Likelihood}: && p(\boldsymbol{y} \mid \boldsymbol{x}) &= \mathcal{N}(\boldsymbol{y}; \mathbf{H}\boldsymbol{x}, \mathbf{R}), & \mathbf{H} &\in \mathbb{R}^{D_y \times D_x} \\ \text{Posterior}: && p(\boldsymbol{x} \mid \boldsymbol{y}) &= \mathcal{N}(\boldsymbol{x}; \boldsymbol{x}^\star, \mathbf{P}^\star). && \end{aligned}

The posterior mean is the background corrected by the innovation yHxb\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b, weighted by the Kalman gain K\mathbf{K}:

  x=xb+K(yHxb),K=BH(HBH+R)1  \boxed{\; \boldsymbol{x}^\star = \boldsymbol{x}_b + \mathbf{K}\,(\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b), \qquad \mathbf{K} = \mathbf{B}\mathbf{H}^\top \left( \mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R} \right)^{-1} \;}

with posterior covariance expressible in two equivalent forms,

P=(IKH)B=(B1+HR1H)1.\mathbf{P}^\star = (\mathbf{I} - \mathbf{K}\mathbf{H})\,\mathbf{B} = \left( \mathbf{B}^{-1} + \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H} \right)^{-1}.

Two Expressions, Two Computations

The gain can be applied in observation space (Dy×DyD_y \times D_y inverse) or state space (Dx×DxD_x \times D_x inverse). Pick whichever inverse is smaller:

Table 1:Regime selection by problem dimensions.

RegimeForm to useInversion size
DyDxD_y \ll D_x (sparse observations)K=BH(HBH+R)1\mathbf{K} = \mathbf{B}\mathbf{H}^\top(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})^{-1}Dy×DyD_y \times D_y
DyDxD_y \gg D_x (dense observations)(P)1=B1+HR1H(\mathbf{P}^\star)^{-1} = \mathbf{B}^{-1} + \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H}Dx×DxD_x \times D_x

For SSH altimetry with sparse along-track samples on a 2D grid, DyDxD_y \ll D_x — use the observation-space form. For a satellite imager with one observation per state cell, DyDxD_y \approx D_x — either works. A good implementation picks automatically from the dimensions unless overridden.

The Matrices Are Never Materialised

At geophysical scale, B\mathbf{B} and HBH\mathbf{H}\mathbf{B}\mathbf{H}^\top are far too large to form. Everything runs through structured linear operators exposing only matrix–vector products:

The system (HBH+R)w=(yHxb)(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})\boldsymbol{w} = (\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b) is solved with conjugate gradients in O(kDy)O(k\,D_y) time, where kk is the CG iteration count (typically tens). A full OI on a 2D Matérn prior with 50,00050{,}000 grid cells and 5,0005{,}000 observations takes seconds on a single GPU — no materialised matrices, no O(Dx3)O(D_x^3) factorisations.

Implementation

A dense reference makes the gain explicit; the matrix-free form scales.

Dense (pedagogical)
Matrix-free (at scale)

Clear but O(Dx3)O(D_x^3) — fine for small problems and for testing the matrix-free version against ground truth.

import einx
import jax.numpy as jnp
from jaxtyping import Array, Float

def oi_dense(
    xb: Float[Array, "N"],     # background / prior mean
    y:  Float[Array, "M"],     # observations
    H:  Float[Array, "M N"],   # linear observation operator
    B:  Float[Array, "N N"],   # prior covariance
    R:  Float[Array, "M M"],   # observation-error covariance
) -> Float[Array, "N"]:
    BHt = einx.dot("N K, M K -> N M", B, H)            # B Hᵀ
    S   = einx.dot("M N, N L -> M L", H, BHt) + R       # H B Hᵀ + R
    K   = einx.dot("N M, M L -> N L", BHt, jnp.linalg.inv(S))
    return xb + einx.dot("N M, M -> N", K, y - einx.dot("M N, N -> M", H, xb))

The posterior covariance P\mathbf{P}^\star is itself a linear operator: drawing a sample is a matvec with (P)1/2(\mathbf{P}^\star)^{1/2}, and the marginal variances diag(P)\operatorname{diag}(\mathbf{P}^\star) are one more matvec family. Nothing is materialised.

OI as the Canonical Baseline

Connection to the Kalman Filter

The Kalman-filter analysis step is OI, with the forecast playing the role of the prior:

Forecast:xtf=Mtxt1a,Ptf=MtPt1aMt+QAnalysis:Kt=PtfH(HPtfH+R)1,xta=xtf+Kt(ytHxtf)Pta=(IKtH)Ptf.\begin{aligned} \text{Forecast}: && \boldsymbol{x}^f_t &= \mathbf{M}_t \boldsymbol{x}^a_{t-1}, & \mathbf{P}^f_t &= \mathbf{M}_t \mathbf{P}^a_{t-1} \mathbf{M}_t^\top + \mathbf{Q} \\ \text{Analysis}: && \mathbf{K}_t &= \mathbf{P}^f_t \mathbf{H}^\top (\mathbf{H}\mathbf{P}^f_t\mathbf{H}^\top + \mathbf{R})^{-1}, & \boldsymbol{x}^a_t &= \boldsymbol{x}^f_t + \mathbf{K}_t(\boldsymbol{y}_t - \mathbf{H}\boldsymbol{x}^f_t) \\ && \mathbf{P}^a_t &= (\mathbf{I} - \mathbf{K}_t\mathbf{H})\mathbf{P}^f_t. && \end{aligned}

So a Kalman filter composes cleanly: forecast the mean and covariance through the dynamical model, then run OI as the analysis with (xtf,Ptf)(\boldsymbol{x}^f_t, \mathbf{P}^f_t) as the prior. See the Kalman Filter and Sequential Inference notes for the full recursion.

When OI Is the Right Answer

Use OI when…
Reach for something else when…

All of the following hold:

  • H\mathbf{H} is linear — or the linearisation around xb\boldsymbol{x}_b is good enough that the residual nonlinearity is dominated by observation noise.

  • B\mathbf{B} and R\mathbf{R} are Gaussian.

  • The posterior is unimodal — no symmetry-breaking, no bistable regimes.

  • A single timestep or linear dynamics.

References
  1. Lorenc, A. C. (1981). A Global Three-Dimensional Multivariate Statistical Interpolation Scheme. Monthly Weather Review, 109(4), 701–721. https://doi.org/10.1175/1520-0493(1981)109<;0701:AGTDMS>2.0.CO;2
  2. Daley, R. (1991). Atmospheric Data Analysis. Cambridge University Press.
  3. Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.