Optimal Interpolation / BLUE - Research Notebook

For linear-Gaussian state estimation, the posterior is Gaussian in closed form and the analysis is a single matrix expression. This is the Best Linear Unbiased Estimator (BLUE), known in the geosciences as Optimal Interpolation (OI) (Lorenc, 1981; Daley, 1991). It is the canonical baseline every more sophisticated method must reduce to in the linear-Gaussian limit.

This note complements the existing Optimal Interpolation derivation; the code is library-agnostic — shapes via jaxtyping, array ops via einx, linear solves via jax.scipy.

Setup

A Gaussian prior on the state, a linear-Gaussian likelihood, and the resulting Gaussian posterior:

\begin{aligned} \text{Prior}: && p(\boldsymbol{x}) &= \mathcal{N}(\boldsymbol{x}; \boldsymbol{x}_b, \mathbf{B}), & \boldsymbol{x} &\in \mathbb{R}^{D_x} \\ \text{Likelihood}: && p(\boldsymbol{y} \mid \boldsymbol{x}) &= \mathcal{N}(\boldsymbol{y}; \mathbf{H}\boldsymbol{x}, \mathbf{R}), & \mathbf{H} &\in \mathbb{R}^{D_y \times D_x} \\ \text{Posterior}: && p(\boldsymbol{x} \mid \boldsymbol{y}) &= \mathcal{N}(\boldsymbol{x}; \boldsymbol{x}^\star, \mathbf{P}^\star). && \end{aligned}

(1)

The posterior mean is the background corrected by the innovation $\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b$, weighted by the Kalman gain $\mathbf{K}$:

\boxed{\; \boldsymbol{x}^\star = \boldsymbol{x}_b + \mathbf{K}\,(\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b), \qquad \mathbf{K} = \mathbf{B}\mathbf{H}^\top \left( \mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R} \right)^{-1} \;}

(2)

with posterior covariance expressible in two equivalent forms,

\mathbf{P}^\star = (\mathbf{I} - \mathbf{K}\mathbf{H})\,\mathbf{B} = \left( \mathbf{B}^{-1} + \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H} \right)^{-1}.

(3)
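
The equivalence of the two covariance forms in (3) is the Woodbury identity, and it can be checked numerically on a small random problem. The following is a minimal sketch, not part of the original derivation: the helper `random_spd`, the dimensions, and the random seed are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a tight comparison


def random_spd(key, n):
    """Hypothetical helper: a well-conditioned symmetric positive-definite matrix."""
    A = jax.random.normal(key, (n, n))
    return A @ A.T / n + jnp.eye(n)


k_B, k_R, k_H = jax.random.split(jax.random.PRNGKey(0), 3)
D_x, D_y = 6, 3                       # illustrative sizes
B = random_spd(k_B, D_x)              # prior covariance
R = random_spd(k_R, D_y)              # observation-error covariance
H = jax.random.normal(k_H, (D_y, D_x))

# Gain form: P* = (I - K H) B
S = H @ B @ H.T + R
K = jnp.linalg.solve(S, H @ B).T      # K = B H^T S^{-1}, using symmetry of S and B
P_gain = (jnp.eye(D_x) - K @ H) @ B

# Information form: P* = (B^{-1} + H^T R^{-1} H)^{-1}
P_info = jnp.linalg.inv(jnp.linalg.inv(B) + H.T @ jnp.linalg.solve(R, H))

max_abs_diff = jnp.max(jnp.abs(P_gain - P_info))
```

In exact arithmetic `max_abs_diff` is zero; in floating point it should sit near machine precision for a well-conditioned problem like this one.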

Two Expressions, Two Computations

The gain can be applied in observation space (a $D_y \times D_y$ inverse) or in state space (a $D_x \times D_x$ inverse). Pick whichever inverse is smaller:

Table 1: Regime selection by problem dimensions.

Regime	Form to use	Inversion size
$D_y \ll D_x$ (sparse observations)	$\mathbf{K} = \mathbf{B}\mathbf{H}^\top(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})^{-1}$	$D_y \times D_y$
$D_y \gg D_x$ (dense observations)	$(\mathbf{P}^\star)^{-1} = \mathbf{B}^{-1} + \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H}$	$D_x \times D_x$

For SSH altimetry with sparse along-track samples on a 2D grid, $D_y \ll D_x$ — use the observation-space form. For a satellite imager with one observation per state cell, $D_y \approx D_x$ — either works. A good implementation picks automatically from the dimensions unless overridden.
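
The automatic choice described above could look like the following dense sketch. The function name `oi_auto`, the `form` argument, and the tie-breaking rule (observation space when $D_y \le D_x$) are assumptions made for illustration; the state-space branch uses $\boldsymbol{x}^\star = \boldsymbol{x}_b + \mathbf{P}^\star \mathbf{H}^\top \mathbf{R}^{-1}(\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b)$, which follows from (2) and (3).

```python
import jax.numpy as jnp


def oi_auto(xb, y, H, B, R, form=None):
    """Dense BLUE mean; picks the smaller inverse unless `form` is given.

    `form` is a hypothetical override: "obs" or "state".
    """
    D_y, D_x = H.shape
    if form is None:
        form = "obs" if D_y <= D_x else "state"
    d = y - H @ xb                                     # innovation
    if form == "obs":
        # Solve the D_y x D_y system (H B H^T + R) w = d
        w = jnp.linalg.solve(H @ B @ H.T + R, d)
        return xb + B @ H.T @ w
    if form == "state":
        Rinv_H = jnp.linalg.solve(R, H)                # R^{-1} H
        A = jnp.linalg.inv(B) + H.T @ Rinv_H           # (P*)^{-1}, D_x x D_x
        return xb + jnp.linalg.solve(A, Rinv_H.T @ d)  # P* H^T R^{-1} d (R symmetric)
    raise ValueError(f"unknown form: {form!r}")
```

Both branches return the same analysis up to round-off, so the override is only useful for benchmarking or when one of the inverses is structurally cheaper than its size suggests.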

The Matrices Are Never Materialised

At geophysical scale, $\mathbf{B}$ and $\mathbf{H}\mathbf{B}\mathbf{H}^\top$ are far too large to form. Everything runs through structured linear operators exposing only matrix–vector products:

$\mathbf{B}$ — a structured covariance (e.g. a Matérn kernel on a regular grid) with $O(D_x \log D_x)$ matvecs via the FFT. See Gaussian Processes for Machine Learning (Rasmussen & Williams, 2006, Ch. 2.7).
$\mathbf{R}$ — typically diagonal (per-pixel variances), so $\mathbf{R}^{-1}$ is trivial.
$\mathbf{H}\mathbf{B}\mathbf{H}^\top$ — never materialised; only its action on a vector is computed.
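
As an illustration of the FFT-based matvec for $\mathbf{B}$, here is a minimal 1D sketch. It assumes a periodic regular grid, so that a stationary kernel gives a circulant $\mathbf{B}$ that the FFT diagonalises; the Matérn-3/2 kernel, the function names, and the default parameters are illustrative choices rather than anything prescribed by the text. A non-periodic grid would need padding (circulant embedding), which is not shown.

```python
import jax.numpy as jnp


def matern32_first_column(n, dx=1.0, lengthscale=4.0, variance=1.0):
    """First column of a circulant Matern-3/2 covariance on a periodic 1D grid."""
    idx = jnp.arange(n)
    r = jnp.minimum(idx, n - idx) * dx          # periodic distance to grid point 0
    s = jnp.sqrt(3.0) * r / lengthscale
    return variance * (1.0 + s) * jnp.exp(-s)


def make_apply_B(c):
    """Return x -> B x for the circulant matrix with first column c."""
    n = c.shape[0]
    c_hat = jnp.fft.rfft(c)                     # eigenvalues of B, computed once

    def apply_B(x):
        # Circular convolution theorem: B x = irfft(rfft(c) * rfft(x))
        return jnp.fft.irfft(c_hat * jnp.fft.rfft(x), n=n)

    return apply_B
```

Only the length-$D_x$ vector `c` is stored; the $D_x \times D_x$ matrix never exists in memory.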

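The system $(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})\boldsymbol{w} = (\boldsymbol{y} - \mathbf{H}\boldsymbol{x}_b)$ is solved with conjugate gradients. Each iteration applies $\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R}$ once, which is dominated by the $O(D_x \log D_x)$ matvec with $\mathbf{B}$ when $\mathbf{H}$ and $\mathbf{H}^\top$ are cheap to apply, so the total cost is $O(k\,D_x \log D_x)$, where $k$ is the CG iteration count (typically tens). A full OI on a 2D Matérn prior with $50{,}000$ grid cells and $5{,}000$ observations takes seconds on a single GPU, with no materialised matrices and no $O(D_x^3)$ factorisations.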

Implementation

A dense reference makes the gain explicit; the matrix-free form scales.

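Dense (pedagogical)

Clear but cubic in cost ($O(D_y^3)$ for the solve with $\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R}$, plus $O(D_x^2 D_y)$ to form $\mathbf{B}\mathbf{H}^\top$): fine for small problems and for testing the matrix-free version against ground truth.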

import einx
import jax.numpy as jnp
from jaxtyping import Array, Float

def oi_dense(
    xb: Float[Array, "N"],     # background / prior mean
    y:  Float[Array, "M"],     # observations
    H:  Float[Array, "M N"],   # linear observation operator
    B:  Float[Array, "N N"],   # prior covariance
    R:  Float[Array, "M M"],   # observation-error covariance
) -> Float[Array, "N"]:
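    """Dense BLUE analysis mean, forming the gain K explicitly."""
    BHt = einx.dot("N K, M K -> N M", B, H)            # B Hᵀ
    S   = einx.dot("M N, N L -> M L", H, BHt) + R       # H B Hᵀ + R
    K   = jnp.linalg.solve(S, BHt.T).T                  # K = B Hᵀ S⁻¹ (S symmetric); avoids an explicit inverse
    d   = y - einx.dot("M N, N -> M", H, xb)            # innovation d = y - H x_b
    return xb + einx.dot("N M, M -> N", K, d)           # x* = x_b + K d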

Matrix-free (at scale)

Pass $\mathbf{B}, \mathbf{R}, \mathbf{H}, \mathbf{H}^\top$ as operators (callables) and solve with CG, so nothing dense is ever formed.

from jax.scipy.sparse.linalg import cg
from jaxtyping import Array, Float

def oi_matrix_free(
    xb: Float[Array, "N"],
    y:  Float[Array, "M"],
    H,        # x -> H x        (linear)
    Ht,       # y -> Hᵀ y       (adjoint, e.g. jax.linear_transpose)
    apply_B,  # x -> B x        (structured, FFT-based matvec)
    apply_R,  # y -> R y        (diagonal)
) -> Float[Array, "N"]:
    """Observation-space BLUE (D_y ≪ D_x form)."""
    innovation = y - H(xb)                              # d = y - H x_b
    def HBHt_plus_R(v: Float[Array, "M"]) -> Float[Array, "M"]:
        return H(apply_B(Ht(v))) + apply_R(v)
    w, _ = cg(HBHt_plus_R, innovation)                 # (H B Hᵀ + R) w = d
    return xb + apply_B(Ht(w))                          # x* = x_b + B Hᵀ w
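
The adjoint `Ht` in the signature above does not have to be written by hand. A minimal sketch, assuming the observation operator simply samples the state at fixed grid indices (the indices `obs_idx` and the size `D_x` are made up for illustration): `jax.linear_transpose` derives the adjoint from the forward callable, and a dot-product test confirms the pair is consistent.

```python
import jax
import jax.numpy as jnp

D_x = 10                                  # illustrative state size
obs_idx = jnp.array([1, 4, 7])            # hypothetical observed grid cells


def H(x):
    """Forward operator: sample the state at the observed cells."""
    return x[obs_idx]


# linear_transpose needs an example input to fix shapes and dtypes.
_Ht = jax.linear_transpose(H, jnp.zeros(D_x))


def Ht(v):
    """Adjoint operator: scatter observation-space values back onto the grid."""
    (out,) = _Ht(v)                       # returns one cotangent per primal input
    return out


# Dot-product (adjoint) test: <H x, v> should equal <x, H^T v>.
x = jnp.arange(float(D_x))
v = jnp.array([1.0, 2.0, 3.0])
lhs = jnp.vdot(H(x), v)
rhs = jnp.vdot(x, Ht(v))
```

Running the dot-product test on random vectors is a cheap safeguard before handing `H` and `Ht` to a CG solve, since CG silently assumes the operator is symmetric.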

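The posterior covariance $\mathbf{P}^\star$ is itself a linear operator: by (3), $\mathbf{P}^\star\boldsymbol{v} = \mathbf{B}\boldsymbol{v} - \mathbf{B}\mathbf{H}^\top(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})^{-1}\mathbf{H}\mathbf{B}\boldsymbol{v}$, which costs one more CG solve per matvec. Drawing a sample is a matvec with $(\mathbf{P}^\star)^{1/2}$, and the marginal variances $\operatorname{diag}(\mathbf{P}^\star)$ come from a family of such matvecs, one per probe vector. Nothing is materialised.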
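
A sketch of the posterior covariance as an operator, under the same callable conventions as `oi_matrix_free`. The names `make_apply_P` and `marginal_variances` and the CG tolerance are assumptions for illustration. The variance routine probes with every unit vector, which is exact but costs $D_x$ solves; at scale one would more likely probe with a smaller set of random vectors, which is not shown here.

```python
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def make_apply_P(H, Ht, apply_B, apply_R, tol=1e-6):
    """Return v -> P* v, using P* = B - B H^T (H B H^T + R)^{-1} H B."""

    def S(u):
        return H(apply_B(Ht(u))) + apply_R(u)      # (H B H^T + R) u

    def apply_P(v):
        Bv = apply_B(v)
        w, _ = cg(S, H(Bv), tol=tol)               # solve S w = H B v
        return Bv - apply_B(Ht(w))

    return apply_P


def marginal_variances(apply_P, D_x):
    """diag(P*) by probing with unit vectors: entry i of P* e_i."""
    diag = []
    for i in range(D_x):
        e_i = jnp.zeros(D_x).at[i].set(1.0)
        diag.append(apply_P(e_i)[i])
    return jnp.stack(diag)
```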

OI as the Canonical Baseline

Connection to the Kalman Filter

The Kalman-filter analysis step is OI, with the forecast playing the role of the prior:

\begin{aligned} \text{Forecast}: && \boldsymbol{x}^f_t &= \mathbf{M}_t \boldsymbol{x}^a_{t-1}, & \mathbf{P}^f_t &= \mathbf{M}_t \mathbf{P}^a_{t-1} \mathbf{M}_t^\top + \mathbf{Q} \\ \text{Analysis}: && \mathbf{K}_t &= \mathbf{P}^f_t \mathbf{H}^\top (\mathbf{H}\mathbf{P}^f_t\mathbf{H}^\top + \mathbf{R})^{-1}, & \boldsymbol{x}^a_t &= \boldsymbol{x}^f_t + \mathbf{K}_t(\boldsymbol{y}_t - \mathbf{H}\boldsymbol{x}^f_t) \\ && \mathbf{P}^a_t &= (\mathbf{I} - \mathbf{K}_t\mathbf{H})\mathbf{P}^f_t. && \end{aligned}

(4)

So a Kalman filter composes cleanly: forecast the mean and covariance through the dynamical model, then run OI as the analysis with $(\boldsymbol{x}^f_t, \mathbf{P}^f_t)$ as the prior. See the Kalman Filter and Sequential Inference notes for the full recursion.
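
The composition in (4) can be sketched in a few dense lines: one function for the OI analysis and one that wraps it with the forecast. The function names are illustrative assumptions, and the dense algebra is meant for small problems only.

```python
import jax.numpy as jnp


def oi_analysis(xf, Pf, y, H, R):
    """OI / BLUE analysis with (xf, Pf) as the prior mean and covariance."""
    S = H @ Pf @ H.T + R
    K = jnp.linalg.solve(S, H @ Pf).T              # K = Pf H^T S^{-1} (S, Pf symmetric)
    xa = xf + K @ (y - H @ xf)
    Pa = (jnp.eye(xf.shape[0]) - K @ H) @ Pf
    return xa, Pa


def kalman_step(xa_prev, Pa_prev, y, M, Q, H, R):
    """One Kalman-filter cycle: linear forecast, then OI as the analysis."""
    xf = M @ xa_prev                               # forecast mean
    Pf = M @ Pa_prev @ M.T + Q                     # forecast covariance
    return oi_analysis(xf, Pf, y, H, R)
```

For a scalar check, take $\mathbf{M} = \mathbf{H} = \mathbf{R} = \mathbf{P}^a_{t-1} = 1$, $\mathbf{Q} = 0$, $\boldsymbol{x}^a_{t-1} = 0$ and $\boldsymbol{y}_t = 2$: then $\mathbf{P}^f_t = 1$, $\mathbf{K}_t = 1/2$, $\boldsymbol{x}^a_t = 1$ and $\mathbf{P}^a_t = 1/2$.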

When OI Is the Right Answer

Use OI when all of the following hold:

$\mathbf{H}$ is linear — or the linearisation around $\boldsymbol{x}_b$ is good enough that the residual nonlinearity is dominated by observation noise.
The prior and the observation errors are Gaussian, with covariances $\mathbf{B}$ and $\mathbf{R}$.
The posterior is unimodal — no symmetry-breaking, no bistable regimes.
The problem involves a single timestep, or the dynamics are linear.
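
The first condition can be probed numerically when the observation operator is a nonlinear function. The sketch below is a crude heuristic under stated assumptions, not a formal test: `h` is a hypothetical nonlinear observation function, `dx` is a representative state perturbation (for instance one draw from the prior), and `obs_std` is the observation-noise standard deviation. It compares the residual left over after linearising around $\boldsymbol{x}_b$ with the noise level.

```python
import jax
import jax.numpy as jnp


def linearisation_residual(h, xb, dx):
    """h(xb + dx) minus its first-order expansion around xb."""
    y0, h_lin = jax.linearize(h, xb)       # y0 = h(xb), h_lin = tangent-linear map
    return h(xb + dx) - (y0 + h_lin(dx))


def linearisation_ok(h, xb, dx, obs_std):
    """Heuristic: residual nonlinearity is below the observation noise everywhere."""
    return jnp.all(jnp.abs(linearisation_residual(h, xb, dx)) < obs_std)
```

For example, with `h = lambda x: x ** 2`, `xb = 1` and `dx = 0.1`, the residual is `dx ** 2 = 0.01`, so the linearisation passes for an observation noise of 0.1 and fails for 0.001.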

References

Lorenc, A. C. (1981). A Global Three-Dimensional Multivariate Statistical Interpolation Scheme. Monthly Weather Review, 109(4), 701–721. https://doi.org/10.1175/1520-0493(1981)109<0701:AGTDMS>2.0.CO;2
Daley, R. (1991). Atmospheric Data Analysis. Cambridge University Press.
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.