The observation model is the bridge between the state we want to estimate and the data we actually measure. It answers a single question: given a state , what observations should we expect to see? Everything downstream — the likelihood, the variational cost, the tangent-linear and adjoint operators in 4DVar — is built on top of the observation operator .
The code below is deliberately library-agnostic: shapes are pinned with jaxtyping, array operations are written with einx, and the tangent-linear / adjoint operators come for free from JAX autodiff.
The Likelihood Model¶
We assume Gaussian observation errors,
which yields the likelihood . The negative log-likelihood (up to a constant) is the observation cost that appears in every variational problem,
import einx
from jaxtyping import Array, Float
def obs_cost(
y: Float[Array, "M"],
Hx: Float[Array, "M"],
R_inv: Float[Array, "M M"], # applied matrix-free in practice (see below)
) -> Float[Array, ""]:
"""Jₒᵦₛ = ½ (y - H(x))ᵀ R⁻¹ (y - H(x))."""
r = y - Hx
return 0.5 * einx.dot("M, M ->", r, einx.dot("M K, K -> M", R_inv, r))The Observation Operator¶
An observation operator is any differentiable map from state space to observation space. That is the entire contract:
from typing import Protocol
from jaxtyping import Array, Float
class ObservationOperator(Protocol):
def __call__(self, x: Float[Array, "N"]) -> Float[Array, "M"]:
"""Forward map H : state space (N) -> observation space (M)."""
...Four Operator Families¶
The simplest case observes the state wherever a mask is non-zero:
Tangent-linear: . Use cases include SSH altimetry along-track gaps, SST under cloud masks, and in-situ samples on a grid.
def masked_identity(
x: Float[Array, "N"],
mask: Float[Array, "N"], # 0/1 weights
) -> Float[Array, "N"]:
return einx.multiply("N, N -> N", mask, x)A pre-computed projection from state space to observation space:
Use cases include Lagrangian footprint matrices from particle simulators, spectral filtering, and spatial interpolation. When has exploitable structure (sparse, low-rank, Kronecker), keep it as a linear operator rather than materializing the dense matrix.
def linear_obs(
H: Float[Array, "M N"],
x: Float[Array, "N"],
) -> Float[Array, "M"]:
return einx.dot("M N, N -> M", H, x)For satellite L2 products from optimal-estimation retrievals (TROPOMI CH₄, EMIT CH₄, OCO CO₂, MOPITT CO), the retrieval is not a direct sample of the mixing-ratio profile . It applies a smoothing matrix (the averaging kernel) and falls back to a retrieval prior where the signal is weak:
| Symbol | Description |
|---|---|
| model state (profile, surface field) | |
| retrieval prior from L2 metadata | |
| weighting vector (often pressure-weighted) | |
| averaging-kernel matrix |
Tangent-linear: .
def averaging_kernel(
x: Float[Array, "N"],
A: Float[Array, "M N"],
x_a: Float[Array, "N"],
h: Float[Array, "N"],
) -> Float[Array, "M"]:
blended = einx.multiply("N, N -> N", h, x) + einx.multiply("N, N -> N", 1.0 - h, x_a)
return einx.dot("M N, N -> M", A, blended)When is structured (Kronecker for separable kernels, low-rank for under-determined inversions, banded for compactly-supported kernels), apply it matrix-free instead of forming the dense matrix.
Operational work combines multiple instruments. Each has its own forward map , mask , error covariance , and optionally an averaging kernel . The costs compose additively at the likelihood level:
The per-instrument weight defaults to uniform and supports hierarchical bias-correction. Quality masks zero-weight unreliable pixels (they contribute zero log-likelihood rather than being dropped), which keeps the effective per-instrument observation count auditable.
from typing import Callable, NamedTuple
class Instrument(NamedTuple):
H: Callable[[Float[Array, "N"]], Float[Array, "M"]]
y: Float[Array, "M"]
mask: Float[Array, "M"] # 0/1 quality weights
R_inv: Float[Array, "M M"]
weight: float = 1.0 # αᵢ
def fusion_cost(
x: Float[Array, "N"],
instruments: list[Instrument],
) -> Float[Array, ""]:
total = 0.0
for ins in instruments:
r = einx.multiply("M, M -> M", ins.mask, ins.y - ins.H(x))
Rinv_r = einx.dot("M K, K -> M", ins.R_inv, r)
total += ins.weight * 0.5 * einx.dot("M, M ->", r, Rinv_r)
return totalTangent-Linear and Adjoint¶
Incremental 4DVar and the Gauss–Newton inner loop need the tangent-linear operator (forward) and its adjoint (backward). Both are autodiff primitives — no hand-coding:
import jax
def tlm(H, x: Float[Array, "N"], dx: Float[Array, "N"]) -> Float[Array, "M"]:
"""Forward tangent-linear: H'(x) · dx (a JVP)."""
_, dy = jax.jvp(H, (x,), (dx,))
return dy
def adjoint(H, x: Float[Array, "N"], r: Float[Array, "M"]) -> Float[Array, "N"]:
"""Adjoint: H'(x)ᵀ · r (a VJP)."""
_, vjp_fn = jax.vjp(H, x)
(out,) = vjp_fn(r)
return outThe adjoint test verifies the inner-product identity
up to floating-point tolerance. A failure means the gradient flowing back through is wrong, so downstream optimizers take incorrect steps.
import jax.numpy as jnp
def adjoint_test(H, x, u: Float[Array, "N"], v: Float[Array, "M"]) -> bool:
lhs = jnp.vdot(tlm(H, x, u), v) # ⟨H'u, v⟩
rhs = jnp.vdot(u, adjoint(H, x, v)) # ⟨u, (H')ᵀv⟩
return bool(jnp.allclose(lhs, rhs))Observation-Error Covariance ¶
enters only through , which is applied lazily (e.g. via conjugate gradients) and never materialized. Common parameterizations:
Diagonal — . The default for heteroscedastic retrievals (per-pixel from L2 metadata); is an elementwise divide.
Block-diagonal — one block per instrument when fusing.
Structured — a spatial-correlation kernel (e.g. Matérn) when observation errors are correlated, as for high-resolution imaging spectrometers. Applied matrix-free.
def apply_R_inv_diag(
variances: Float[Array, "M"], # σ²
r: Float[Array, "M"],
) -> Float[Array, "M"]:
"""R⁻¹ r for diagonal R — no inverse materialized."""
return einx.divide("M, M -> M", r, variances)When the Observation Model Is Wrong¶
If is misspecified — wrong averaging kernel, missing instrument bias, ignored representation error — the MAP estimate converges to the wrong answer with a confident-looking posterior. Three diagnostics catch this:
Innovation statistics — should be roughly mean-zero and consistent with . Large systematic offsets indicate bias.
Posterior-vs-prior shift — if the data move the posterior much further than from the prior, the or assumptions are violated.
Cross-instrument disagreement — per-instrument residual statistics should be consistent; large disagreements flag uncorrected bias and motivate joint bias estimation.