The Observation Model

The observation model is the bridge between the state we want to estimate and the data we actually measure. It answers a single question: given a state $\boldsymbol{x}$, what observations $\boldsymbol{y}$ should we expect to see? Everything downstream (the likelihood, the variational cost, the tangent-linear and adjoint operators in 4DVar) is built on top of the observation operator $H$.

The code below is deliberately lightweight and not tied to any particular data-assimilation library: shapes are pinned with jaxtyping, array operations are written with einx, and the tangent-linear / adjoint operators come for free from JAX autodiff.

The Likelihood Model

We assume Gaussian observation errors,

$$
\boldsymbol{y} = H(\boldsymbol{x}) + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\boldsymbol{0}, \mathbf{R}),
\tag{1}
$$

which yields the likelihood $p(\boldsymbol{y} \mid \boldsymbol{x}) = \mathcal{N}(\boldsymbol{y}; H(\boldsymbol{x}), \mathbf{R})$. The negative log-likelihood (up to a constant) is the observation cost that appears in every variational problem,

$$
J_{\text{obs}}(\boldsymbol{x}) = \tfrac{1}{2} \, \| \boldsymbol{y} - H(\boldsymbol{x}) \|^2_{\mathbf{R}^{-1}} = \tfrac{1}{2} \, (\boldsymbol{y} - H(\boldsymbol{x}))^\top \mathbf{R}^{-1} (\boldsymbol{y} - H(\boldsymbol{x})).
\tag{2}
$$
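For reference, the constant dropped from (2) can be written out. This is the standard multivariate-Gaussian log-density, with $M$ denoting the number of observations (the same $M$ used in the code shapes):

$$
-\log p(\boldsymbol{y} \mid \boldsymbol{x}) = \tfrac{1}{2} \, (\boldsymbol{y} - H(\boldsymbol{x}))^\top \mathbf{R}^{-1} (\boldsymbol{y} - H(\boldsymbol{x})) + \tfrac{1}{2} \log \det \mathbf{R} + \tfrac{M}{2} \log 2\pi .
$$

The last two terms do not depend on $\boldsymbol{x}$, so they drop out of any minimization over the state as long as $\mathbf{R}$ is held fixed.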

import einx
from jaxtyping import Array, Float

def obs_cost(
    y:     Float[Array, "M"],
    Hx:    Float[Array, "M"],
    R_inv: Float[Array, "M M"],   # applied matrix-free in practice (see below)
) -> Float[Array, ""]:
    """Jₒᵦₛ = ½ (y - H(x))ᵀ R⁻¹ (y - H(x))."""
    r = y - Hx
    return 0.5 * einx.dot("M, M ->", r, einx.dot("M K, K -> M", R_inv, r))

The Observation Operator

An observation operator is any differentiable map from state space to observation space. That is the entire contract:

from typing import Protocol
from jaxtyping import Array, Float

class ObservationOperator(Protocol):
    def __call__(self, x: Float[Array, "N"]) -> Float[Array, "M"]:
        """Forward map H : state space (N) -> observation space (M)."""
        ...
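As an illustration of how little the contract demands, the sketch below defines a hypothetical nonlinear operator (the class name `SquaredSubsample` and its `stride` parameter are invented for this example) that satisfies the protocol simply by being callable. Plain `jax.numpy` is used so the snippet stands alone.

```python
import jax.numpy as jnp


class SquaredSubsample:
    """Hypothetical operator: observe x**2 at every `stride`-th state element."""

    def __init__(self, stride: int):
        self.stride = stride

    def __call__(self, x):
        # Forward map from N state values to ceil(N / stride) observations.
        return x[:: self.stride] ** 2


H = SquaredSubsample(stride=2)
x = jnp.arange(6.0)   # state [0, 1, 2, 3, 4, 5]
y = H(x)              # observes elements 0, 2, 4, squared
```

A plain function or a `lambda` qualifies in the same way; no base class or registration step is involved.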

Four Operator Families

The four families, in order, are masked identity, linear projection, averaging kernel, and multi-instrument fusion.

Masked Identity

The simplest case observes the state wherever a mask is non-zero:

$$
H(\boldsymbol{x}) = \boldsymbol{m} \odot \boldsymbol{x}, \qquad \boldsymbol{m} \in \{0, 1\}^{D_x}.
\tag{3}
$$

Tangent-linear: $H'(\boldsymbol{x}) = \operatorname{diag}(\boldsymbol{m})$. Use cases include SSH altimetry along-track gaps, SST under cloud masks, and in-situ samples on a grid.

def masked_identity(
    x:    Float[Array, "N"],
    mask: Float[Array, "N"],   # 0/1 weights
) -> Float[Array, "N"]:
    return einx.multiply("N, N -> N", mask, x)
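The stated tangent-linear can be checked numerically. The following is a minimal sketch that rebuilds the masked identity with plain `jax.numpy` (in place of einx, so the snippet is self-contained) and compares the forward-mode Jacobian against $\operatorname{diag}(\boldsymbol{m})$; the mask and state values are arbitrary.

```python
import jax
import jax.numpy as jnp


def masked_identity(x, mask):
    # Same map as H(x) = m * x, written with plain jax.numpy.
    return mask * x


mask = jnp.array([1.0, 0.0, 1.0, 1.0])   # second element unobserved
x = jnp.array([0.3, -1.2, 2.0, 0.7])

# Dense Jacobian, only sensible for a toy-sized state.
jac = jax.jacfwd(lambda s: masked_identity(s, mask))(x)
matches_diag = bool(jnp.allclose(jac, jnp.diag(mask)))
```

Forming the dense Jacobian is only practical at toy sizes; at scale the same check would be run with Jacobian-vector products.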

Linear Projection

A pre-computed projection from state space to observation space:

$$
H(\boldsymbol{x}) = \mathbf{H} \, \boldsymbol{x}.
\tag{4}
$$

Use cases include Lagrangian footprint matrices from particle simulators, spectral filtering, and spatial interpolation. When $\mathbf{H}$ has exploitable structure (sparse, low-rank, Kronecker), keep it as a linear operator rather than materializing the dense matrix.

def linear_obs(
    H: Float[Array, "M N"],
    x: Float[Array, "N"],
) -> Float[Array, "M"]:
    return einx.dot("M N, N -> M", H, x)
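To make the advice about structured operators concrete, here is a sketch for the Kronecker case, assuming $\mathbf{H} = \mathbf{A} \otimes \mathbf{B}$ with small dense factors. The function name `kron_obs` and the factor shapes are illustrative choices. It relies on the identity $(\mathbf{A} \otimes \mathbf{B})\,\boldsymbol{x} = \operatorname{vec}(\mathbf{A} \mathbf{X} \mathbf{B}^\top)$ for row-major reshaping, and is checked against the dense product only because the example is tiny.

```python
import jax
import jax.numpy as jnp


def kron_obs(A, B, x):
    """Apply (A kron B) @ x without forming the Kronecker product."""
    q, s = A.shape[1], B.shape[1]
    X = x.reshape(q, s)                 # row-major unflattening of the state
    return (A @ X @ B.T).reshape(-1)    # row-major flattening of the result


kA, kB, kx = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(kA, (3, 4))
B = jax.random.normal(kB, (2, 5))
x = jax.random.normal(kx, (20,))        # N = 4 * 5

y_free = kron_obs(A, B, x)              # two small matmuls
y_dense = jnp.kron(A, B) @ x            # 6 x 20 dense matrix, for comparison only
```

The matrix-free path stores the two factors instead of the full product, which is where the saving comes from as the factor sizes grow.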

Averaging Kernel

For satellite L2 products from optimal-estimation retrievals (TROPOMI CH₄, EMIT CH₄, OCO CO₂, MOPITT CO), the retrieval is not a direct sample of the mixing-ratio profile $\boldsymbol{x}$. It applies a smoothing matrix $\mathbf{A}$ (the averaging kernel) and falls back to a retrieval prior $\boldsymbol{x}_a$ where the signal is weak:

$$
\hat{\boldsymbol{y}} = \mathbf{A} \left( \boldsymbol{h} \odot \boldsymbol{x} + (\boldsymbol{1} - \boldsymbol{h}) \odot \boldsymbol{x}_a \right).
\tag{5}
$$

Symbol	Description
$\boldsymbol{x} \in \mathbb{R}^{D_x}$	model state (profile, surface field)
$\boldsymbol{x}_a \in \mathbb{R}^{D_x}$	retrieval prior from L2 metadata
$\boldsymbol{h} \in \mathbb{R}^{D_x}$	weighting vector (often pressure-weighted)
$\mathbf{A} \in \mathbb{R}^{D_y \times D_x}$	averaging-kernel matrix ($D_y$ retrieved quantities; square when a full profile is retrieved)

Tangent-linear: $H'(\boldsymbol{x}) = \mathbf{A} \operatorname{diag}(\boldsymbol{h})$.

def averaging_kernel(
    x:   Float[Array, "N"],
    A:   Float[Array, "M N"],
    x_a: Float[Array, "N"],
    h:   Float[Array, "N"],
) -> Float[Array, "M"]:
    blended = einx.multiply("N, N -> N", h, x) + einx.multiply("N, N -> N", 1.0 - h, x_a)
    return einx.dot("M N, N -> M", A, blended)
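A small numerical sketch can confirm both the limiting behaviour and the tangent-linear of (5). The kernel rows, profile values, and weights below are made-up numbers, and plain `jax.numpy` replaces einx so the snippet runs on its own.

```python
import jax
import jax.numpy as jnp


def averaging_kernel(x, A, x_a, h):
    # Blend state and retrieval prior, then smooth with the kernel.
    return A @ (h * x + (1.0 - h) * x_a)


A = jnp.array([[0.6, 0.3, 0.1],
               [0.1, 0.3, 0.6]])        # illustrative 2 x 3 kernel
x = jnp.array([1.90, 1.85, 1.80])       # model profile
x_a = jnp.array([1.80, 1.80, 1.80])     # retrieval prior
h = jnp.array([1.0, 0.5, 0.0])          # full, partial, and no sensitivity

# Jacobian with respect to x (argnums=0 is the jacfwd default).
jac = jax.jacfwd(averaging_kernel)(x, A, x_a, h)

y_full = averaging_kernel(x, A, x_a, jnp.ones(3))    # h = 1: reduces to A @ x
y_prior = averaging_kernel(x, A, x_a, jnp.zeros(3))  # h = 0: reduces to A @ x_a
```

The Jacobian equals `A * h[None, :]`, that is $\mathbf{A}\operatorname{diag}(\boldsymbol{h})$: the prior term is constant in $\boldsymbol{x}$, and any column with $h_j = 0$ receives no gradient.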

When $\mathbf{A}$ is structured (Kronecker for separable kernels, low-rank for under-determined inversions, banded for compactly-supported kernels), apply it matrix-free instead of forming the dense matrix.

Multi-Instrument Fusion

Operational work combines multiple instruments. Each has its own forward map $H_i$, mask $\boldsymbol{m}_i$, error covariance $\mathbf{R}_i$, and optionally an averaging kernel $(\mathbf{A}_i, \boldsymbol{x}_{a,i}, \boldsymbol{h}_i)$. The costs compose additively at the likelihood level:

$$
J_{\text{obs}}(\boldsymbol{x}) = \sum_{i \in \mathcal{I}} \alpha_i \cdot \tfrac{1}{2} \, \| \boldsymbol{m}_i \odot (\boldsymbol{y}_i - H_i(\boldsymbol{x})) \|^2_{\mathbf{R}_i^{-1}}.
\tag{6}
$$

The per-instrument weight $\alpha_i$ defaults to uniform and supports hierarchical bias-correction. Quality masks zero-weight unreliable pixels (they contribute zero log-likelihood rather than being dropped), which keeps the effective per-instrument observation count auditable.

from typing import Callable, NamedTuple

class Instrument(NamedTuple):
    H:      Callable[[Float[Array, "N"]], Float[Array, "M"]]
    y:      Float[Array, "M"]
    mask:   Float[Array, "M"]    # 0/1 quality weights
    R_inv:  Float[Array, "M M"]
    weight: float = 1.0          # αᵢ

def fusion_cost(
    x: Float[Array, "N"],
    instruments: list[Instrument],
) -> Float[Array, ""]:
    total = 0.0
    for ins in instruments:
        r      = einx.multiply("M, M -> M", ins.mask, ins.y - ins.H(x))
        Rinv_r = einx.dot("M K, K -> M", ins.R_inv, r)
        total += ins.weight * 0.5 * einx.dot("M, M ->", r, Rinv_r)
    return total
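The following toy run sketches how the fused cost and its gradient behave with two instruments. It is a self-contained restatement using plain `jax.numpy` rather than einx and jaxtyping; the two instruments (`point_sampler`, `column_total`) and all numbers are invented for illustration.

```python
from typing import Callable, NamedTuple

import jax
import jax.numpy as jnp


class Instrument(NamedTuple):
    H: Callable[[jax.Array], jax.Array]
    y: jax.Array
    mask: jax.Array       # 0/1 quality weights
    R_inv: jax.Array
    weight: float = 1.0   # alpha_i


def fusion_cost(x, instruments):
    total = 0.0
    for ins in instruments:
        r = ins.mask * (ins.y - ins.H(x))
        total = total + ins.weight * 0.5 * jnp.vdot(r, ins.R_inv @ r)
    return total


x = jnp.array([1.0, 2.0, 3.0])

point_sampler = Instrument(
    H=lambda s: s[:2],
    y=jnp.array([1.5, 0.0]),
    mask=jnp.array([1.0, 0.0]),   # second pixel flagged as unreliable
    R_inv=jnp.eye(2),
)
column_total = Instrument(
    H=lambda s: jnp.sum(s)[None],
    y=jnp.array([5.0]),
    mask=jnp.array([1.0]),
    R_inv=jnp.array([[4.0]]),     # sigma = 0.5
    weight=0.5,
)
instruments = [point_sampler, column_total]

cost = fusion_cost(x, instruments)
grad = jax.grad(lambda s: fusion_cost(s, instruments))(x)

# Changing the flagged pixel's value leaves the cost untouched.
perturbed = point_sampler._replace(y=jnp.array([1.5, 99.0]))
cost_perturbed = fusion_cost(x, [perturbed, column_total])
```

Here the first instrument contributes 0.125 and the second 1.0, and the flagged pixel affects neither the cost nor the gradient, which is the zero-weighting behaviour described above.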

Tangent-Linear and Adjoint

Incremental 4DVar and the Gauss–Newton inner loop need the tangent-linear operator $H'(\boldsymbol{x})$ (forward) and its adjoint $H'(\boldsymbol{x})^\top$ (backward). Both are autodiff primitives — no hand-coding:

import jax

def tlm(H, x: Float[Array, "N"], dx: Float[Array, "N"]) -> Float[Array, "M"]:
    """Forward tangent-linear:  H'(x) · dx   (a JVP)."""
    _, dy = jax.jvp(H, (x,), (dx,))
    return dy

def adjoint(H, x: Float[Array, "N"], r: Float[Array, "M"]) -> Float[Array, "N"]:
    """Adjoint:  H'(x)ᵀ · r   (a VJP)."""
    _, vjp_fn = jax.vjp(H, x)
    (out,) = vjp_fn(r)
    return out
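One place the adjoint shows up directly is the gradient of the observation cost, $\nabla J_{\text{obs}}(\boldsymbol{x}) = -H'(\boldsymbol{x})^\top \mathbf{R}^{-1} (\boldsymbol{y} - H(\boldsymbol{x}))$. The sketch below assumes a diagonal $\mathbf{R}$ and a made-up nonlinear operator, and compares that expression, evaluated with a VJP, against `jax.grad` of the cost.

```python
import jax
import jax.numpy as jnp


def H(x):
    # Hypothetical nonlinear operator mapping N state values to N - 1 observations.
    return jnp.sin(x[:-1]) * x[1:]


x = jnp.array([0.2, -0.4, 1.0, 0.5])
y = jnp.array([0.1, 0.0, 0.3])
variances = jnp.array([0.5, 1.0, 2.0])   # diagonal of R


def J_obs(x):
    r = y - H(x)
    return 0.5 * jnp.sum(r * r / variances)


# Gradient assembled by hand: push -R^{-1} (y - H(x)) through the adjoint.
Hx, vjp_fn = jax.vjp(H, x)
(grad_adjoint,) = vjp_fn(-(y - Hx) / variances)

# Gradient from differentiating the scalar cost directly.
grad_autodiff = jax.grad(J_obs)(x)
```

The two gradients agree to floating-point precision, since reverse-mode differentiation of the cost performs the same adjoint application internally.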

The adjoint test verifies the inner-product identity

$$
\langle H' \boldsymbol{u}, \boldsymbol{v} \rangle = \langle \boldsymbol{u}, (H')^\top \boldsymbol{v} \rangle \qquad \forall \, \boldsymbol{u}, \boldsymbol{v}
\tag{7}
$$

up to floating-point tolerance. A failure means the gradient flowing back through $H$ is wrong, so downstream optimizers take incorrect steps.

import jax.numpy as jnp

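def adjoint_test(
    H, x, u: Float[Array, "N"], v: Float[Array, "M"], rtol: float = 1e-5
) -> bool:
    """Check the inner-product identity to relative tolerance rtol (loosen it for float32)."""
    lhs = jnp.vdot(tlm(H, x, u), v)        # ⟨H'u, v⟩
    rhs = jnp.vdot(u, adjoint(H, x, v))    # ⟨u, (H')ᵀv⟩
    return bool(jnp.allclose(lhs, rhs, rtol=rtol))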
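To show what a failure looks like, the sketch below runs the inner-product check on a small hypothetical operator $H(\boldsymbol{x}) = \tanh(\mathbf{W}\boldsymbol{x})$, once with the autodiff adjoint and once with a deliberately wrong hand-coded adjoint that forgets the $\tanh$ derivative. The mismatch is normalized by $\|H'\boldsymbol{u}\| \, \|\boldsymbol{v}\|$, one common choice among several.

```python
import jax
import jax.numpy as jnp

W = jnp.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])


def H(x):
    return jnp.tanh(W @ x)


x = jnp.array([0.5, -0.5])
u = jnp.array([1.0, 0.0])        # state-space test vector
v = jnp.array([1.0, 0.0, 0.0])   # observation-space test vector

_, Hu = jax.jvp(H, (x,), (u,))   # H'(x) u
_, vjp_fn = jax.vjp(H, x)
(Ht_v,) = vjp_fn(v)              # H'(x)^T v


def mismatch(adj_v):
    """|<H'u, v> - <u, adj_v>| scaled by ||H'u|| ||v||."""
    gap = jnp.abs(jnp.vdot(Hu, v) - jnp.vdot(u, adj_v))
    return gap / (jnp.linalg.norm(Hu) * jnp.linalg.norm(v))


ok = mismatch(Ht_v)       # autodiff adjoint: at round-off level
bad = mismatch(W.T @ v)   # wrong adjoint that ignores the tanh derivative
```

With these values the autodiff adjoint passes at round-off level, while the broken adjoint is off by more than ten percent, far outside any floating-point tolerance.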

Observation-Error Covariance $\mathbf{R}$

$\mathbf{R}$ enters $J_{\text{obs}}$ only through $\mathbf{R}^{-1}$, which is applied lazily (e.g. by solving $\mathbf{R}\boldsymbol{z} = \boldsymbol{r}$ with conjugate gradients) rather than materialized as a dense inverse. Common parameterizations:

Diagonal: $\mathbf{R} = \operatorname{diag}(\boldsymbol{\sigma}^2)$. The default for heteroscedastic retrievals (per-pixel $\sigma$ from L2 metadata); $\mathbf{R}^{-1}\boldsymbol{r}$ is an elementwise divide.
Block-diagonal: one block $\mathbf{R}_i$ per instrument when fusing.
Structured: a spatial-correlation kernel (e.g. Matérn) when observation errors are correlated, as for high-resolution imaging spectrometers. Applied matrix-free.

def apply_R_inv_diag(
    variances: Float[Array, "M"],   # σ²
    r:         Float[Array, "M"],
) -> Float[Array, "M"]:
    """R⁻¹ r for diagonal R — no inverse materialized."""
    return einx.divide("M, M -> M", r, variances)
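For correlated errors, the lazy application of $\mathbf{R}^{-1}$ can be sketched with `jax.scipy.sparse.linalg.cg`, which accepts a function in place of a matrix. The tridiagonal operator below is a simple stand-in for a real correlation kernel such as Matérn, and the names `apply_R`, `apply_R_inv`, `SIGMA2`, and `RHO` are illustrative assumptions.

```python
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

SIGMA2 = 0.25   # error variance (assumed)
RHO = 0.3       # nearest-neighbour correlation; |RHO| < 0.5 keeps R positive definite


def apply_R(v):
    """Matrix-free R @ v for a tridiagonal R with zero boundaries."""
    left = jnp.concatenate([jnp.zeros(1), v[:-1]])
    right = jnp.concatenate([v[1:], jnp.zeros(1)])
    return SIGMA2 * (v + RHO * (left + right))


def apply_R_inv(r):
    """Solve R z = r by conjugate gradients; R is never formed or inverted."""
    z, _ = cg(apply_R, r, tol=1e-6)
    return z


r = jnp.array([0.4, -0.1, 0.3, 0.0, -0.2])   # residual y - H(x)
z = apply_R_inv(r)
J_obs = 0.5 * jnp.vdot(r, z)                 # observation cost from the solve
```

Conjugate gradients only needs products with $\mathbf{R}$, so any kernel that can be applied as a function fits the same pattern; a preconditioner can be passed through the `M` argument of `cg` if convergence is slow.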

When the Observation Model Is Wrong

If $H$ is misspecified — wrong averaging kernel, missing instrument bias, ignored representation error — the MAP estimate $\boldsymbol{x}^\star$ converges to the wrong answer with a confident-looking posterior. Three diagnostics catch this:

Innovation statistics: $\boldsymbol{d} = \boldsymbol{y} - H(\boldsymbol{x}_b)$, the misfit against the background state $\boldsymbol{x}_b$, should be roughly mean-zero with a spread consistent with $\sqrt{\operatorname{diag}(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})}$, where $\mathbf{B}$ is the background-error covariance and $\mathbf{H} = H'(\boldsymbol{x}_b)$. Large systematic offsets indicate bias.
Posterior-vs-prior shift: if the data move the posterior much further than $\sigma_{\text{post}}$ from the prior, the $\mathbf{B}$ or $\mathbf{R}$ assumptions are violated.
Cross-instrument disagreement: per-instrument residual statistics should be consistent; large disagreements flag uncorrected bias and motivate joint bias estimation.
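The innovation check can be sketched as a normalized-innovation computation. The example assumes a linear operator and small dense covariances purely for illustration (at scale the diagonal of $\mathbf{H}\mathbf{B}\mathbf{H}^\top$ would be estimated matrix-free), and the function name `normalized_innovations` is hypothetical.

```python
import jax.numpy as jnp


def normalized_innovations(y, x_b, H, B, R):
    """Innovations d = y - H x_b scaled by their expected standard deviation."""
    d = y - H @ x_b
    spread = jnp.sqrt(jnp.diag(H @ B @ H.T + R))
    return d / spread


H = jnp.array([[1.0, 0.0, 0.0],
               [0.0, 0.5, 0.5]])
B = 4.0 * jnp.eye(3)                 # background-error covariance
R = jnp.diag(jnp.array([1.0, 2.0]))  # observation-error covariance
x_b = jnp.array([1.0, 2.0, 4.0])     # background state
y = jnp.array([2.0, 7.0])            # observations

z = normalized_innovations(y, x_b, H, B, R)
```

Here the expected spreads are $\sqrt{5}$ and $2$, so the second observation sits two standard deviations from the background. Over many observations, a mean of `z` far from zero or a mean of `z**2` far from one would point to bias or to mis-scaled $\mathbf{B}$ or $\mathbf{R}$.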
