Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Data assimilation benchmarks

A side-by-side comparison of the seven pipekit_cycle.AnalysisStep methods shipped in vardax on three shared chaotic test problems — Lorenz-63 (3-D), Lorenz-96 single-level (40-D), and Lorenz-96 two-level (Wilks 2005 multi- scale, 72-D). Following Ahmed et al. 2020 (PyDA), every notebook runs the analysis on a short assim window and then free-forecasts for many Lyapunov times so the chaotic divergence (and how well each method’s x0x_0 suppresses it) is the headline view. RMSE(t) traces overlay all methods on a single plot.

Headline — Lorenz-63 (PRNGKey(42))

0.5-time-unit assim + 9.5-time-unit free-forecast (~9 Lyapunov times). Full state observed every 0.05 time units, Gaussian noise std 0.5.

Methodrmse_assimrmse_forecastrmse_totalInferenceTraining
Strong-4DVar0.050.080.081.7 s
AmortizedPosterior0.320.430.428 ms~2 s
Weak-4DVar (T_assim=10)0.420.760.752.1 s
FourDVarNet6.450.911.710.1 s~3 s
OI = 3DVar15.111.093.57(D14)
Incremental-4DVar1.202.172.138 s

rmse_assim is dominated by unobserved time slices for OI / 3DVar (no temporal coupling in BB), so the forecast column is the real discriminator. Strong-4DVar’s analysis recovers x0x_0 to ~0.05 RMSE and the perfect-model integration stays on the truth’s attractor branch for the full 10-time-unit window — a 14× forecast-skill win over OI / 3DVar.

Headline — Lorenz-96 single-level (PRNGKey(0))

0.5-time-unit assim + 2-time-unit free-forecast (~5 L96 Lyapunov times). Spatial + temporal sparse obs on every 4th cell every 0.05 time units (110 of 2010 entries observed).

Methodrmse_assimrmse_forecastrmse_totalInference
Strong-4DVar3.744.144.062.6 s
OI / 3DVar3.934.214.16(D14)
AmortizedPosterior3.074.484.230.2 s
FourDVarNet3.184.734.460.4 s
Incremental-4DVar4.635.125.0213 s

L96’s faster Lyapunov clock (~0.5 time units) means even Strong-4DVar’s forecast saturates near the attractor RMS within 2 time units. The separation between methods is most visible in the RMSE(t) trace in notebook 10 — methods with good x0x_0 recovery stay below the saturation level for ~1 extra Lyapunov time.

Headline — Lorenz-96 two-level (PRNGKey(0))

0.4-time-unit assim + 0.6-time-unit free-forecast (~2 slow Lyapunov times). Slow obs every 2nd cell every 0.05 time units; fast unobserved.

Per-block RMSE (prior floors: slow 7.15, fast 0.35):

Methodslowfasttotalrmse_forecastInference
Strong-4DVar1.660.490.720.524.6 s
OI / 3DVar6.070.382.051.97(D14)
AmortizedPosterior6.820.392.302.570.2 s
Weak-4DVar8.900.473.003.49~7 s
FourDVarNetNaNNaNNaNNaN(under-trained — see notebook)
Incremental-4DVar(diverges)

Strong-4DVar recovers slow at 4× the prior accuracy; OI / 3DVar and Amortized stay near the prior on the slow block. The fast block sits near the prior floor for everyone except weak (which diverges on this stiff coupling). Incremental-4DVar’s GN linearisation can’t handle the slow-fast stiffness; FourDVarNet’s 72-D head training is under-tuned at the budget used here.

Notebooks

Lorenz-63

Lorenz-96 single-level

Lorenz-96 two-level (Wilks 2005)

Running

# From repo root, set up the env (uv)
uv pip install -e projects/assimilation

# Execute every notebook end-to-end (skips the 00 setup .md)
for f in projects/assimilation/notebooks/{0[1-8],09,10,11,12}_*.py; do
  uv run jupytext --to ipynb --execute "$f"
done

# Or open one in JupyterLab
uv run jupyter lab projects/assimilation/notebooks/03_strong_4dvar.ipynb

The whole 12-notebook run takes ~10-15 minutes on a laptop CPU; the learned-method training dominates the runtime.

Code layout

projects/assimilation/
├── pyproject.toml                 # vardax + scientific stack
├── README.md
├── src/assimilation/
│   ├── lorenz63.py                # Lorenz63Forward + generate_problem
│   ├── lorenz96.py                # L96 single-level
│   ├── lorenz96_2l.py             # L96 two-level (Wilks 2005)
│   ├── metrics.py                 # rmse, sigma_coverage, nll_gaussian
│   └── benchmark.py               # assim_batch, free_forecast,
│                                  # assemble_full_trajectory, run_method, compare
├── tests/
│   └── test_lorenz96_2l.py        # vector-field regression tests
└── notebooks/
    ├── 00_lorenz63_setup.md       # prose intro for L63
    └── 0[1-8],09,10,11,12_*.ipynb + .py

The assim+forecast harness is in benchmark.py: methods run on assim_batch(problem), return either an (T_assim+1, D) trajectory or just x_0, then assemble_full_trajectory extends to (T_total+1, D) by free-forecasting the rest. MethodResult carries rmse_assim, rmse_forecast, rmse_total, and a rmse_trace(t) for plotting.

Known limitations

Open follow-ups