Tier II — Lagrangian particle transport

Forward model: stochastic particle trajectories driven by wind + turbulence. The bridge between the analytical Tier I (no real wind variability) and the full PDE Tier III (no statistical efficiency). This is what FLEXPART (Stohl et al., 2005) and STILT do operationally — see also HYSPLIT (Stein et al., 2015).

This tier is not yet started in plume_simulation — module layout proposed below.

(1) Simple model¶

Stochastic dynamics — Markov-1 Langevin¶

Operational LPDMs (FLEXPART, STILT, HYSPLIT) use Markov-1: a Langevin equation on particle velocity, then position from velocity. We commit to Markov-1 because it’s required for near-source super-emitter work and for non-stationary turbulence:

\begin{aligned} \mathrm{d}\mathbf{v} &= \mathbf{a}(\mathbf{x}, \mathbf{v}, t)\, \mathrm{d}t \;+\; \mathbf{b}(\mathbf{x}, t)\, \mathrm{d}\mathbf{W}, \qquad \mathrm{d}\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathrm{d}t \cdot \mathbf{I}_3) \\ \mathrm{d}\mathbf{x} &= \bigl(\mathbf{u}(\mathbf{x},t) + \mathbf{v}\bigr)\, \mathrm{d}t \end{aligned}

((1))

$\mathbf{u}(\mathbf{x},t)$ — mean wind from the met field (WRF / ERA5 / HRRR).
$\mathbf{v}$ — turbulent velocity perturbation, the actual stochastic state.
$\mathbf{a}, \mathbf{b}$ — drift and diffusion coefficients constructed from the Reynolds-stress tensor $\sigma_{ij}(\mathbf{x},t)$ and Lagrangian timescale $\tau_L(\mathbf{x},t)$ . Reference: Thomson, 1987, Wilson & Sawford, 1996 (well-mixed condition).
$\sigma_{ij}, \tau_L$ come from MO similarity (Monin & Obukhov, 1954) parameterised by $(u_*, L_\text{Obukhov}, z_0)$ plus WRF TKE — see MO prereqs.

Sub-grid wind interpolation¶

WRF velocities are gridded at hourly snapshots; particles are continuous in space and time. Naïve bilinear interpolation of $\mathbf{u}$ is not divergence-free, which causes mass-conservation drift over long integrations. Use a C-grid-aware interpolator that reconstructs $\mathbf{u}$ consistently with the WRF staggered grid (operational STILT pattern). Document the chosen scheme on the integrator API; assume divergence preservation is load-bearing.

Time-stepping¶

Default to Euler–Maruyama with adaptive step bounded by

\Delta t \;<\; \min\!\left(\tau_L,\; \frac{\Delta x}{|\mathbf{u}|},\; \frac{\Delta x^{2}}{\sigma^{2}}\right).

((2))

Higher-order Milstein only if convergence diagnostics flag bias. $\Delta t$ is an open parameter — see open questions for the operational target.

Run modes¶

Forward mode: release $N$ particles from the source, track concentration by binning particle density on the analysis grid.
Backward mode (footprint): release receptors backward in time → source–receptor sensitivity matrix $\mathbf{F}$ . The FLEXPART/STILT paradigm and the workhorse of regional-scale inversions (Stohl et al., 2005).

Footprint definition (formal)¶

For a receptor $r$ and a candidate source cell $s$ :

F(r, s) \;=\; \int_{t_\text{obs} - T}^{t_\text{obs}} \frac{1}{V_s\, \rho_\text{air}(s,t)} \;\mathbb{1}\!\left[\, z_\text{part}(t) < f_\text{PBL}\, L(s, t)\, \right]\, \mathrm{d}t

((3))

— integrated over particle residence time below a fraction $f_\text{PBL} \approx 0.5$ of the local PBL height $L(s,t)$ . Units: $\text{s} \cdot \text{m}^{2} \cdot \text{kg}^{-1}$ (mixing-ratio per kg/m²/s of surface flux). The footprint is a probability density over candidate source cells, not an indicator: properly normalised by particle volume and air density. Numerical scaling matters because the values are tiny — work in log-space when storing.

For an overpass with $n_\text{obs}$ columns and $n_\text{grid}$ candidate source cells, $\mathbf{F}$ is shape $(n_\text{obs}, n_\text{grid})$ and typically 90%+ sparse.

(2) Model-based inference¶

Forward model with observation operator¶

The full forward used by inference is not $\mathbf{y} = \mathbf{F}\mathbf{q} + \boldsymbol{\varepsilon}$ . It’s:

\mathbf{y} \;=\; \mathbf{A}\, \mathrm{col}_z(\mathbf{F}\mathbf{q}) \;+\; \mathbf{c}_\text{bg} \;+\; \boldsymbol{\varepsilon}

((4))

$\mathbf{F}\mathbf{q}$ gives a 3D concentration field from the source vector.
$\mathrm{col}_z$ collapses to an XCH₄ column.
$\mathbf{A}$ applies the satellite averaging kernel; see ((1)).
$\mathbf{c}_\text{bg}$ is the regional background — mandatory, not optional. Same critique as Tier I: predicted enhancement vs. absolute-column observation must be reconciled day-1.

Likelihood model¶

\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}), \qquad \mathbf{R} \;=\; \mathbf{R}_\text{retr} + \mathbf{R}_\text{repr}

((5))

$\mathbf{R}_\text{retr}$ — heteroscedastic per-pixel from the L2 retrieval-error map.
$\mathbf{R}_\text{repr}$ — representation error: model-vs-observation footprint mismatch. Typically 1–5 ppb diagonal addition; rises with terrain complexity and at coarse-instrument boundaries. Don’t omit — naive $\mathbf{R} = \mathbf{R}_\text{retr}$ overweights observations and produces overconfident posteriors.
Block-diagonal across overpasses; cross-overpass correlation only if observation times are within met decorrelation scale (~hours).

Prior on $\mathbf{q}$ — spatially correlated, sign-constrained¶

The prior is the regularizer; it’s the methodological choice in regional inversions.

Table (1):Spatial-prior choices for the source vector $\mathbf{q}$ .

Choice	Form	Notes
Mean $\mathbf{q}_a$	from emission inventory (Crippa et al., 2023Scarpelli et al., 2022Maasakkers et al., 2023)	prior median per cell
Covariance $\mathbf{B}$	Matérn-3/2 with correlation length $\ell \in [5, 50]$ km	smooth posterior; $\ell$ tuned by posterior diagnostics or hierarchical
Positivity	$\log \mathbf{q} \sim \mathcal{N}(\log \mathbf{q}_a, \mathbf{B}_{\log})$ (lognormal)	non-negative emissions; conjugate variant: NNLS / projected-gradient on Gaussian $\mathbf{q}$

Gaussian–Gaussian closed form (linear-in-log)¶

When the lognormal is linearised around $\log \mathbf{q}_a$ (fine for moderate enhancements over the prior), the posterior is closed form:

\mathbf{q}^* \;=\; \mathbf{q}_a \cdot \exp\!\left(\mathbf{B}_{\log} \tilde{\mathbf{F}}^{\top} (\tilde{\mathbf{F}} \mathbf{B}_{\log} \tilde{\mathbf{F}}^{\top} + \mathbf{R})^{-1} (\mathbf{y} - \tilde{\mathbf{F}} \mathbf{q}_a)\right)

((6))

\mathbf{P}^* \;=\; \mathbf{B}_{\log} \;-\; \mathbf{B}_{\log} \tilde{\mathbf{F}}^{\top} (\tilde{\mathbf{F}} \mathbf{B}_{\log} \tilde{\mathbf{F}}^{\top} + \mathbf{R})^{-1} \tilde{\mathbf{F}} \mathbf{B}_{\log}

((7))

with $\tilde{\mathbf{F}} = \operatorname{diag}(\mathbf{q}_a)\, \mathbf{F}$ (the linearised Jacobian). Identical structure to the equations gaussx is built around.

Scaling beyond moderate grids¶

“≲10k cells × ≲1k obs” is the dense-solver limit. Beyond that:
- Krylov + structure-aware solves via gaussx (Kronecker-Matérn, low-rank $\mathbf{F}$ ). Pushes direct solves to ~100k cells with sufficient sparsity.
- Ensemble Kalman Inversion (EKI): ensemble of forward trajectories → ensemble-based $\partial \mathbf{c} / \partial \mathbf{Q}$ . Plug into filterax; couple with vardaX for the variational version.
- MCMC over $\log \mathbf{q}(\mathbf{x})$ : expensive but exact. Use only when EKI is suspected biased.

(3) Model emulator¶

The Lagrangian model becomes expensive at large $N$ particles or when running ensembles for met-uncertainty propagation.

Footprint emulator: $(\text{met snapshot}, \text{receptor location}, \text{source-candidate grid}) \to \mathbf{F}(\text{receptor}, \cdot)$ . Receptor location is part of the input (not just source location). CNN if the met grid is regular; FNO if you want to be resolution-agnostic.
Trajectory emulator: replace the SDE integration with a neural ODE or normalising flow. Less common but potentially useful for backward integration.
Training distribution. Sample met conditions from the actual distribution at facility locations of interest, not a uniform climatology bin. A coarse $(\text{hour-of-day}, \text{season}, \text{stability})$ grid covers <10% of the operational regime probability mass; train on the conditional distribution $p(\text{met} \mid \text{facility}_\text{lat,lon}, t_\text{overpass})$ instead.

(4) Emulator-based inference¶

Replace $\mathbf{F}$ with the emulated footprint in the linear inversion. Enables:

Real-time source estimation during satellite overpass (no SDE integration in the loop).
Ensemble-based UQ at scale — sample met conditions, emulator gives footprint per sample, propagate to source posterior.
Validation: posterior from emulator-inversion ≈ posterior from full-Lagrangian inversion on the same observations. Diverging posteriors flag emulator bias before it becomes a downstream problem.

(5) Amortized inference (predictor)¶

f_\theta : (\mathbf{y}_\text{multiscale}, \text{met}_\text{context}, \text{instrument\_id}) \;\longmapsto\; p(\log \mathbf{q}(\mathbf{x}) \mid \mathbf{y}, \text{met})

((8))

Output grid commitment¶

$\mathbf{q}(\mathbf{x})$ is on the inversion grid (~1–10 km, basin-scoped). Predictor outputs a fixed-size 2D field per basin tile. Per-instrument predictors, dispatched by instrument_id, because TROPOMI (Veefkind et al., 2012, ~5 km), EMIT (Green & others, 2022, ~60 m), Tanager (Carbon Mapper, 2024, ~30 m) need different summary networks.

Multi-instrument observations¶

When fusing across instruments, each observation tensor carries its own AK and footprint at native resolution. The predictor consumes a list of $(\mathbf{y}, \mathbf{A}, \text{footprint}, \sigma_\text{retr}, \text{mask})$ per instrument and aggregates internally — don’t pre-regrid to a common resolution (loses information).

Conditional flow architecture¶

Posterior over $\log \mathbf{q}(\mathbf{x})$ is a spatial field — natural fit for a conditional flow over images.
gauss_flows is currently 1D-only
Either extend gauss_flows to 2D coupling layers (multi-month effort) or fall back to score-based / diffusion posterior. Score-based is the safer path for v1.
Context conditioning via FiLM / hypernet primitives in pyrox.nn — same pattern as Tier I.

(6) Improve¶

Multi-layer met fields. Move from 2D footprints to 3D trajectories through stacked WRF layers — necessary when emissions span the inversion layer or when the source PBL is poorly mixed.
Chemical loss during transport. $\mathrm{d}c/\mathrm{d}t = -k_\text{OH}\, c$ along trajectories. For CH₄ over <1 day, loss is negligible (~0.5%/day); for CO it’s significant.
Met uncertainty propagation. Run the trajectory ensemble across $N_\text{met}$ WRF / ERA5 ensemble realisations → uncertainty in $\mathbf{F}$ propagated through to source posterior. Hooks into the MetField.ensemble_dim from the prereqs schema.
Hierarchical $\mathbf{B}$ correlation length. Promote $\ell$ in the Matérn prior to a hyperparameter with its own posterior — let the data choose the regularisation scale.

Module layout (proposed)¶

Table (2):Tier II proposed module layout — step, concern, target module, status.

Step	Concern	Module	Status
1	Particle integrator (Markov-1)	`plume_simulation.lagrangian.particles`	☐
1	C-grid-aware wind interpolator	`plume_simulation.lagrangian.wind_interp`	☐
1	Backward footprint	`plume_simulation.lagrangian.footprint`	☐
1	Turbulence parameterisation ( $\sigma_{ij}, \tau_L$ )	`plume_simulation.lagrangian.turbulence`	☐
1	Column + AK pipeline	reuse `gauss_plume.observation` from Tier I	☐
2	Likelihoods + spatial priors	`plume_simulation.lagrangian.likelihoods`	☐
2	Linear inversion (Gaussian / lognormal)	reuse `assimilation/solve.py` with $\tilde{\mathbf{F}}$ injected	☐
2	Krylov / structure-aware solver	dispatch to `gaussx`	dependency
2	EKI	`filterax` (external)	dependency
2	Posterior export → Tier V	`plume_simulation.lagrangian.posterior_export`	☐
3	Footprint emulator	`plume_simulation.lagrangian.emulator`	☐
5	Field predictor (per instrument)	`plume_simulation.lagrangian.predictor`	☐
6	Met-ensemble runner	`plume_simulation.lagrangian.met_ensemble`	☐

The whole subpackage doesn’t exist yet; this is the proposed shape.

Validation strategy¶

Particle integration — zero-turbulence limit. With $\mathbf{b} \to \mathbf{0}$ , trajectories must follow streamlines exactly. Compare to streamline integration of the same wind field.
Mass conservation. In the no-deposition, closed-domain limit, total particle-seconds in the domain must be conserved to floating-point precision over the integration window. Standard LPDM regression test.
Adjoint–finite-difference. Verify backward ≡ adjoint of forward: perturb $q_i$ in the forward, measure $\Delta c_j$ , compare to $F[j, i]$ from the backward run. Should agree within Monte Carlo error. Cheap test, catches indexing / sign / time-direction bugs that are otherwise nightmare to track down.
Particle-count convergence. $N \to 2N$ should give converging posterior moments; unconverged means $N$ is too low. Sets the operational floor and replaces the open-question guess (10⁵ forward / 10³ backward) with measurement.
Footprint vs. forward agreement. For a single-source $\mathbf{q} = \mathbf{e}_i$ , the column $\mathbf{F}\mathbf{e}_i$ (forward) and the receptor row $\mathbf{F}[i, :]$ (backward) should agree. Catches indexing/orientation bugs distinct from the adjoint test.
Linear inversion against Tier I. In the limit of a single source, near-stationary winds, and known turbulence, the Lagrangian inversion should recover Tier I’s MAP estimate within posterior σ.
Real-data Permian benchmark. Invert TROPOMI + EMIT observations over the Permian for a published time window and compare to Lu et al. / Maasakkers et al. inverse-modelling estimates (see Maasakkers et al., 2023Jacob et al., 2022). Without this, the inversion is a synthetic exercise.
Emulator residual. $\lVert \mathbf{F}_\text{emu} - \mathbf{F}_\text{true} \rVert_F / \lVert \mathbf{F}_\text{true} \rVert_F < 5\%$ on a held-out met-condition set drawn from the operational distribution (not the training-bin distribution).

Open questions¶

References¶

Stohl, A., Forster, C., Frank, A., Seibert, P., & Wotawa, G. (2005). Technical note: The Lagrangian particle dispersion model FLEXPART version 6.2. Atmospheric Chemistry and Physics, 5(9), 2461–2474. 10.5194/acp-5-2461-2005
Stein, A. F., Draxler, R. R., Rolph, G. D., Stunder, B. J. B., Cohen, M. D., & Ngan, F. (2015). NOAA’s HYSPLIT atmospheric transport and dispersion modeling system. Bulletin of the American Meteorological Society, 96(12), 2059–2077. 10.1175/BAMS-D-14-00110.1
Thomson, D. J. (1987). Criteria for the selection of stochastic models of particle trajectories in turbulent flows. Journal of Fluid Mechanics, 180, 529–556. 10.1017/S0022112087001940
Wilson, J. D., & Sawford, B. L. (1996). Review of Lagrangian stochastic models for trajectories in the turbulent atmosphere. Boundary-Layer Meteorology, 78, 191–210. 10.1007/BF00122492
Monin, A. S., & Obukhov, A. M. (1954). Basic laws of turbulent mixing in the surface layer of the atmosphere. Tr. Akad. Nauk SSSR Geophiz. Inst., 24(151), 163–187.
Crippa, M., Guizzardi, D., Pagani, F., Banja, M., Muntean, M., Schaaf, E., & others. (2023). High resolution temporal profiles in the Emissions Database for Global Atmospheric Research (EDGAR). Scientific Data, 10, 636. 10.1038/s41597-023-02489-1
Scarpelli, T. R., Jacob, D. J., Grossman, S., Lu, X., Qu, Z., Sulprizio, M. P., Zhang, Y., Reuland, F., Gordon, D., & Worden, J. R. (2022). Updated Global Fuel Exploitation Inventory (GFEI) for methane emissions from the oil, gas, and coal sectors: Evaluation with inversions of atmospheric methane observations. Atmospheric Chemistry and Physics, 22(5), 3235–3249. 10.5194/acp-22-3235-2022
Maasakkers, J. D., Mcduffie, E. E., Sulprizio, M. P., Chen, C., Schultz, M., Brunelle, L., Thrush, R., Steller, J., Sherry, C., Jacob, D. J., & others. (2023). A gridded inventory of annual 2012-2018 U.S. anthropogenic methane emissions. Environmental Science & Technology, 57(43), 16276–16288. 10.1021/acs.est.3c05138
Veefkind, J. P., Aben, I., McMullan, K., Förster, H., de Vries, J., Otter, G., Claas, J., Eskes, H. J., de Haan, J. F., Kleipool, Q., & others. (2012). TROPOMI on the ESA Sentinel-5 Precursor: a GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications. Remote Sensing of Environment, 120, 70–83.
Green, R. O., & others. (2022). EMIT: Earth Surface Mineral Dust Source Investigation. https://earth.jpl.nasa.gov/emit/
Carbon Mapper. (2024). Carbon Mapper: airborne and satellite imaging spectroscopy for greenhouse gas monitoring. https://carbonmapper.org/
Jacob, D. J., Varon, D. J., Cusworth, D. H., Dennison, P. E., Frankenberg, C., Gautam, R., Guanter, L., Kelley, J., McKeever, J., Ott, L. E., Poulter, B., & others. (2022). Quantifying methane emissions from the global scale down to point sources using satellite observations of atmospheric methane. Atmospheric Chemistry and Physics, 22(14), 9617–9646. 10.5194/acp-22-9617-2022