Forward model: stochastic particle trajectories driven by wind + turbulence. The bridge between the analytical Tier I (no real wind variability) and the full PDE Tier III (no statistical efficiency). This is what FLEXPART (Stohl et al., 2005) and STILT do operationally — see also HYSPLIT (Stein et al., 2015).
This tier is not yet started in plume_simulation — module layout proposed below.
(1) Simple model¶
Stochastic dynamics — Markov-1 Langevin¶
Operational LPDMs (FLEXPART, STILT, HYSPLIT) use Markov-1: a Langevin equation on particle velocity, then position from velocity. We commit to Markov-1 because it’s required for near-source super-emitter work and for non-stationary turbulence:
- — mean wind from the met field (WRF / ERA5 / HRRR).
- — turbulent velocity perturbation, the actual stochastic state.
- — drift and diffusion coefficients constructed from the Reynolds-stress tensor and Lagrangian timescale . Reference: Thomson, 1987, Wilson & Sawford, 1996 (well-mixed condition).
- come from MO similarity (Monin & Obukhov, 1954) parameterised by plus WRF TKE — see MO prereqs.
Sub-grid wind interpolation¶
WRF velocities are gridded at hourly snapshots; particles are continuous in space and time. Naïve bilinear interpolation of is not divergence-free, which causes mass-conservation drift over long integrations. Use a C-grid-aware interpolator that reconstructs consistently with the WRF staggered grid (operational STILT pattern). Document the chosen scheme on the integrator API; assume divergence preservation is load-bearing.
Time-stepping¶
Default to Euler–Maruyama with adaptive step bounded by
Higher-order Milstein only if convergence diagnostics flag bias. is an open parameter — see open questions for the operational target.
Run modes¶
- Forward mode: release particles from the source, track concentration by binning particle density on the analysis grid.
- Backward mode (footprint): release receptors backward in time → source–receptor sensitivity matrix . The FLEXPART/STILT paradigm and the workhorse of regional-scale inversions (Stohl et al., 2005).
Footprint definition (formal)¶
For a receptor and a candidate source cell :
— integrated over particle residence time below a fraction of the local PBL height . Units: (mixing-ratio per kg/m²/s of surface flux). The footprint is a probability density over candidate source cells, not an indicator: properly normalised by particle volume and air density. Numerical scaling matters because the values are tiny — work in log-space when storing.
For an overpass with columns and candidate source cells, is shape and typically 90%+ sparse.
(2) Model-based inference¶
Forward model with observation operator¶
The full forward used by inference is not . It’s:
- gives a 3D concentration field from the source vector.
- collapses to an XCH₄ column.
- applies the satellite averaging kernel; see ((1)).
- is the regional background — mandatory, not optional. Same critique as Tier I: predicted enhancement vs. absolute-column observation must be reconciled day-1.
Likelihood model¶
- — heteroscedastic per-pixel from the L2 retrieval-error map.
- — representation error: model-vs-observation footprint mismatch. Typically 1–5 ppb diagonal addition; rises with terrain complexity and at coarse-instrument boundaries. Don’t omit — naive overweights observations and produces overconfident posteriors.
- Block-diagonal across overpasses; cross-overpass correlation only if observation times are within met decorrelation scale (~hours).
Prior on — spatially correlated, sign-constrained¶
The prior is the regularizer; it’s the methodological choice in regional inversions.
Table (1):Spatial-prior choices for the source vector .
| Choice | Form | Notes |
|---|---|---|
| Mean | from emission inventory (Crippa et al., 2023Scarpelli et al., 2022Maasakkers et al., 2023) | prior median per cell |
| Covariance | Matérn-3/2 with correlation length km | smooth posterior; tuned by posterior diagnostics or hierarchical |
| Positivity | (lognormal) | non-negative emissions; conjugate variant: NNLS / projected-gradient on Gaussian |
Gaussian–Gaussian closed form (linear-in-log)¶
When the lognormal is linearised around (fine for moderate enhancements over the prior), the posterior is closed form:
with (the linearised Jacobian). Identical structure to the equations gaussx is built around.
Scaling beyond moderate grids¶
- “≲10k cells × ≲1k obs” is the dense-solver limit. Beyond that:
- Krylov + structure-aware solves via
gaussx(Kronecker-Matérn, low-rank ). Pushes direct solves to ~100k cells with sufficient sparsity. - Ensemble Kalman Inversion (EKI): ensemble of forward trajectories → ensemble-based . Plug into
filterax; couple withvardaXfor the variational version. - MCMC over : expensive but exact. Use only when EKI is suspected biased.
- Krylov + structure-aware solves via
(3) Model emulator¶
The Lagrangian model becomes expensive at large particles or when running ensembles for met-uncertainty propagation.
- Footprint emulator: . Receptor location is part of the input (not just source location). CNN if the met grid is regular; FNO if you want to be resolution-agnostic.
- Trajectory emulator: replace the SDE integration with a neural ODE or normalising flow. Less common but potentially useful for backward integration.
- Training distribution. Sample met conditions from the actual distribution at facility locations of interest, not a uniform climatology bin. A coarse grid covers <10% of the operational regime probability mass; train on the conditional distribution instead.
(4) Emulator-based inference¶
Replace with the emulated footprint in the linear inversion. Enables:
- Real-time source estimation during satellite overpass (no SDE integration in the loop).
- Ensemble-based UQ at scale — sample met conditions, emulator gives footprint per sample, propagate to source posterior.
- Validation: posterior from emulator-inversion ≈ posterior from full-Lagrangian inversion on the same observations. Diverging posteriors flag emulator bias before it becomes a downstream problem.
(5) Amortized inference (predictor)¶
Output grid commitment¶
is on the inversion grid (~1–10 km, basin-scoped). Predictor outputs a fixed-size 2D field per basin tile. Per-instrument predictors, dispatched by instrument_id, because TROPOMI (Veefkind et al., 2012, ~5 km), EMIT (Green & others, 2022, ~60 m), Tanager (Carbon Mapper, 2024, ~30 m) need different summary networks.
Multi-instrument observations¶
When fusing across instruments, each observation tensor carries its own AK and footprint at native resolution. The predictor consumes a list of per instrument and aggregates internally — don’t pre-regrid to a common resolution (loses information).
Conditional flow architecture¶
- Posterior over is a spatial field — natural fit for a conditional flow over images.
- Context conditioning via FiLM / hypernet primitives in
pyrox.nn— same pattern as Tier I.
(6) Improve¶
- Multi-layer met fields. Move from 2D footprints to 3D trajectories through stacked WRF layers — necessary when emissions span the inversion layer or when the source PBL is poorly mixed.
- Chemical loss during transport. along trajectories. For CH₄ over <1 day, loss is negligible (~0.5%/day); for CO it’s significant.
- Met uncertainty propagation. Run the trajectory ensemble across WRF / ERA5 ensemble realisations → uncertainty in propagated through to source posterior. Hooks into the
MetField.ensemble_dimfrom the prereqs schema. - Hierarchical correlation length. Promote in the Matérn prior to a hyperparameter with its own posterior — let the data choose the regularisation scale.
Module layout (proposed)¶
Table (2):Tier II proposed module layout — step, concern, target module, status.
| Step | Concern | Module | Status |
|---|---|---|---|
| 1 | Particle integrator (Markov-1) | plume_simulation.lagrangian.particles | ☐ |
| 1 | C-grid-aware wind interpolator | plume_simulation.lagrangian.wind_interp | ☐ |
| 1 | Backward footprint | plume_simulation.lagrangian.footprint | ☐ |
| 1 | Turbulence parameterisation () | plume_simulation.lagrangian.turbulence | ☐ |
| 1 | Column + AK pipeline | reuse gauss_plume.observation from Tier I | ☐ |
| 2 | Likelihoods + spatial priors | plume_simulation.lagrangian.likelihoods | ☐ |
| 2 | Linear inversion (Gaussian / lognormal) | reuse assimilation/solve.py with injected | ☐ |
| 2 | Krylov / structure-aware solver | dispatch to gaussx | dependency |
| 2 | EKI | filterax (external) | dependency |
| 2 | Posterior export → Tier V | plume_simulation.lagrangian.posterior_export | ☐ |
| 3 | Footprint emulator | plume_simulation.lagrangian.emulator | ☐ |
| 5 | Field predictor (per instrument) | plume_simulation.lagrangian.predictor | ☐ |
| 6 | Met-ensemble runner | plume_simulation.lagrangian.met_ensemble | ☐ |
The whole subpackage doesn’t exist yet; this is the proposed shape.
Validation strategy¶
- Particle integration — zero-turbulence limit. With , trajectories must follow streamlines exactly. Compare to streamline integration of the same wind field.
- Mass conservation. In the no-deposition, closed-domain limit, total particle-seconds in the domain must be conserved to floating-point precision over the integration window. Standard LPDM regression test.
- Adjoint–finite-difference. Verify backward ≡ adjoint of forward: perturb in the forward, measure , compare to from the backward run. Should agree within Monte Carlo error. Cheap test, catches indexing / sign / time-direction bugs that are otherwise nightmare to track down.
- Particle-count convergence. should give converging posterior moments; unconverged means is too low. Sets the operational floor and replaces the open-question guess (10⁵ forward / 10³ backward) with measurement.
- Footprint vs. forward agreement. For a single-source , the column (forward) and the receptor row (backward) should agree. Catches indexing/orientation bugs distinct from the adjoint test.
- Linear inversion against Tier I. In the limit of a single source, near-stationary winds, and known turbulence, the Lagrangian inversion should recover Tier I’s MAP estimate within posterior σ.
- Real-data Permian benchmark. Invert TROPOMI + EMIT observations over the Permian for a published time window and compare to Lu et al. / Maasakkers et al. inverse-modelling estimates (see Maasakkers et al., 2023Jacob et al., 2022). Without this, the inversion is a synthetic exercise.
- Emulator residual. on a held-out met-condition set drawn from the operational distribution (not the training-bin distribution).
Open questions¶
- Stohl, A., Forster, C., Frank, A., Seibert, P., & Wotawa, G. (2005). Technical note: The Lagrangian particle dispersion model FLEXPART version 6.2. Atmospheric Chemistry and Physics, 5(9), 2461–2474. 10.5194/acp-5-2461-2005
- Stein, A. F., Draxler, R. R., Rolph, G. D., Stunder, B. J. B., Cohen, M. D., & Ngan, F. (2015). NOAA’s HYSPLIT atmospheric transport and dispersion modeling system. Bulletin of the American Meteorological Society, 96(12), 2059–2077. 10.1175/BAMS-D-14-00110.1
- Thomson, D. J. (1987). Criteria for the selection of stochastic models of particle trajectories in turbulent flows. Journal of Fluid Mechanics, 180, 529–556. 10.1017/S0022112087001940
- Wilson, J. D., & Sawford, B. L. (1996). Review of Lagrangian stochastic models for trajectories in the turbulent atmosphere. Boundary-Layer Meteorology, 78, 191–210. 10.1007/BF00122492
- Monin, A. S., & Obukhov, A. M. (1954). Basic laws of turbulent mixing in the surface layer of the atmosphere. Tr. Akad. Nauk SSSR Geophiz. Inst., 24(151), 163–187.
- Crippa, M., Guizzardi, D., Pagani, F., Banja, M., Muntean, M., Schaaf, E., & others. (2023). High resolution temporal profiles in the Emissions Database for Global Atmospheric Research (EDGAR). Scientific Data, 10, 636. 10.1038/s41597-023-02489-1
- Scarpelli, T. R., Jacob, D. J., Grossman, S., Lu, X., Qu, Z., Sulprizio, M. P., Zhang, Y., Reuland, F., Gordon, D., & Worden, J. R. (2022). Updated Global Fuel Exploitation Inventory (GFEI) for methane emissions from the oil, gas, and coal sectors: Evaluation with inversions of atmospheric methane observations. Atmospheric Chemistry and Physics, 22(5), 3235–3249. 10.5194/acp-22-3235-2022
- Maasakkers, J. D., Mcduffie, E. E., Sulprizio, M. P., Chen, C., Schultz, M., Brunelle, L., Thrush, R., Steller, J., Sherry, C., Jacob, D. J., & others. (2023). A gridded inventory of annual 2012-2018 U.S. anthropogenic methane emissions. Environmental Science & Technology, 57(43), 16276–16288. 10.1021/acs.est.3c05138
- Veefkind, J. P., Aben, I., McMullan, K., Förster, H., de Vries, J., Otter, G., Claas, J., Eskes, H. J., de Haan, J. F., Kleipool, Q., & others. (2012). TROPOMI on the ESA Sentinel-5 Precursor: a GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications. Remote Sensing of Environment, 120, 70–83.
- Green, R. O., & others. (2022). EMIT: Earth Surface Mineral Dust Source Investigation. https://earth.jpl.nasa.gov/emit/
- Carbon Mapper. (2024). Carbon Mapper: airborne and satellite imaging spectroscopy for greenhouse gas monitoring. https://carbonmapper.org/
- Jacob, D. J., Varon, D. J., Cusworth, D. H., Dennison, P. E., Frankenberg, C., Gautam, R., Guanter, L., Kelley, J., McKeever, J., Ott, L. E., Poulter, B., & others. (2022). Quantifying methane emissions from the global scale down to point sources using satellite observations of atmospheric methane. Atmospheric Chemistry and Physics, 22(14), 9617–9646. 10.5194/acp-22-9617-2022