Tier V — Source population & forecasting

Forward model: thinned marked temporal point process (TMTPP) over emission events, with per-event marks drawn from Tier I–IV posteriors and per-event detection thinning by per-satellite POD models.

This tier sits above the per-event physics tiers. Tiers I–IV answer “what’s the emission rate from this plume right now?” — single overpass, single source. Tier V answers a different family of questions:

Inventory accounting: “Given a population of detected plumes (and the ones we missed), what’s the true total emitted mass?”
Forecasting: “For a given facility class, when will the next emission event happen, and how big will it be?”
Bias diagnosis: “How biased is the per-overpass average rate when satellites only see the big leaks?”

Inventory and forecasting are co-equal products of Tier V — not just totals. The inverted intensity $\lambda(t)$ directly powers operational forecasts (dispatch windows, occurrence probabilities); see Persistency.

Sub-pages:

Instantaneous emission estimation — single-overpass $Q$ ; the cross-tier interface that turns per-event posteriors into mark likelihoods.
Point process model (TMTPP) — the generative foundation: temporal intensity $\lambda(t)$ , marks $f(Q)$ , and detection thinning $P_d(\cdot)$ .
Persistency — operational forecasts from inverted $\lambda(t)$ : wait times, dispatch windows, occurrence probabilities.
Total emission estimation — the missing-mass paradox and POD-corrected regional/national totals.

TMTPP foundations — the three-term log-likelihood¶

The full population log-likelihood has three terms (derived in 06b; foundations in Daley & Vere-Jones, 2003Daley & Vere-Jones, 2008):

\log L \;=\; \underbrace{\sum_{i \in \mathcal{D}} \log p(\text{detected}_i \mid f, \lambda, P_d)}_{\text{mark contribution}} \;+\; \underbrace{\sum_{i \in \mathcal{D}} \log \lambda(t_i)}_{\text{detection-time intensity}} \;-\; \underbrace{\int_{0}^{T} \lambda(t) \left[ \int P_d(Q)\, f(Q)\, \mathrm{d}Q \right] \mathrm{d}t}_{\text{integrated thinned rate}}

((1))

The third term is what makes λ and $P_d$ jointly identifiable — without it the two trade off.

Mark contribution and the soft-observation framing¶

The per-event posterior from Tiers I–IV is a soft observation of the (unknown) true mark $Q_i$ . This is the same Bayesian-deconvolution / errors-in-variables structure used in measurement-error regression. The per-event mark contribution is:

p(\text{detected}_i \mid f, \lambda, P_d) \;=\; \int P_d(Q)\, L_i(Q)\, f(Q)\, \mathrm{d}Q

((2))

where $L_i(Q) = p(\text{observation}_i \mid Q)$ is the per-event likelihood, not the posterior. In sample-based practice (per-event posterior samples $Q_i^{(s)} \sim p(Q \mid \text{observation}_i)$ ):

p(\text{detected}_i \mid f, \lambda, P_d) \;\approx\; \frac{1}{S}\, \sum_{s=1}^{S}\, P_d(Q_i^{(s)})\, \frac{f(Q_i^{(s)})}{\pi_\text{per-event}(Q_i^{(s)})}

((3))

with $\pi_\text{per-event}(Q)$ the per-event prior used at Tier I–IV. The ratio $f / \pi_\text{per-event}$ is the importance weight that re-points the per-event posterior at the population mark distribution.

This is the central math of cross-tier inference. Currently the prototype in methane_pod.fitting summarises per-event posteriors to point estimates before the population fit, side-stepping the importance correction. Formalising this is the v1 deliverable for 06a_instantaneous.md.

How the cycle adapts at population scale¶

The six-step cycle still applies, but the objects change:

Table (1):Six-step cycle adaptation: Tier I–IV (per event) vs. Tier V (population).

Step	Tier I–IV (per event)	Tier V (population)
1 — Simple model	Forward physics (plume / PDE / RTM)	Generative TMTPP: $\lambda(t)$ + mark $f(Q)$ + POD $P_d(\cdot)$
2 — Model-based inference	MAP / MCMC over source params	NumPyro NUTS over $(\lambda \text{ params}, \text{mark params}, \text{POD params})$ . Cheap at $O(10^{4})$ events (minutes); hours-to-days at $O(10^{6})$ events (national catalog)
3 — Model emulator	FNO / neural ODE on the PDE	Skip when NUTS fits in budget. Optionally a normalising flow over the population posterior for repeated re-fits or sensitivity studies
4 — Emulator-based inference	PDE-free 4D-Var	Variational fit (`numpyro.infer.SVI`) or flow-based posterior approximation; required at national catalog scale
5 — Amortized predictor	Per-overpass $Q$ predictor	$(\text{basin tile}, \text{history window}) \to$ posterior over $(\lambda, f(Q), \text{total mass}, \text{next-event time})$ conditioned on per-event evidence and met-region context
6 — Improve	Better physics	Spatial point process (links to Tier III); multi-satellite fusion; varying-coefficient POD (per-(basin, season, scene class) hierarchy); non-Poisson clustering (Hawkes / Cox)

Tile definition: an H3 hex-resolution-7 cell (~5 km²) for sub-basin work, or a basin polygon for inventory accounting. History window: 30–365 days, hierarchical prior on the cutoff.

Varying-coefficient POD: $P_d$ parameters indexed by $(\text{basin}, \text{season}, \text{scene class})$ with hierarchical shrinkage to the global POD. Captures regional / seasonal detection differences without inflating parameter count.

Cross-tier interface — the load-bearing contract¶

Payload schema¶

Every per-event posterior consumed by Tier V must carry:

Table (2):Per-event posterior payload — fields, types, notes.

Field	Type	Notes
`posterior_samples`	`(S,)` array of $Q$ draws	OR `posterior_summary` for Gaussian shorthand
`posterior_summary`	$(\mu_{\log Q}, \sigma_{\log Q})$	lognormal quick form when full samples are too heavy
`per_event_prior_logpdf`	callable $Q \to \log \pi(Q)$	required for the importance correction; without it the population fit is biased
`instrument_id`	str	dispatch into per-instrument POD
`t_detection`	float (UTC seconds)	for $\lambda(t)$
`x0_posterior`	$(\boldsymbol{\mu}_{xy}, \boldsymbol{\Sigma}_{xy})$	for spatial Cox-process upgrade
`quality`	dict	confidence flags from the Tier I–IV quality bitmask

Independence assumption — the v1 caveat¶

The factorised likelihood above assumes detections at different overpasses are independent. Two overpasses of the same physical leak (e.g. GHGSat then TROPOMI two days later) violate this.

Validation strategy¶

Population SBC. Generate $(\lambda^{*}, f^{*}, P_d^{*})$ , simulate the full thinned-and-marked catalog with a synthetic per-event posterior layer, fit, check rank statistics across all hyperparameters. The Tier V analogue of Tier-I synthetic recovery.
Importance-weight ESS diagnostic. Per detection $i$ , the IS estimator’s effective sample size $\operatorname{ESS}_i = (\sum_s w_s)^{2} / \sum_s w_s^{2}$ is a health metric. Low ESS (e.g. $< S/10$ ) signals that the per-event posterior is far from the population mark distribution; the population fit is unreliable for that event. Report the ESS distribution as a fit diagnostic.
Per-event-prior swap-out. Refit the population using a different per-event prior at Tier I–IV (re-run Tiers I–IV with $\pi_\text{per-event} = \operatorname{LogNormal}(0, 2)$ instead of the inventory-anchored prior). The population posterior on $f$ should not move beyond IS noise. If it does, the importance correction is mis-implemented.
Real-data benchmark. Compare corrected total emission for a well-studied basin (Permian) to published bottom-up inventories (U.S. Environmental Protection Agency, 2024Scarpelli et al., 2020, GHGRP) and top-down inverse-modelling estimates (Maasakkers et al., 2023Jacob et al., 2022, Sherwin et al.) — see 06d.

Module layout — depend on `methane_pod`, don’t absorb it¶

plumax depends on the standalone methane_pod package (pinned methane_pod >= 0.1, < 0.2 for v1); the population-scale code is not re-implemented. Rationale:

methane_pod has its own audience (point-process methodologists), test suite, release cadence.
plumax consumes it through a thin adapter that materialises Tier I–IV posteriors as inputs to methane_pod.fitting.
Versioning stays clean — when methane_pod releases v0.X.Y, plumax pins to a known-good version.

Table (3):Tier V module layout — concern, target module, status.

Concern	Module	Status
Intensity registry $\lambda(t)$	`methane_pod.intensity`	library ✓ (13 kernels)
POD registry $P_d(\cdot)$	`methane_pod.pod_functions`	library ✓ (10 models)
Missing-mass MC simulator	`methane_pod.paradox`	library ✓
NUTS fitter	`methane_pod.fitting`	library ✓; importance-correction integration ☐
Per-event posterior summariser	`plume_simulation.population.adapter.summariser`	☐
Per-event prior recall ( $\pi_\text{per-event}$ lookup)	`plume_simulation.population.adapter.prior_recall`	☐ — required for importance weighting
Importance-weight calculator	`plume_simulation.population.adapter.importance`	☐
Multi-satellite POD union	`plume_simulation.population.adapter.pod_union`	☐
Catalog schema (CSV / parquet)	`plume_simulation.population.adapter.schema`	☐
Real-data CSV ingestion	`plume_simulation.population.ingest`	☐ (placeholder in `07_pod_fitting_mcmc.md`)
Population SBC harness	`plume_simulation.population.validation.sbc`	☐
Importance-weight ESS diagnostic	`plume_simulation.population.validation.iw_ess`	☐
Per-event-prior swap-out test	`plume_simulation.population.validation.prior_swap`	☐
Spatial Cox-process extension (v2)	`plume_simulation.population.spatial`	☐

A plume_simulation.population subpackage doesn’t exist yet; this is the proposed shape.

Connection to Tier III — spatial structure¶

Tier III’s distributed source field $S(\mathbf{x},t)$ is exactly a spatial inhomogeneous Poisson rate at the population level — temporally aggregated, this is the spatial intensity of a Cox process over emission events. The v2 spatial extension of Tier V is the same mathematical object Tier III already inverts at the per-event timescale, just averaged over a longer horizon. The two tiers should share the parameterisation: a Matérn GP prior on $\log S(\mathbf{x},t)$ plays the role of both Tier III’s source-field prior and Tier V.v2’s spatial Cox-process intensity.

This isn’t a coincidence — it’s why plumax’s tier structure works: the same mathematical objects appear at different scales.

Status snapshot¶

Theory. TMTPP foundations and the missing-mass paradox are written up in methane_pod/notebooks/01_mttpp_theory and 03_missing_mass_paradox.
methane_pod library: ✓ — intensity, POD, paradox simulator, NUTS fitter all implemented.
Cross-tier integration: ☐ — per-event posteriors enter the population fit as point estimates (no importance correction). Tier V’s main code deliverable.
Synthetic validation. 06_stationary_numpyro_mcmc recovers POD parameters on synthetic data without the soft-observation layer.
Real-data fit. 07_pod_fitting_mcmc is a placeholder; needs IMEO + Tanager CSV ingestion.

Open questions¶

References¶

Daley, D. J., & Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods (2nd ed.). Springer. 10.1007/b97277
Daley, D. J., & Vere-Jones, D. (2008). An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure (2nd ed.). Springer. 10.1007/978-0-387-49835-5
U.S. Environmental Protection Agency. (2024). Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990–2022. EPA 430-R-24-004. https://www.epa.gov/ghgemissions/inventory-us-greenhouse-gas-emissions-and-sinks
Scarpelli, T. R., Jacob, D. J., Maasakkers, J. D., Sulprizio, M. P., Sheng, J.-X., Rose, K., Romeo, L., Worden, J. R., & Janssens-Maenhout, G. (2020). A global gridded (0.1° × 0.1°) inventory of methane emissions from oil, gas, and coal exploitation based on national reports to the United Nations Framework Convention on Climate Change. Earth System Science Data, 12(1), 563–575. 10.5194/essd-12-563-2020
Maasakkers, J. D., Mcduffie, E. E., Sulprizio, M. P., Chen, C., Schultz, M., Brunelle, L., Thrush, R., Steller, J., Sherry, C., Jacob, D. J., & others. (2023). A gridded inventory of annual 2012-2018 U.S. anthropogenic methane emissions. Environmental Science & Technology, 57(43), 16276–16288. 10.1021/acs.est.3c05138
Jacob, D. J., Varon, D. J., Cusworth, D. H., Dennison, P. E., Frankenberg, C., Gautam, R., Guanter, L., Kelley, J., McKeever, J., Ott, L. E., Poulter, B., & others. (2022). Quantifying methane emissions from the global scale down to point sources using satellite observations of atmospheric methane. Atmospheric Chemistry and Physics, 22(14), 9617–9646. 10.5194/acp-22-9617-2022