
TLDR: We will explore some staple data likelihoods for modeling extreme values. These include standard distributions like the Normal and LogNormal, long-tailed distributions like the Student-t and Stable distributions, and extreme-value-specific distributions like the GEVD and GPD. We will directly find the best parameters of these distributions given observations of extreme values. We will explore various methods for extracting extreme values from spatiotemporal data, such as the block maxima method and the peak-over-threshold (POT) method. We will look at advanced inference methods using the numpyro library to estimate the parameters. We will explore different analysis techniques, such as assessing the quality of the data fit (p-p plots, q-q plots, PDFs, CDFs) and return periods, to make predictions about unseen extremes. We will also do an exploratory analysis to characterize the autocorrelations in space and time. In addition, we will explore different standardized ad-hoc feature extraction techniques to extract IID data, i.e., the residuals. Some examples include removing trends like the climatology, aggregating at appropriate temporal and spatial scales, filtering, and differencing.


Table of Contents

  • Deep Dive: Extreme Value Theory
  • Calculating Extreme Values for Spatiotemporal Data
  • Feature Extraction for Spatiotemporal Dependencies
  • Bayesian Modeling of Extreme Values
  • Bayesian Hierarchical Modeling of Extreme Values
  • Analysis of Extreme Values

Abstractions

Data-Driven Modeling Pipeline

  1. Data Acquisition
  2. Feature Extraction
  3. Extreme Values Extraction
  4. Bayesian Model
  5. Bayesian Inference
  6. Analysis

Extreme Value Extractor Pipeline

  1. Define spatiotemporal block
  2. Select a threshold
  3. Select Minimum/Maximum Values
  4. Count the number of occurrences of values above a threshold within a spatiotemporal block
  5. Compute a summary statistic of the occurrences, e.g., mean, median, etc. (see the sketch below)
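
The sketch below illustrates the two extraction strategies on a plain NumPy time series; the block size, threshold choice, and variable names are illustrative assumptions, not prescribed settings.

import numpy as np

def block_maxima(y: np.ndarray, block_size: int) -> np.ndarray:
    """Maximum value within each non-overlapping temporal block."""
    n_blocks = y.shape[0] // block_size
    blocks = y[: n_blocks * block_size].reshape(n_blocks, block_size)
    return blocks.max(axis=1)

def peaks_over_threshold(y: np.ndarray, threshold: float) -> np.ndarray:
    """Exceedances above a fixed threshold (the input for a GPD fit)."""
    return y[y > threshold] - threshold

# illustrative usage on a synthetic "daily" time series
rng = np.random.default_rng(42)
y = rng.standard_t(df=3, size=3 * 365)
annual_maxima = block_maxima(y, block_size=365)               # block maxima
exceedances = peaks_over_threshold(y, np.quantile(y, 0.95))   # peak-over-threshold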

Proposal

What’s Been Done. The original paper (Philip et al., 2020) looks at some of the standard probability distributions to model observational data; in particular, they look at temperature and precipitation. They consider standard distributions like the Normal distribution as well as long-tailed distributions like the Student-t distribution. Most interesting is their investigation of extreme value distributions like the GEVD and the GPD. There is also a wealth of examples available in (Coles, 2001), which takes a more step-by-step approach.

What’s Lacking. Many papers do not show visualizations or intuitions for the uncertainty in the parameters. They also don’t provide practical insights into which inference methods work best and why. In addition, there is no consistent software in the Python community that is understandable, modular, and extensible.

Data Representation. We will use the simplest data representation possible. We will spatially aggregate the observations, $y$, to get a time series. Then, we will assume the time series is IID.

\begin{aligned} \mathcal{D} = \{ y_n \}_{n=1}^N, && && y_n\in\mathbb{R}^{D_y}, && && N = N_t \end{aligned}

To extract extremes, we will look at two methods: 1) block maxima and 2) peak-over-threshold. To make the IID assumption more plausible, we will apply a wide range of preprocessing steps that are common within the climate community, e.g., filtering, removing the climatology, and other aggregation strategies; one such step is sketched below.
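
As an illustration of one such preprocessing step, the following sketch removes a monthly climatology from a daily time series with pandas; the synthetic series, the variable names, and the monthly grouping are assumptions for illustration only.

import numpy as np
import pandas as pd

# synthetic daily series standing in for a spatially aggregated observable
dates = pd.date_range("1990-01-01", "2019-12-31", freq="D")
y = pd.Series(np.random.default_rng(0).normal(size=dates.size), index=dates)

# climatology: the long-term mean for each calendar month
climatology = y.groupby(y.index.month).transform("mean")

# the residuals (anomalies) are closer to IID than the raw series
residuals = y - climatology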

Model Formulation. We will use the standard Bayesian approach whereby we fit the observations to some staple distributions for extreme values.

\begin{aligned} \text{Data Likelihood}: && && \boldsymbol{y} &\sim p(\boldsymbol{y}|\boldsymbol{\theta}) \\ \text{Prior}: && && \boldsymbol{\theta} &\sim p(\boldsymbol{\theta}) \end{aligned}

We will use a data likelihood for the extremes and a prior over the parameters of the likelihood distribution, e.g., the location, μ, the scale, σ, and the shape, ξ.

Proposed Improvements. We will provide a short list of Bayesian inference methods for these parametric models, offering alternatives such as MLE, MAP, the Laplace approximation, variational inference (Wingate & Weber, 2013; Ranganath et al., 2014), MCMC, and HMC. This will give users better uncertainty quantification for the estimated parameters. We will use modern PPLs, which will give users speed and reasonable scalability. We will demonstrate how easy it is to implement simple ideas via well-crafted code. We will also uphold strict reproducibility standards so that the broader community can engage in further development of extreme value modeling using modern ML-inspired tools. One of these alternative inference methods is sketched below.
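
As one example among these alternatives, the sketch below runs stochastic variational inference in numpyro with an automatic Gaussian guide; the `model` function (like the one defined in the Model section below), the observations `y`, and the optimizer settings are assumptions made for illustration.

from jax import random
import numpyro
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# automatic mean-field Gaussian approximation to the posterior
guide = AutoNormal(model)
optimizer = numpyro.optim.Adam(step_size=1e-2)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

# fit the variational parameters to the observed extremes, y
svi_result = svi.run(random.PRNGKey(0), num_steps=5_000, y=y)
posterior_approx = guide.sample_posterior(random.PRNGKey(1), svi_result.params, sample_shape=(1_000,))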


Data Structure Assumptions

Recall, we are dealing with a spatiotemporal field given by

\begin{aligned} \boldsymbol{y} = \boldsymbol{y}(\mathbf{x},t) && && \boldsymbol{y}:\mathbb{R}^{D_s}\times\mathbb{R}^+\rightarrow\mathbb{R}^{D_y} && && \mathbf{x}\in\Omega_y\subseteq\mathbb{R}^{D_s} && && t\in\mathcal{T}\subseteq\mathbb{R}^+ \end{aligned}

where $\mathbf{x}$ are the spatial coordinates and $t$ is the temporal coordinate.


I.I.D. Samples

This is the simplest form where we assume each datapoint is independent, i.e., no spatial or temporal dependencies. When given a:

  • Time Series: we assume there are no temporal correlations.
  • Spatial Field: we assume there are no spatial correlations.
  • SpatioTemporal Field: we assume there are neither spatial nor temporal correlations.

So the dataset will be:

\mathcal{D} = \{ \boldsymbol{y}_n \}_{n=1}^{N} \hspace{10mm} \boldsymbol{y}_n\in\mathbb{R}^{D_y}

where $N=N_s N_t$ is the total number of points in the spatiotemporal datacube. When given a spatiotemporal datacube, this can be achieved through spatiotemporal aggregations, e.g., averaging over the spatial domain at each time step, as in the sketch below.
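
A minimal sketch of such an aggregation, assuming the datacube is a plain NumPy array with dimensions ordered as (time, latitude, longitude); the array shape and names are illustrative.

import numpy as np

# hypothetical datacube: (time, latitude, longitude)
y_cube = np.random.default_rng(0).normal(size=(365, 20, 30))

# spatial aggregation: average over the spatial dimensions at each time step,
# reducing the field to a single time series of length N = N_t
y_timeseries = y_cube.mean(axis=(1, 2))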


Parametric Model

We assume we have a data likelihood given by:

\boldsymbol{y} \sim p(\boldsymbol{y}|\boldsymbol{\theta})

This distribution could be a Gaussian, a GEVD, or a Pareto distribution. We are interested in finding the best parameters, $\boldsymbol{\theta}$, given the observations, $\boldsymbol{y}$, i.e., the posterior. The full Bayesian posterior can be written as

p(\boldsymbol{\theta}|\boldsymbol{y}) = \frac{1}{Z}p(\boldsymbol{y}|\boldsymbol{\theta})p(\boldsymbol{\theta})

where $Z$ is a normalization constant. We can use any inference technique, including approximate inference methods or sampling methods; a small sketch of the unnormalized log-posterior is given below.
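
To make these objects concrete, here is a small sketch of the unnormalized log-posterior, $\log p(\boldsymbol{y}|\boldsymbol{\theta}) + \log p(\boldsymbol{\theta})$, for a Gaussian likelihood in JAX; the Gaussian choice, the log-scale parameterization, and the prior scales are illustrative assumptions.

import jax.numpy as jnp
from jax.scipy import stats

def unnormalized_log_posterior(loc, log_scale, y):
    # parameterize the scale on the log scale to keep it positive
    scale = jnp.exp(log_scale)
    log_likelihood = stats.norm.logpdf(y, loc=loc, scale=scale).sum()
    log_prior = stats.norm.logpdf(loc, 0.0, 10.0) + stats.norm.logpdf(log_scale, 0.0, 1.0)
    return log_likelihood + log_prior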


Model

First, we will need to provide some priors for the variables. We will use uninformative priors for each of the parameters. However, we will constrain the scale and degrees-of-freedom parameters, $\sigma,\nu$, to be strictly positive.

# prior for the location parameter
loc = numpyro.sample("loc", dist.Normal(0.0, 1.0))
# prior for the scale parameter (positive support via a LogNormal prior)
scale = numpyro.sample("scale", dist.LogNormal(0.0, 10.0))
# prior for the degrees-of-freedom parameter (positive support via a LogNormal prior)
df = numpyro.sample("df", dist.LogNormal(0.0, 10.0))

We also need a data likelihood:

# Student-t data likelihood conditioned on the observations
numpyro.sample("y", dist.StudentT(df=df, loc=loc, scale=scale), obs=y)
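
Putting the priors and likelihood together, a minimal sketch of the full numpyro model might look like the following; the function name `model` and the `y=None` default (so the same function can also generate prior predictive samples) are choices assumed here for illustration.

import numpyro
import numpyro.distributions as dist

def model(y=None):
    # priors over the Student-t parameters
    loc = numpyro.sample("loc", dist.Normal(0.0, 1.0))
    scale = numpyro.sample("scale", dist.LogNormal(0.0, 10.0))
    df = numpyro.sample("df", dist.LogNormal(0.0, 10.0))
    # data likelihood; when y is None the site is sampled instead of observed
    numpyro.sample("y", dist.StudentT(df=df, loc=loc, scale=scale), obs=y)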

Inference

Prior Predictive Distribution

\begin{aligned} \text{Prior Samples}: && && \boldsymbol{\alpha}_n &\sim p(\boldsymbol{\alpha}) \\ \text{Data Likelihood}: && && y &\sim p(y|\boldsymbol{\alpha}) \end{aligned}
from numpyro.infer import Predictive
prior_predictive = Predictive(model, num_samples=100)
prior_samples = prior_predictive(key)

Perform Inference

from numpyro.infer import MCMC, NUTS

# No-U-Turn Sampler (an adaptive HMC variant)
kernel = NUTS(model)
num_samples = 2_000
mcmc = MCMC(kernel, num_warmup=1_000, num_samples=num_samples)
mcmc.run(key, y=y)
mcmc.print_summary()
samples = mcmc.get_samples()

Posterior Distribution over Model Parameters

import jax.numpy as jnp
from numpyro.diagnostics import hpdi

# posterior mean and 90% highest posterior density interval for the location
posterior_loc = samples["loc"]
loc_mean = jnp.mean(posterior_loc, axis=0)
loc_hpdi = hpdi(posterior_loc, prob=0.90)

Posterior Predictive Distribution

\begin{aligned} \text{Posterior Samples}: && && \boldsymbol{\alpha}_n &\sim p(\boldsymbol{\alpha}|y) \\ \text{Data Likelihood}: && && y_n^* &\sim p(y^*_n|\boldsymbol{\alpha}_n) \end{aligned}
from numpyro import handlers

def predict(rng, post_samples, model, *args, **kwargs):
    # condition the model on one set of posterior samples and seed its randomness
    model = handlers.seed(handlers.condition(model, post_samples), rng)
    # run the model forward and read off the sampled observation site
    model_trace = handlers.trace(model).get_trace(*args, **kwargs)
    return model_trace["y"]["value"]
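
One possible way to use this helper is to vectorize it over the posterior samples with `jax.vmap`, calling the model with `y=None` so that the observation site is actually sampled; the variable names below are illustrative.

from jax import random, vmap

# one RNG key per posterior sample; y=None so "y" is sampled rather than observed
rng_keys = random.split(random.PRNGKey(1), num_samples)
predictions = vmap(lambda rng, sample: predict(rng, sample, model, y=None))(rng_keys, samples)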
References
  1. Philip, S., Kew, S., van Oldenborgh, G. J., Otto, F., Vautard, R., van der Wiel, K., King, A., Lott, F., Arrighi, J., Singh, R., & van Aalst, M. (2020). A protocol for probabilistic extreme event attribution analyses. Advances in Statistical Climatology, Meteorology and Oceanography, 6(2), 177–203. 10.5194/ascmo-6-177-2020
  2. Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. In Springer Series in Statistics. Springer London. 10.1007/978-1-4471-3675-0
  3. Wingate, D., & Weber, T. (2013). Automated Variational Inference in Probabilistic Programming. arXiv. 10.48550/ARXIV.1301.1299
  4. Ranganath, R., Gerrish, S., & Blei, D. M. (2014). Black Box Variational Inference. arXiv. 10.48550/ARXIV.1401.0118