Modeling Extreme Values

Blog Schedule¶

Part I: Global Mean Surface Temperature Anomaly

Part II: Single Weather Station Means

Part III: Multiple Weather Station Means

Part IV: Weather Station Extremes

Part V: Multiple Weather Station Extremes

Part I: The Big Why¶

Extremes are Difficult - The Tails of the distribution

Part II: GMSTA¶

This first part will provide a nice base case to step through each of the individual modeling decisions we have to do once we

1. Data Download + EDA¶

In this tutorial, we want to showcase some of the immediate data properties that we can see just from plotting the data and calculate some statistics. A common theme we would like to showcase is that there are different ways to measure samples: trend, spread, and shape. Furthermore, we can get different things depending upon the discretization.

Data Sources - ClimateDataStore, NOAA-NCEI
EDA - Statistics, Stationarity, Noise
Viz - Scatter, Histogram
Temporal Binning - All, Decade, Year, Season, Month
Library - matplotlib, numpy

2. Data Pipelines¶

In this tutorial, we want to showcase some of the immediate data properties that we can see just from plotting the data and calculate some statistics.

Data Sources - ClimateDataStore, NOAA-NCEI
API Keys
Packages - Typer, Hydra, DVC

3. Recreating the Anomalies¶

We do some manual feature extraction where we try to recreate these anomalies from the original data. This involves decomposing the signal, spatial averaging and temporal smoothing. In essences, this is a quick tutorial about how we can gather anomalies in a classical way and what we will not be doing in the following tutorials.

Manual Feature Extraction
Periods Signal - Climatology, Reference Periods
Spatial Averaging - Weighted Means
Temperal Smoothing - Filtering
Library - xarray

4. Signal Decomposition¶

We do some classical signal decomposition assuming 3 underlying components: trends, cycles, and residuals. We also showcase some different ways to express a model using these 3 components: additive, multiplicative, and non-linear. This will serve as a precursor to the parameter estimation task where we need to describe an underlying parameterization of the signal.

Components - Trend, Cycle, Residuals
Combination - Additive, Multiplicative, Non-Linear
Precursor to parameter estimation
Library - statsmodels

5. Unconditional Model¶

We will apply the simplest parameterization: assuming IID. This will establish a baseline and we will start to get comfortable with the Bayesian language for modeling.

Baseline Model - Fully Pooled Constrained
Data Likelihood - Normal
Prior - Uninformative
Inference - MAP + MCMC
Library - Numpyro

6. Metrics¶

Fit - PP-Plot, QQ-Plot
Parameters - Joint Plot
Tails - Return Period vs Empirical
Sensitivity Analysis - Gradient-Based, Sampling-Based, Proxy-Based
Summary Stats - NLL, AIC, BIC
Libraries - pyviz, xarray

7. Bayesian Hierarchical Model¶

We will explore some improved parameterization strategies when thinking about.

Parameterizations
- Baseline Model - Fully Pooled Constrained
- Non-Pooled - Unconstrained
- Partially Pooled - Bayesian Hierarchical Model
Data Likelihood - Normal
Inference - MAP + MCMC
Library - Numpyro

8. Likelihoods Whirlwind Tour¶

The likelihood is an important piece of the Bayesian modeling framework. We show which likelihoods make sense for which datasets depending upon the structure of the data. We showcase the standard Gaussian but we also explore more heavy-tailed distributions like Log-Normal and T-Student.

Simple Likelihoods
- Gaussian
- Log-Normal
- T-Student
- GEVD
Inference - MAP + MCMC
Library - Numpyro

9. Inference Whirlwind Tour¶

This will give a whirlwind tour of some basic inference schemes: how we will learn the parameters of our model. We will give the barebones scheme where there is no uncertainty, i.e., uniform priors --> MLE. We will also demonstrate how to find the parameters using the sampling scheme, MCMC. Lastly, we will give an overview of a method to approximate the posterior distribution, i.e., VI.

Inference Schemes
- Sample-Based - MCMC
- Non-Bayesian - MLE
- Approximate Bayesian - VI
Library - Numpyro

10. Temporally Conditioned Model¶

We will introduce the notion of conditioning our data on the time stamp. This is a natural introduction to how to properly model time series data.

Time Coordinate Encoder - Year, Season, Month
Priors - Normal, Uniform, Laplace, Delta
- Bias - Intercept, i.e., $t=0$ --> Normal + Mean @ $t=0$
- Weight - Slope/Tendency --> Normal
- Noise - Normal, Cauchy
Inference - MAP + MCMC
Metrics
- Parameters - Joint Plots, Scatter
- Tails Analysis - Return Periods + Empirical
- Differences between T1 and T0 - Return Periods (1D, 2D), Line Plots,
- Sensitivity Analysis - Gradient-Based e.g., $\partial_t f(t,\theta)=w$
Predictions
- Hindcasting
- Forecasting
Extreme Values - Block Maximum v.s. Peak-Over-Threshold
Parameterization - Temporal Point Process
Relationship with common dists, GEVD & GPD
Custom Likelihood in Numpyro - GEVD, GPD

11. Dynamical Model¶

In this module, we will introduce dynamical model formulism as an alternative parameterization. We do not explicitly condition on the time step itself Instead, we condition on the state at a previous time step as well as observations.

Dynamical Model Formalization
- Initial Condition
- Equation of Motion
- TimeStepper —> ODESolver
- Observation Operator
Equation of Motion Parameterizations
- Closed Form - Constant, Linear, Exponential, Logistic —> Closed-Form Solution
- Structured - Linear, Reduced Order, Exponential,
- Free-Form - Neural Network
TimeStepper - Quadrature (Runge-Kutta)
Observation Operator - Linear
Inference - MAP + MCMC
Predictions - Hindcasting + Forecasting

12. State Space Model¶

State Space Formalization
- Initial Distribution
- Transition Distribution
- Emission Distribution
- Posterior - Filtering, Smoothing
Connections (Generalization)
- ODE —> Strong-Constrained vs Weak-Constrained
- Time Conditioned —> Full-Form vs Gradient-Form
Inference - MAP + MCMC
Predictions - Hindcasting + Forecasting

13. Structured State Space Model¶

Structured
- Time Dependence - Cycle, Season
- Trend, Locally Linear
- Temporal History Dependence - Autoregressive
Inference - MAP + MCMC

14. Whirlwind Tour¶

Linear
Basis Function
Neural Network
Gaussian Processes

15. Ensembles¶

Multiple GMSTA Perspectives
X-Casting
Strong-Constrained Dynamic Model, aka, NeuralODE
Weak-Constrained Dynamical Model, aka, SSM

Part III: Single Weather Station¶

In this module, we start to look at single weather stations for Spain.

16. EDA Revisited¶

In this tutorial, we want to showcase some of the immediate data properties that we can see just from plotting the data and calculate some statistics. A common theme we would like to showcase is that there are different ways to measure samples: trend, spread, and shape. Furthermore, we can get different things depending upon the discretization.

Data Sources - AEMET-OpenData, python-aemet
Datasets -
EDA - Statistics, Stationarity, Noise
Viz - Scatter, Histogram
Temporal Binning - All, Decade, Year, Season, Month
Library - matplotlib, numpy

17. Baseline Model¶

Data Download + EDA - Histograms, Stationarity, Noise
Datasets
- Temperature
- Precipitation
- Wind Speed
Data Likelihoods
- Standard - Gaussian, Generalized Gaussian
- Long-Tailed - T-Student, LogNormal
Parameterizations - Fully Pooled, Non-Pooled, Partially Pooled

18. State Space Model¶

Predictions - Hindcasting, Forecasting

Part IV: Multiple Weather Stations¶

Introduction

EDA - Multiple Weather Stations, AutoCorrelation, Variogram

Unconditional Models - Spatiotemporal Series

Baseline Model - State Space Model w/ Spatial Dims
Spatial Models - EDA + Weight Matrix
Spatial State Space Model
Scale - Variational Posterior

Conditional Models - Spatiotemporal Series

EDA - Multiple Weather Stations + Covariates
Baseline Model - IID —> Bayesian Hierarchical Model
Conditional SSMs
Reparameterization

Other

EDA - Exploring Spatial Dependencies (Altitude, Longitude, Latitude)
Spatial Autocorrelation with (Semi-)Variograms
Discretization - Histogram
Dynamical Model
Spatial Operator - Finite Difference, Convolutions

19. EDA¶

Histograms - Grouped (Time)
Scatter Plots - Binned
Multiple Temporal AutoCorrelation Plots
Spatial Autocorrelation, Variogram
Clustering - GMMs (Grouped)

20. Baseline Model¶

Batch Processing

\begin{aligned} p(\mathbf{Y},\mathbf{z}, \boldsymbol{\theta}) &= \prod_{m=1}^{N_\Omega} p(\boldsymbol{\theta}_m) \prod_{n=1}^{N_T} p(\mathbf{y}_{nm}|\mathbf{z}_{nm}) p(\boldsymbol{z}_{nm}|\boldsymbol{\theta}_m) \end{aligned}

(7)

21. Regressor - Weight Matrix¶

EDA - Spatial Correlation, Variogram
Domain Shape
- Unstructured, Irregular - Graph —> Adjacency Matrix
- Regular - Convolution —> Kernel

22. GP Regressor¶

\begin{aligned} p(\mathbf{Y},\mathbf{z}, \boldsymbol{\theta}) &= p(\boldsymbol{\theta}) p(\boldsymbol{\alpha}) p(\boldsymbol{f}|\boldsymbol{\alpha}) \prod_{n=1}^{N_T} p(\mathbf{y}_{n}|\mathbf{z}_{n}) p(\mathbf{z}_{n}|\boldsymbol{f},\boldsymbol{\theta}) \end{aligned}

(8)

23. SSM - Spatial Model¶

\begin{aligned} p(\mathbf{Y},\mathbf{z}, \boldsymbol{\theta}) &= p(\boldsymbol{\theta}) p(\mathbf{z}_0|\boldsymbol{\theta}) \prod_{t=1}^{T} p(\mathbf{y}_{t}|\mathbf{z}_{t}) p(\mathbf{z}_{t}| \mathbf{z}_{t-1}\boldsymbol{\theta}) \end{aligned}

(9)

Spatial Operator Parameterizations
- Fully Connected
- Convolutions

24. Scale - VI¶

Whirlwind Tour
- Filter-Update Posterior, $q(z_t|z_{t-1}, y_t)$
- Smoothing Posterior, $q(z_t|z_{t-1}, y_{1:T})$

Part V: Spain Weather Stations (Extremes)¶

TOC¶

What is an Extreme Event?
Classic Method I - Block Maximum
Classic Method II - Peak Over Threshold
Classic Method III - Point Process
Revised Method - Marked Temporal Point Process

25. What is an Extreme Event?¶

What is an event?
Objective - Forecasting, Return Period
Definitions
- Mean vs Tails
- Maximum/Minimum, Thresholds
- Power Law
Problems
- Tails - Few/No Observations
- Independence - even with observations, not independent
- Models - Few Obs + Dependence —> Difficult to Fit a model
Whirlwind Tour - BM, POT, TPP
Example
- Gaussian, Generalized Gaussian, T-Student, GEVD, GPD
- Sample Data Likelihood - x100, x1000, x10000

26. Block Maximum¶

What is an event? - The maximum over within a block of time.
Temporal Resolution - Year, Season, Month, Day
Viz - Histogram, Scatter Plot, Violin Plot, Ridge Plot
EDA - seaborn simple linear regressors, i.e., trends

27. Peak Over Threshold¶

What is an event? - An Event Over a Threshold
Threshold Selection - Quantiles (90, 95, 98, 99)
Temporal Resolution (Declustering) - Year, Season, Month, Day
Viz - Histogram, Scatter Plot, Violin Plot, Ridge Plot
EDA - seaborn simple linear regressors, i.e., trends

28. Temporal Point Process¶

What is an event? - The Events Over a Threshold within a block of time.
Block Maximum Temporal Resolution - Year, Season, Month, Week, Days
Threshold Selection - Quantiles
Theory - Point Process for Extremes
Viz - Histogram, Scatter Plot, Violin Plot, Ridge Plot
EDA - seaborn simple linear regressors, i.e., trends
Data Likelihood - Point Process
Baseline Model - Pooled, Non-Pooled, Partially Pooled

Engineering I - Block Maximum¶

Data - Download from DVC
Geoprocessing - Select Station, Clean Labels
ML Pre-Processing - Standardization, Train/Valid/Test Split
ML Training - Model Load, Model Train, Model Save
MLOps - EDA, Metrics, Hindcasting, Forecasting

29. Parameterization MTPP¶

What is an event? - The Events Over a Threshold within a block of time.
What is a mark? - The intensity of an event if it happens.
Block Maximum Temporal Resolution - Year, Season, Month, Week, Days
Threshold Selection - Quantiles
Theory - Marked Decoupled Point Process
Data Likelihood - Point Process + Marks Distribution
Baseline Model - Pooled, Non-Pooled, Partially Pooled

30. Baseline Model - Univariate¶

Data Likelihood - GEVD: Limiting Distribution for Extremes
Baseline Model - Fully Pooled - Constrained
- Non-Pooled - Unconstrained
- Partially Pooled - Bayesian Hierarchical Model
Constraints - Tails
- Frechet - e.g., Temperature
- Weibull - e.g., Precipitation
Inference - MAP + MCMC
Metrics
- Fit - PP-Plot, QQ-Plot
- Parameters - Joint Plot
- Tails - Return Period + Empirical
- Sensitivity Analysis - Gradient-Based, Sampling-Based, Proxy-Based
Pipeline - Data, Model, Inference, Metrics

State Space Model - Baseline¶

State Space Formalization
- Initial Distribution
- Transition Distribution
- Emission Distribution
Connections (Generalization)
- ODE —> Strong-Constrained vs Weak-Constrained
- Time Conditioned —> Full-Form vs Gradient-Form
Inference
- Filter-Update, Smoothing
- MAP + MCMC
Predictions - Hindcasting + Forecasting

State Space Models - Structured¶

Structured
- Time Dependence - Cycle, Season
- Trend, Locally Linear
- Temporal History Dependence - Autoregressive
Inference - MAP + MCMC

Part VI: Spain Weather Stations (Extremes)¶