
Generalized Extreme Value Distribution

CSIC
UCM
IGEO

This is a location-scale family distribution.


Parameters

$$
\begin{aligned}
\text{Location}: && && \boldsymbol{\mu} &\in \mathbb{R} \\
\text{Scale}: && && \boldsymbol{\sigma} &\in \mathbb{R}^+ \\
\text{Shape}: && && \boldsymbol{\kappa} &\in \mathbb{R}
\end{aligned}
$$

Probability Density Function

This denotes the probability density of our random variable (rv) $Y$ at some specific value $y$.

$$
p(Y=y) := f(y;\boldsymbol{\theta})
$$

We can define the probability density function

$$
\boldsymbol{f}(y;\boldsymbol{\theta}) = \frac{1}{\sigma}t\left(y;\boldsymbol{\theta}\right)^{\kappa+1}e^{-t\left(y;\boldsymbol{\theta}\right)}
$$

where the function $t(y;\boldsymbol{\theta})$ is defined as:

$$
\boldsymbol{t}(y;\boldsymbol{\theta}) =
\begin{cases}
\left[ 1 + \kappa \left( \frac{y-\mu}{\sigma} \right)\right]_+^{-1/\kappa}, && \kappa\neq 0 \\
\exp\left(-\frac{y-\mu}{\sigma}\right), && \kappa=0
\end{cases}
$$
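The density above can be evaluated directly from the two-branch definition of $t(y;\boldsymbol{\theta})$. The following is a minimal sketch in plain Python; the function names `gevd_t` and `gevd_pdf` are illustrative choices rather than part of any library, and the code is not hardened against floating-point overflow for extreme inputs.

```python
import math


def gevd_t(y, mu, sigma, kappa):
    """Auxiliary function t(y; theta) for location mu, scale sigma, shape kappa."""
    z = (y - mu) / sigma
    if kappa == 0.0:
        return math.exp(-z)  # Gumbel branch
    base = 1.0 + kappa * z
    if base <= 0.0:
        # [.]_+ clips the bracket at zero, so y lies outside the support:
        # t diverges for kappa > 0 and vanishes for kappa < 0.
        return math.inf if kappa > 0.0 else 0.0
    return base ** (-1.0 / kappa)


def gevd_pdf(y, mu, sigma, kappa):
    """Density f(y; theta) = t^(kappa + 1) * exp(-t) / sigma."""
    t = gevd_t(y, mu, sigma, kappa)
    if t == 0.0 or math.isinf(t):
        return 0.0  # zero density outside the support
    return t ** (kappa + 1.0) * math.exp(-t) / sigma


# At y = mu we have t = 1, so the density equals exp(-1) / sigma for any kappa.
print(gevd_pdf(0.0, mu=0.0, sigma=1.0, kappa=0.0))
print(gevd_pdf(0.0, mu=0.0, sigma=1.0, kappa=0.5))
```

Handling $\kappa = 0$ as an exact branch mirrors the piecewise definition; in practice one might also switch to the Gumbel branch when $|\kappa|$ is very small to avoid round-off.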

Figure 1: Some different distribution types for the GEVD. Source: Medium article.


Cumulative Distribution Function

This denotes the probability that our rv $Y$ will be less than or equal to some specific value $y$.

$$
p(Y\leq y) := F(y;\boldsymbol{\theta})
$$

We can define the cumulative distribution function

$$
\boldsymbol{F}(y;\boldsymbol{\theta}) = \exp \left[ -\boldsymbol{t}(y;\boldsymbol{\theta}) \right]
$$

where the function $t(y;\boldsymbol{\theta})$ is defined in equation (4).
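As a small illustration, the CDF can be coded as $\exp[-t]$ with the support handled explicitly. This is a sketch under the definitions above; the name `gevd_cdf` is an assumption made for this example.

```python
import math


def gevd_cdf(y, mu, sigma, kappa):
    """CDF F(y; theta) = exp(-t(y; theta))."""
    z = (y - mu) / sigma
    if kappa == 0.0:
        return math.exp(-math.exp(-z))  # Gumbel branch
    base = 1.0 + kappa * z
    if base <= 0.0:
        # Below the lower bound (kappa > 0) the CDF is 0;
        # above the upper bound (kappa < 0) it is 1.
        return 0.0 if kappa > 0.0 else 1.0
    return math.exp(-(base ** (-1.0 / kappa)))


# At y = mu, t = 1 for every shape, so F(mu) = exp(-1).
for kappa in (-0.3, 0.0, 0.3):
    print(kappa, gevd_cdf(0.0, mu=0.0, sigma=1.0, kappa=kappa))
```

The loop shows that the location parameter always sits at the $e^{-1}$ quantile, whatever the shape.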


Survival Function

This is the probability that our rv $Y$ exceeds some specific value $y$.

$$
p(Y>y) := \boldsymbol{S}(y)
$$

We denote this as:

$$
\boldsymbol{S}_{GEVD}(y;\boldsymbol{\theta}) = 1 - \boldsymbol{F}(y;\boldsymbol{\theta})
$$

Plugging the CDF into this equation gives

$$
\boldsymbol{S}(y;\boldsymbol{\theta}) = 1 - \exp \left[ -\boldsymbol{t}(y;\boldsymbol{\theta}) \right]
$$

where the function $t(y;\boldsymbol{\theta})$ is defined in equation (4).
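In the far upper tail, $t$ becomes tiny and evaluating $1 - \exp(-t)$ naively rounds to zero in floating point. The sketch below (with the hypothetical name `gevd_sf`) uses `math.expm1` instead, which computes $\exp(x) - 1$ accurately for small $x$. This is a numerical design choice for the example, not something prescribed by the text.

```python
import math


def gevd_sf(y, mu, sigma, kappa):
    """Survival function S(y; theta) = 1 - exp(-t(y; theta))."""
    z = (y - mu) / sigma
    if kappa == 0.0:
        t = math.exp(-z)  # Gumbel branch
    else:
        base = 1.0 + kappa * z
        if base <= 0.0:
            # Outside the support: S = 1 below the lower bound (kappa > 0),
            # S = 0 above the upper bound (kappa < 0).
            return 1.0 if kappa > 0.0 else 0.0
        t = base ** (-1.0 / kappa)
    # -expm1(-t) equals 1 - exp(-t) but keeps precision when t is tiny.
    return -math.expm1(-t)


y = 40.0
naive = 1.0 - math.exp(-math.exp(-y))  # loses all precision in the tail
stable = gevd_sf(y, mu=0.0, sigma=1.0, kappa=0.0)
print(naive, stable)
```

Comparing the two printed values shows why the stable form matters when exceedance probabilities of very rare events are needed.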


Quantile Function

This is also known as the percent point function (PPF) or the inverse CDF. This function maps an input probability, $y_p$, to a value $y$ such that the probability of $Y$ being less than or equal to $y$ is $y_p$.

$$
y_p = \boldsymbol{F}(y;\boldsymbol{\theta})
$$

Inverting this relation gives the inverse CDF, which we denote as the quantile function.

$$
y = \boldsymbol{F}^{-1}(y_p;\boldsymbol{\theta}) := \boldsymbol{Q}(y_p;\boldsymbol{\theta})
$$

where $y_p\in[0,1]$ is the data within the probability transform domain. The quantile function can be computed in closed form

$$
\boldsymbol{Q}(y_p) =
\begin{cases}
\mu + \frac{\sigma}{\kappa }\left[(- \log y_p)^{-\kappa} - 1 \right] && \kappa\neq 0 \\
\mu - \sigma\log(- \log y_p ) && \kappa=0
\end{cases}
$$
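The closed-form quantile function translates almost line by line into code. The sketch below uses the illustrative name `gevd_quantile` and, as an assumption of this example, restricts $y_p$ to the open interval $(0, 1)$ so that both logarithms are finite.

```python
import math


def gevd_quantile(p, mu, sigma, kappa):
    """Quantile function Q(p) of the GEVD for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    s = -math.log(p)  # s > 0 because p < 1
    if kappa == 0.0:
        return mu - sigma * math.log(s)  # Gumbel branch
    return mu + sigma / kappa * (s ** (-kappa) - 1.0)


# Round trip: plugging Q(p) back into F(y) = exp(-t(y)) should recover p.
mu, sigma, kappa, p = 2.0, 1.5, 0.3, 0.9
y = gevd_quantile(p, mu, sigma, kappa)
t = (1.0 + kappa * (y - mu) / sigma) ** (-1.0 / kappa)
print(y, math.exp(-t))
```

The round trip at the end is a cheap sanity check that the quantile and the CDF use the same sign convention for $\kappa$.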

Return Period

We can calculate the return period (RP), $T_R$, using equation (8). Practically, we set its reciprocal, $1/T_R$, to the survival function of the GEVD (equation (6)).

$$
1/T_R = 1 - \boldsymbol{F}(y;\boldsymbol{\theta})
$$

To make things simpler, we can use the quantile function in equation (12) and set the probability to

$$
y_p = 1 - 1 / T_R
$$

Expanding this out, we get

$$
y =
\begin{cases}
\mu + \frac{\sigma}{\kappa}\left\{\left[-\log\left(1-1/T_R\right)\right]^{-\kappa}-1\right\} && \kappa\neq 0 \\
\mu - \sigma \log \left[ - \log \left(1 - 1/T_R \right) \right] && \kappa=0
\end{cases}
$$
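As a hedged illustration, the return level for a given return period can be computed by evaluating the quantile function at $y_p = 1 - 1/T_R$. The name `gevd_return_level` is hypothetical, and the use of `math.log1p` is a numerical choice made here to keep $\log(1 - 1/T_R)$ accurate for long return periods.

```python
import math


def gevd_return_level(T_R, mu, sigma, kappa):
    """Level exceeded with probability 1 / T_R, i.e. Q(1 - 1 / T_R)."""
    if T_R <= 1.0:
        raise ValueError("the return period must be greater than 1")
    s = -math.log1p(-1.0 / T_R)  # equals -log(1 - 1/T_R), positive
    if kappa == 0.0:
        return mu - sigma * math.log(s)  # Gumbel branch
    return mu + sigma / kappa * (s ** (-kappa) - 1.0)


# Illustrative parameter values only, not fitted to any data.
for T_R in (10.0, 100.0, 1000.0):
    print(T_R, gevd_return_level(T_R, mu=30.0, sigma=2.0, kappa=0.1))
```

The levels grow with $T_R$, and for $\kappa > 0$ they grow without bound, which reflects the heavy upper tail.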

Average Recurrence Interval

We can calculate the average recurrence interval (ARI), $\bar{T}$, using equation (14). Practically, we set the corresponding exceedance probability to the survival function of the GEVD (equation (6)).

$$
1 - \exp\left(-1/\bar{T}\right) = 1 - \boldsymbol{F}(y;\boldsymbol{\theta})
$$

To make things simpler, we can use the quantile function in equation (12) and set the probability to

$$
y_p = \exp\left(-1/\bar{T}\right)
$$

Expanding this out and simplifying, we get

$$
y =
\begin{cases}
\mu + \frac{\sigma}{\kappa}\left( \bar{T}^{\kappa}-1\right) && \kappa\neq 0 \\
\mu + \sigma\log \bar{T} && \kappa=0
\end{cases}
$$
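The simplification works because $-\log y_p = 1/\bar{T}$, so $(-\log y_p)^{-\kappa} = \bar{T}^{\kappa}$. A minimal sketch of the resulting level, together with a helper that converts an ARI into the return period with the same exceedance probability, could look as follows. Both function names are assumptions made for this example.

```python
import math


def gevd_ari_level(T_bar, mu, sigma, kappa):
    """Level associated with an average recurrence interval T_bar > 0."""
    if T_bar <= 0.0:
        raise ValueError("the average recurrence interval must be positive")
    if kappa == 0.0:
        return mu + sigma * math.log(T_bar)  # Gumbel branch
    return mu + sigma / kappa * (T_bar ** kappa - 1.0)


def ari_to_return_period(T_bar):
    """Return period T_R satisfying 1 - 1/T_R = exp(-1/T_bar)."""
    return -1.0 / math.expm1(-1.0 / T_bar)


# For long intervals the two notions nearly coincide (T_R is about T_bar + 1/2).
for T_bar in (2.0, 10.0, 100.0):
    print(T_bar, ari_to_return_period(T_bar), gevd_ari_level(T_bar, 0.0, 1.0, 0.1))
```

The conversion makes the difference between the two definitions explicit: it matters for short intervals and becomes negligible for rare events.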

Joint Distribution

We can write the likelihood that the observations, $y$, follow the GEVD. So, given some observations, $\mathcal{D}=\{y_n\}_{n=1}^{N}$, which we believe follow the GEVD, we can decompose the joint distribution as

$$
p(y_{1:N},\boldsymbol{\theta}) = p(\boldsymbol{\theta}) \prod_{n=1}^N p(y_n|\boldsymbol{\theta})
$$

This implies that the global parameters are drawn from some prior distribution

$$
\boldsymbol{\theta} \sim p(\boldsymbol{\theta})
$$

and that these parameters get passed through our data likelihood term

$$
y_n \sim p(y|\boldsymbol{\theta})
$$

Log Probability

Recall the PDF for our iid samples is

$$
p(y_{1:N}|\boldsymbol{\theta}) = \prod_{n=1}^N\frac{1}{\sigma}t\left(y_n;\boldsymbol{\theta}\right)^{\kappa+1}e^{-t\left(y_n;\boldsymbol{\theta}\right)}
$$

where $t(y_n;\boldsymbol{\theta})$ is defined in equation (4). Taking the logarithm gives

$$
\log p(\boldsymbol{y}_{1:N}|\boldsymbol{\theta}) = \sum_{n=1}^N \log p(y_n|\boldsymbol{\theta})
$$

which we can expand as

$$
\sum_{n=1}^N\log p(y_n;\boldsymbol{\theta}) = -N \log \sigma + (\kappa+1)\sum_{n=1}^N \log t\left(y_n;\boldsymbol{\theta}\right) - \sum_{n=1}^N t\left(y_n;\boldsymbol{\theta}\right)
$$

For $\kappa\neq 0$, we have $\log t(y_n;\boldsymbol{\theta}) = -\frac{1}{\kappa}\log\left[1+\kappa z_n\right]_+$ with $z_n = (y_n-\mu)/\sigma$, so this reduces to

$$
\log p(\boldsymbol{y}_{1:N}|\boldsymbol{\theta}) = - N \log \sigma - (1+1/\kappa)\sum_{n=1}^N \log \left[ 1 + \kappa z_n\right]_+ - \sum_{n=1}^N \left[ 1 + \kappa z_n\right]_+^{-1/\kappa}
$$
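For fitting, one typically minimizes the negative of this log-likelihood. The sketch below is one possible implementation with $z_n = (y_n - \mu)/\sigma$; the name `gevd_nll` is hypothetical. Two choices are assumptions of this example rather than statements from the text: returning infinity when an observation falls outside the support, and the $\kappa = 0$ branch, which follows from the Gumbel form of $t$.

```python
import math


def gevd_nll(ys, mu, sigma, kappa):
    """Negative log-likelihood of iid observations ys under the GEVD."""
    if sigma <= 0.0:
        return math.inf  # the scale must be positive
    total = len(ys) * math.log(sigma)
    for y in ys:
        z = (y - mu) / sigma
        if kappa == 0.0:
            # Gumbel branch: -log f = log(sigma) + z + exp(-z)
            total += z + math.exp(-z)
        else:
            base = 1.0 + kappa * z
            if base <= 0.0:
                return math.inf  # observation outside the support
            total += (1.0 + 1.0 / kappa) * math.log(base) + base ** (-1.0 / kappa)
    return total


# Toy values for illustration only (not real data).
annual_maxima = [31.2, 29.8, 33.5, 30.1, 35.0]
print(gevd_nll(annual_maxima, mu=30.0, sigma=2.0, kappa=0.1))
```

Returning infinity outside the support lets a generic optimizer or sampler reject infeasible parameter values without special handling.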

Reparameterization

In this instance, we are assuming that there is a threshold parameter, $y_0$. We can write the reparameterization of this distribution as

$$
\begin{aligned}
\mu &= \mu_{y_0} + \frac{\sigma_{y_0}}{\kappa}\left(1 - \lambda_{y_0}^{-\kappa} \right) && && \sigma =\sigma_{y_0}\lambda_{y_0}^{-\kappa} && && \kappa\neq0 \\
\mu &= \mu_{y_0} + \sigma_{y_0}\ln\lambda_{y_0} && && \sigma =\sigma_{y_0}\lambda_{y_0}^{-\kappa} && && \kappa=0 \\
\end{aligned}
$$

Rescaling

$$
\delta_h = \frac{h}{h^*}
$$

where $h$ is in years and $h^*$ is in days. We can write all of the parameters with these rescaled ones

$$
\begin{aligned}
\mu^* &= \mu + \frac{1}{\kappa}\left[\sigma^*(1-\delta_h^{-\kappa}) \right] \\
\sigma^* &= \sigma\delta_h^{\kappa} \\
\kappa^* &= \kappa
\end{aligned}
$$
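A short sketch of the rescaling, assuming $\kappa \neq 0$ since that is the only case given above; the function name `rescale_gevd` is hypothetical. The usage example checks a property that follows algebraically from these formulas: rescaling by $\delta_1$ and then by $\delta_2$ gives the same parameters as rescaling once by $\delta_1\delta_2$.

```python
def rescale_gevd(mu, sigma, kappa, delta_h):
    """Rescaled parameters (mu*, sigma*, kappa*) for a ratio delta_h > 0."""
    if kappa == 0.0:
        raise ValueError("this sketch only covers the kappa != 0 case")
    if delta_h <= 0.0:
        raise ValueError("delta_h must be positive")
    sigma_star = sigma * delta_h ** kappa
    mu_star = mu + sigma_star * (1.0 - delta_h ** (-kappa)) / kappa
    return mu_star, sigma_star, kappa


# Illustrative parameter values only.
once = rescale_gevd(30.0, 2.0, 0.1, 6.0)
twice = rescale_gevd(*rescale_gevd(30.0, 2.0, 0.1, 2.0), 3.0)
print(once)
print(twice)
```

With $\delta_h = 1$ the parameters are returned unchanged, which is a convenient first check when wiring this into a larger workflow.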

Literature Review

Theory. Leadbetter et al. (1983) is another very popular book.

Applications. García et al. (2023) investigated annual temperature extremes in Extremadura, Spain where they applied a Gaussian process to account for spatial dependencies for the GEVD. Räty et al. (2022) looked at sea level extremes in the Finnish coastal region where they applied a Bayesian hierarchical model to account for spatial dependencies for the GEVD.

Algorithms. Moins et al. (2023) look at a reparameterization framework to improve the convergence of MCMC inference algorithms. Koh et al. (2021) investigate spatiotemporal extremes of daily large wildfires in the French Mediterranean, where they employ a Bayesian hierarchical model with a point process (PP) for the events and a generalized Pareto distribution (GPD) for the marks.

References
  1. Leadbetter, M. R., Lindgren, G., & Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. In Springer Series in Statistics. Springer New York. 10.1007/978-1-4612-5449-2
  2. García, J. A., Acero, F. J., & Portero, J. (2023). A Bayesian hierarchical spatio-temporal model for extreme temperatures in Extremadura (Spain) simulated by a Regional Climate Model. Climate Dynamics, 61(3–4), 1489–1503. 10.1007/s00382-022-06638-x
  3. Räty, O., Laine, M., Leijala, U., Särkkä, J., & Johansson, M. M. (2022). Bayesian hierarchical modeling of sea level extremes in the Finnish coastal region. 10.5194/nhess-2021-410
  4. Moins, T., Arbel, J., Girard, S., & Dutfoy, A. (2023). Reparameterization of extreme value framework for improved Bayesian workflow. Computational Statistics & Data Analysis, 187, 107807. 10.1016/j.csda.2023.107807
  5. Koh, J., Pimont, F., Dupuy, J.-L., & Opitz, T. (2021). Spatiotemporal wildfire modeling through point processes with moderate and extreme marks. arXiv. 10.48550/ARXIV.2105.08004