Skip to article frontmatterSkip to article content

Experiment 1a

For this process, we will exclusively use the GEVD for the data likelihood.

Under these assumptions we will try a few different models for fitting the GEVD method. We will try the following:

  • Null Model (M0M_0) - a naive log-likelihood estimation of the parameters assuming IID.
  • M1M_1 - a 2D point process formulation using the GEVD method

Hypothesis. The log-likelihood loss function is very similar for both approaches. However, the PP formulation has an extra “regularization” term which may improve or worsen the parameter estimation.

Analysis. We do a number of analysis to assess the efficacy of the methods. Some standard methods include:

  • Q-Q Plot
  • AIC
  • BIC

Table 1:Table of methods

ExperimentModelName
1aM0M_0GEVD
1aM1M_1PP-GEVD
1bM0M_0GPD
1bM1M_1PP-GPD
1bM1M_1PP-GPD-GEVD

Null Model - Baseline GEVD

Summary Formulation. The null model, M0M_0, will demonstration a do a baseline where we assume the data are distributed IID

D={yn}n=1N\mathcal{D} = \left\{ y_n \right\}_{n=1}^N

We can write the joint distribution as

p(y1:N,θ)=p(θ)n=1Np(ynθ)p(y_{1:N},\boldsymbol{\theta}) = p(\boldsymbol{\theta})\prod_{n=1}^N p(y_n|\boldsymbol{\theta})

where p(θ)p(\boldsymbol{\theta}) are the prior parameters and p(ynθ)p(y_n|\boldsymbol{\theta}) is the data likelihood. In this case, we assume that the measurements follow a GEVD.

Data Likelihood:p(ynθ)=fGEVD(yn;θ)\begin{aligned} \text{Data Likelihood}: && && p(y_n|\boldsymbol{\theta}) &= \boldsymbol{f}_{\text{GEVD}}(y_n;\boldsymbol{\theta}) \end{aligned}

So, to find the best parameters, we can maximize the log-likelihood of the joint distribution which is given as

logp(y1:N,θ)=n=1NlogfGEVD(yn;θ)+logp(θ)\log p(y_{1:N},\theta) = \sum_{n=1}^N\log \boldsymbol{f}_{\text{GEVD}}(y_n;\boldsymbol{\theta}) + \log p(\boldsymbol{\theta})

We can plug in the equations for the GEVD to obtain the full expressions

LGEVD(θ):=Nlogσ(1+1/κ)n=1Nlog[1+κzn]+n=1N[1+κzn]+1/κ\boldsymbol{L}_\text{GEVD}(\boldsymbol{\theta}) := - N \log \sigma - (1+1/\kappa)\sum_{n=1}^N \log \left[ 1 + \kappa z_n\right]_+ - \sum_{n=1}^N \left[ 1 + \kappa z_n\right]_+^{-1/\kappa}

Results:


Model 1 - PP GEVD


Summary Formulation. M1M_1, will demonstration how we can apply the PP formulation to calculate the parameters of the GEVD. We have a 2D space

A={[0,T]×[y,)},tTR+yYR\begin{aligned} A &= \left\{[0,T]\times[y,\infty) \right\}, && && t\in\mathcal{T}\subseteq\mathbb{R}^+ && y\in\mathcal{Y}\subseteq\mathbb{R} \end{aligned}

The joint hazard function is factored into a temporal component and a marks component

λ(y,t)=λ(t)λ(y)=fGEVD(y;θ)\lambda(y,t) = \lambda(t)\lambda(y) = \boldsymbol{f}_{\text{GEVD}}(y;\boldsymbol{\theta})

Note, we assume the ground intensity term equal 1, λg(t)=1\lambda_g(t)=1, which corresponds to a standard Poisson process (SPP); a special case of the Homogeneous Poisson process (HPP). The cumulative hazard function can be written as

Λ(A)=0TyfGEVD(y;θ)dydτ=0TSGEVD(y;θ)dτ\Lambda(A) = \int_0^T\int_y^\infty \boldsymbol{f}_{\text{GEVD}}(y;\boldsymbol{\theta}) dyd\tau = \int_0^T\boldsymbol{S}_{\text{GEVD}}(y;\boldsymbol{\theta})d\tau

We used the relationship of the SF, the CDF, and the PDF. We approximate this integral using a simple Riemann sum

0TSGEVD(y0;θ)n=1NTSGEVD(y0;θ)=NTSGEVD(y0;θ)\int_0^T\boldsymbol{S}_\text{GEVD}(y_0;\boldsymbol{\theta}) \approx \sum_{n=1}^{N_T} \boldsymbol{S}_\text{GEVD}(y_0;\boldsymbol{\theta}) = N_T\boldsymbol{S}_\text{GEVD}(y_0;\boldsymbol{\theta})

where NTN_T corresponds to the number of years. Now, we can write the log-likelihood function as

logp(y1:N,θ)=n=1NTlogfGEVD(ynθ)+logp(θ)NTSGEVD(y0;θ)\log p(y_{1:N},\theta) = \sum_{n=1}^{N_T} \log \boldsymbol{f}_{\text{GEVD}}(y_n|\boldsymbol{\theta}) + \log p(\boldsymbol{\theta}) - N_T\boldsymbol{S}_\text{GEVD}(y_0;\boldsymbol{\theta})

The first term is the log-likelihood of the specific events at the specific location, (tn,yn)(t_n,y_n), that we observe them. The second term is the probability that we do not observe them anywhere else within the time interval of interest, (0,T](0,T]. We can plug in the formulas for the GEVD into this expression to obtain

LPP-GEVD(θ):=LGEVD(θ)NT[1+κzn]+1/κ\begin{aligned} \boldsymbol{L}_\text{PP-GEVD}(\boldsymbol{\theta}) &:= \boldsymbol{L}_\text{GEVD}(\boldsymbol{\theta}) - N_T \left[ 1 + \kappa z_n\right]_+^{-1/\kappa} \end{aligned}

where LGEVD\boldsymbol{L}_\text{GEVD} is the LL for the GEVD PDF (equation (15)).


GPD Connections

We can also estimate the parameters of the Poisson-GPD using some relationships between the GEVD and the GPD. The parameters of the GPD can be calculated by

λ=[1+κz0]+1/κσ=σ+κ(y0μ)κ=κ \begin{aligned} \lambda = [1 + \kappa z_0]_+^{-1/\kappa} && && \sigma^* &= \sigma + \kappa(y_0 - \mu) && && \kappa^* = \kappa \end{aligned}

where z0=(y0μ)/σz_0 = (y_0 - \mu)/\sigma. There are other formulations which exist like:

μ=μ+σκ(1λκ)σ=σλκκ=κ \begin{aligned} \mu^* = \mu + \frac{\sigma}{\kappa}(1 - \lambda^\kappa) && && \sigma^* = \sigma\lambda^{\kappa} && && \kappa^* = \kappa \end{aligned}

Note, in the experiment for the GEVD, we have 1 event per year, i.e., λ=1\lambda=1. So we can simplify this formulation to be:

μ=μσ=σκ=κ \begin{aligned} \mu^* = \mu && && \sigma^* = \sigma && && \kappa^* = \kappa \end{aligned}

Return Period / Average Recurrence Interval

We will also calculate the average recurrence interval (ARI) and the return period (RP) to calculate the 100-year event.

y=QGEVD(yp;θ)yp=11/Tayp=exp(1/Tp)\begin{aligned} y = \boldsymbol{Q}_\text{GEVD}(y_p;\boldsymbol{\theta}) && && y_p = 1 - 1/T_a && && y_p = \exp(- 1 / T_p) \end{aligned}

where QGEVD\boldsymbol{Q}_\text{GEVD} is the quantile distribution function and TaT_a / TpT_p is the year.


Model 2

For this process, we will exclusively use the GPD model for the data likelihood.

Under these assumptions, we will try fitting different methods for the GPD method. We will try the following:

  • M0M_0 - a naive parameter estimation assuming IID. This will server as the null model for this experiment.
  • M1M_1 - a marked point process formulation
  • M2M_2 - a decoupled marked point process formulation

Model 2a - GPD

The null model, M2aM_{2_a}, will demonstration a do a baseline where we assume the data are distributed IID

D={yn}n=1N\mathcal{D} = \left\{ y_n \right\}_{n=1}^N

We can write the joint distribution as

p(y1:N,θ)=p(θ)n=1Np(ynθ)p(y_{1:N},\boldsymbol{\theta}) = p(\boldsymbol{\theta})\prod_{n=1}^N p(y_n|\boldsymbol{\theta})

where p(θ)p(\boldsymbol{\theta}) are the prior parameters and p(ynθ)p(y_n|\boldsymbol{\theta}) is the data likelihood. In this case, we assume that the measurements follow a GPD.

Data Likelihood:p(ynθ)=fGPD(yn;θ)\begin{aligned} \text{Data Likelihood}: && && p(y_n|\boldsymbol{\theta}) &= \boldsymbol{f}_{\text{GPD}}(y_n;\boldsymbol{\theta}) \end{aligned}

So, to find the best parameters, we can maximize the log-likelihood of the joint distribution which is given as

logp(y1:Nθ)=n=1NlogfGPD(yn;θ)\log p(y_{1:N}|\theta) = \sum_{n=1}^N\log \boldsymbol{f}_{\text{GPD}}(y_n;\boldsymbol{\theta})

We can plug in the equations for the GEVD to obtain the full expressions

LGPD(θ):=Nlogσ(1+1/κ)n=1Nlog[1+κzn]+\boldsymbol{L}_\text{GPD}(\boldsymbol{\theta}) := - N \log \sigma - (1+1/\kappa)\sum_{n=1}^N \log \left[ 1 + \kappa z_n\right]_+

Model 2b - PP-GPD

Summary Formulation. M2bM_{2_b} will demonstration how we can apply the PP formulation to calculate the parameters of the GEVD. We have a 2D space as shown in equation (6). The joint hazard function is factored into a temporal component and a marks component

λ(y,t)=λ(t)λ(y)=fGPD(y;θ)\lambda(y,t) = \lambda(t)\lambda(y) = \boldsymbol{f}_{\text{GPD}}(y;\boldsymbol{\theta})

Note, we assume the ground intensity term equal 1, λg(t)=1\lambda_g(t)=1, which corresponds to a standard Poisson process (SPP); a special case of the Homogeneous Poisson process (HPP). The cumulative hazard function can be written as

Λ(A)=0TyfGPD(y;θ)dydτ=0TSGPD(y;θ)dτ\Lambda(A) = \int_0^T\int_y^\infty \boldsymbol{f}_{\text{GPD}}(y;\boldsymbol{\theta}) dyd\tau = \int_0^T\boldsymbol{S}_{\text{GPD}}(y;\boldsymbol{\theta})d\tau

We used the relationship of the SF, the CDF, and the PDF. We approximate this integral using a simple Riemann sum

0TSGPD(y0;θ)n=1NTSGPD(y0;θ)=NTSGPD(y0;θ)\int_0^T\boldsymbol{S}_\text{GPD}(y_0;\boldsymbol{\theta}) \approx \sum_{n=1}^{N_T} \boldsymbol{S}_\text{GPD}(y_0;\boldsymbol{\theta}) = N_T\boldsymbol{S}_\text{GPD}(y_0;\boldsymbol{\theta})

where NTN_T corresponds to the number of events. Now, we can write the log-likelihood function as

logp(y1:N,θ)=n=1NTlogfGPD(yn;θ)+logp(θ)NTSGPD(y0;θ)\log p(y_{1:N},\theta) = \sum_{n=1}^{N_T}\log \boldsymbol{f}_{\text{GPD}}(y_n;\boldsymbol{\theta}) + \log p(\boldsymbol{\theta}) - N_T\boldsymbol{S}_\text{GPD}(y_0;\boldsymbol{\theta})

The first term is the log-likelihood of the specific events at the specific location, (tn,yn)(t_n,y_n), that we observe them. The second term is the probability that we do not observe them anywhere else within the time interval of interest, (0,T](0,T]. We can plug in the formulas for the GEVD into this expression to obtain

LPP-GPD(θ):=LGPD(θ)NT[1+κz0]1κ\begin{aligned} \boldsymbol{L}_\text{PP-GPD}(\boldsymbol{\theta}) &:= \boldsymbol{L}_\text{GPD}(\boldsymbol{\theta}) - N_T \left[ 1 + \kappa z_0\right]^{-\frac{1}{\kappa}} \end{aligned}

where z0=(y0μ)/σz_0 = (y_0 - \mu)/\sigma and LGPD\boldsymbol{L}_\text{GPD} is the LL for the GPD PDF (equation (21)).


Model 2c - PP-GPD-GEVD

M2cM_{2_c} will be the same as the above one except we will investigate how we can use the relationships between the GEVD and the GPD. The parameters of the GPD can be calculated by

λ=[1+κz0]+1/κσ=σ+κ(y0μ)κ=κ \begin{aligned} \lambda = [1 + \kappa z_0]_+^{-1/\kappa} && && \sigma^* &= \sigma + \kappa(y_0 - \mu) && && \kappa^* = \kappa \end{aligned}

where z0=(y0μ)/σz_0 = (y_0 - \mu)/\sigma. There are other formulations which exist like:

μ=μ+σκ(1λκ)σ=σλκκ=κ \begin{aligned} \mu^* = \mu + \frac{\sigma}{\kappa}(1 - \lambda^\kappa) && && \sigma^* = \sigma\lambda^{\kappa} && && \kappa^* = \kappa \end{aligned}

Note, in the experiment for the GEVD, we have 1 event per year, i.e., λ=1\lambda=1. So we can simplify this formulation to be:

μ=μσ=σκ=κ \begin{aligned} \mu^* = \mu && && \sigma^* = \sigma && && \kappa^* = \kappa \end{aligned}

Model 3

For this process, we will exclusively use a combination of the GPD model for the data likelihood.

Under these assumptions, we will try fitting different methods for the GPD method. We will try the following:

  • M0M_0 - a naive parameter estimation assuming IID. This will server as the null model for this experiment.
  • M1M_1 - a marked point process formulation
  • M2M_2 - a decoupled marked point process formulation

Model 3a

We have the hazard function

λ(t,y)=λg(t)p(yt)=λfGPD(y;θ)\lambda (t,y) = \lambda_g(t)p(y|t) = \lambda \hspace{1mm}\boldsymbol{f}_\text{GPD}(y;\boldsymbol{\theta})

So, we can calculate the cumulative Hazard function just on the ground intensity

Λ(T)=0Tλdτ=λT\Lambda(\mathcal{T}) = \int_0^T\lambda d\tau = \lambda T

Now, we can plug this into our log-likelihood function.

logp(A)=NlogλλT+n=1NTlogfGPD(yn;θ)\log p(A) = N\log\lambda - \lambda T + \sum_{n=1}^{N_T} \log \boldsymbol{f}_\text{GPD}(y_n;\boldsymbol{\theta})

Model 3b

We will use some reparameterization of the lambda term, λ, to be in terms of the

λ=[1+κz0]1/κz0=(y0μ)/σσ=σ+κ(y0μ)κ=κ\begin{aligned} \lambda &= [1 + \kappa z_0]^{-1/\kappa} && && z_0 = (y_0 - \mu) / \sigma \\ \sigma^* &= \sigma + \kappa (y_0 - \mu) \\ \kappa^* &= \kappa \end{aligned}

We can expand this likelihood function fully to be

logp(A)=Nκlog[1+κz0]NTy[1+κz0]1/κ+n=1NTlogfGPD(yn;θ)\log p(A) = -\frac{N}{\kappa}\log[1 + \kappa z_0] - N_{T_y} [1 + \kappa z_0]^{-1/\kappa} + \sum_{n=1}^{N_T} \log \boldsymbol{f}_\text{GPD}(y_n;\boldsymbol{\theta})

where NTyN_{T_y} is the number of years.


Model 3c

λ(t,y)=λg(t)λg(y)p(yt)\lambda (t,y) = \lambda_g(t)\lambda_g(y)p(y|t)

So

Ground Intensity (Time):λg(t)=1Ground Intensity (Marks):λg(y)=fGEVD(y0;θ)Conditional Marks:p(yt)=fGPD(y;θ)\begin{aligned} \text{Ground Intensity (Time)}: && && \lambda_g(t) &= 1 \\ \text{Ground Intensity (Marks)}: && && \lambda_g(y) &= \boldsymbol{f}_\text{GEVD}(y_0;\boldsymbol{\theta}) \\ \text{Conditional Marks}: && && p(y|t) &= \boldsymbol{f}_\text{GPD}(y;\boldsymbol{\theta}) \end{aligned}

We have already seen the intensity terms in the above section. This is equivalent to Model M1bM_{1_b}.

LPP-GPD-GEVD(θ):=LPP-GPD-GEVD(θ)+LGPD(θ)=n=1NTlogfGEVD(ynθ)NTSGEVD(y0;θ)+n=1NTlogfGPD(yn;θ)\begin{aligned} \boldsymbol{L}_\text{PP-GPD-GEVD}(\boldsymbol{\theta}) &:= \boldsymbol{L}_\text{PP-GPD-GEVD}(\boldsymbol{\theta}) + \boldsymbol{L}_\text{GPD}(\boldsymbol{\theta}) \\ &= \sum_{n=1}^{N_T} \log \boldsymbol{f}_{\text{GEVD}}(y_n|\boldsymbol{\theta}) - N_T\boldsymbol{S}_\text{GEVD}(y_0;\boldsymbol{\theta}) + \sum_{n=1}^{N_T} \log \boldsymbol{f}_\text{GPD}(y_n;\boldsymbol{\theta}) \end{aligned}

where LPP-GEVD(θ)\boldsymbol{L}_\text{PP-GEVD}(\boldsymbol{\theta}) is defined in equation (11) and LGPD(θ)\boldsymbol{L}_\text{GPD}(\boldsymbol{\theta}) in equation (21).