
EVT Parameterizations


Temporal Dependencies

In this section, we outline some of the ways we can construct parametric functions whose parameters have temporal dependencies.

$$\theta(t) = \theta_0 + \theta_1\phi(t) + \epsilon_\theta$$

where $\phi(t)$ is some function which encodes time, e.g., linear, log-linear, Fourier, etc.


Encoding Time

$$\phi = \phi(t) \hspace{10mm} \phi: \mathbb{R}^+\rightarrow\mathbb{R}^{D_t}$$

The most general form would be a Fourier series

$$\phi(t) = \sum_{k=1}^K \left[ a_k \sin\left(\frac{2\pi k t}{P}\right) + b_k \cos\left(\frac{2\pi k t}{P}\right) \right]$$

where $t$ is time, $P$ is the base period of the feature, and $k$ is the index of the series. We could make this more flexible by adding an envelope

$$\phi(t) = \sum_{k=1}^K \left[ a_k \tilde{F}_k(t)\sin\left(\frac{2\pi k t}{P}\right) + b_k \tilde{F}_k(t)\cos\left(\frac{2\pi k t}{P}\right) \right]$$
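As a rough illustration, here is a minimal NumPy sketch of the plain (un-enveloped) Fourier encoding together with the linear-in-features parameterization $\theta(t) = \theta_0 + \theta_1\phi(t)$ from above; the number of harmonics `K`, the base period `P`, and the coefficient values are all illustrative assumptions, not values from the text.

```python
import numpy as np

def fourier_time_features(t, K: int = 2, P: float = 365.25):
    """Encode times t as K sine/cosine harmonics with base period P (output shape: (T, 2K))."""
    t = np.atleast_1d(t)
    k = np.arange(1, K + 1)                                  # harmonic indices 1..K
    angles = 2.0 * np.pi * k[None, :] * t[:, None] / P       # (T, K)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# time-dependent parameter theta(t) = theta_0 + theta_1^T phi(t)
t = np.linspace(0.0, 3 * 365.25, 200)                        # three "years" of time points
phi = fourier_time_features(t, K=2, P=365.25)                # (200, 4)
theta_0, theta_1 = 1.0, np.array([0.5, -0.2, 0.1, 0.0])      # illustrative coefficients
theta_t = theta_0 + phi @ theta_1                            # (200,)
```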

Constant

We can use maximum likelihood to estimate the unknown parameters of the parametric distributions.

$$L(\boldsymbol{\theta}) = \prod_{n=1}^{N_T} f(y_n) \prod_{n=1}^{N_T}S(y_n)$$

We can rewrite this as a log-likelihood

$$\log L(\boldsymbol{\theta}) = \sum_{n=1}^{N_T} \log f(y_n) - \sum_{n=1}^{N_T}F(y_n)$$

where $f$ is the PDF and $F$ is the CDF of the chosen distribution.
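As a sketch of how this objective might be evaluated, the snippet below sums the log-PDF terms and subtracts the CDF terms exactly as written above; the GEV family from `scipy.stats` is only an illustrative choice for $f$ and $F$, and the parameter values are made up.

```python
import numpy as np
from scipy.stats import genextreme

def log_likelihood(y, shape, loc, scale):
    """Sum of log-PDF terms minus sum of CDF terms, following the expression above."""
    log_f = genextreme.logpdf(y, shape, loc=loc, scale=scale)   # log f(y_n)
    F = genextreme.cdf(y, shape, loc=loc, scale=scale)          # F(y_n)
    return np.sum(log_f) - np.sum(F)

y = np.array([1.2, 2.5, 0.7, 3.1])
print(log_likelihood(y, shape=-0.1, loc=1.0, scale=0.8))
```

One could then pass the negative of this function to a generic optimizer to obtain maximum likelihood estimates of the parameters.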


Parametric

$$\lambda^*(t,\mathbf{s}) = \boldsymbol{\mu}(t,\mathbf{s}) + \boldsymbol{g}(t,\mathbf{s})$$

Typically, we can use some kernel function

$$f(t|\boldsymbol{\theta}) = \sum_{n=1}^{N_s} \mathbf{w}_n \boldsymbol{k}(t,\boldsymbol{\theta}_n)$$

where $\boldsymbol{k}$ is a kernel function (density function) with parameters $\boldsymbol{\theta}_n$. The $\mathbf{w}_n$ are weights corresponding to the densities $\boldsymbol{k}(t,\boldsymbol{\theta}_n)$ and $\sum_{n=1}^{N_s}\mathbf{w}_n=1$.
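A minimal sketch of such a mixture, assuming Gaussian kernels for $\boldsymbol{k}$ and hand-picked weights that sum to one; the weights, locations, and scales are illustrative rather than fitted values.

```python
import numpy as np
from scipy.stats import norm

def mixture_density(t, weights, locs, scales):
    """Weighted sum of Gaussian kernel densities evaluated at times t."""
    t = np.atleast_1d(t)[:, None]                                        # (T, 1)
    components = norm.pdf(t, loc=locs[None, :], scale=scales[None, :])   # (T, N_s)
    return components @ weights                                          # (T,)

weights = np.array([0.2, 0.5, 0.3])     # sum to one
locs = np.array([1.0, 4.0, 7.0])
scales = np.array([0.5, 1.0, 0.8])
t = np.linspace(0.0, 10.0, 101)
f_t = mixture_density(t, weights, locs, scales)
```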


Kernel

A Hawkes Process is a process that has a background rate and a self-exciting term. This term encourages more events to happen given past events.

$$\boldsymbol{g}(t,\mathbf{s}) = g_t(t)\,\boldsymbol{g}_s(\mathbf{s})$$

For this paper, they used a separable, parametric kernel function, i.e., a Hawkes kernel. In particular, they used a Gaussian kernel in space and an exponential kernel in time.

$$\boldsymbol{g}(t,\mathbf{s}) = \alpha\beta\exp(-\beta t) \frac{1}{\sqrt{2\pi|\mathbf{\Sigma}|}} \exp\left(-\mathbf{s}^\top\mathbf{\Sigma}^{-1}\mathbf{s}\right)$$

where $\alpha,\beta>0$ and $\mathbf{\Sigma}$ is a PSD matrix.
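Here is a small sketch that evaluates the separable kernel exactly as written above (exponential decay in time, Gaussian-like term in space); the parameter values and the 2-D spatial offset are assumptions for illustration.

```python
import numpy as np

def hawkes_kernel(t, s, alpha, beta, Sigma):
    """Separable self-exciting kernel g(t, s) following the expression above."""
    Sigma = np.asarray(Sigma, dtype=float)
    temporal = alpha * beta * np.exp(-beta * t)                     # exponential in time
    norm_const = 1.0 / np.sqrt(2.0 * np.pi * np.linalg.det(Sigma))
    quad_form = s @ np.linalg.solve(Sigma, s)                       # s^T Sigma^{-1} s
    spatial = norm_const * np.exp(-quad_form)                       # Gaussian-like in space
    return temporal * spatial

g = hawkes_kernel(t=0.5, s=np.array([0.1, -0.2]),
                  alpha=0.8, beta=1.5, Sigma=0.25 * np.eye(2))
```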


Doubly Stochastic

This is also known as a Log-Gaussian Cox process.

$$\log \boldsymbol{\mu}(t,\mathbf{s}) = \boldsymbol{f}_t(t) + \boldsymbol{f}_s(\mathbf{s})$$

where $\boldsymbol{f}$ is a realization from a Gaussian process

$$\boldsymbol{f} \sim \mathcal{GP}(\boldsymbol{m}_\theta,\boldsymbol{k}_\theta)$$
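As a rough sketch of the temporal component only, the snippet below draws $\boldsymbol{f}_t$ from a zero-mean GP with a squared-exponential kernel on a grid of times and exponentiates it to obtain an intensity; the kernel choice, its hyperparameters, and the jitter term are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=0.5, lengthscale=1.5):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 100)
K = rbf_kernel(t, t) + 1e-6 * np.eye(t.size)                   # jitter for numerical stability
f_t = rng.multivariate_normal(mean=np.zeros(t.size), cov=K)    # one GP realization
mu_t = np.exp(f_t)                                             # log-intensity -> intensity
```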

If we assume that our hazard function is a constant function

$$h(t) = \lambda$$

then, since $S(t) = \exp\left(-\int_0^t h(\tau)\,d\tau\right)$, this implies that our survival function is

$$S(t) = \exp(-\lambda t)$$

Approximating Integrals

Let’s say we are given some pixel locations

$$\mathcal{H} = \left\{\mathbf{s}_n \right\}_{n=1}^N$$

Monte Carlo

$$\int f(t)\,dt \approx \sum_{n=1}^N \Delta t_n \left[ \frac{1}{K}\sum_{k=1}^K f(t_n + \Delta t_n z_k) \right] \hspace{10mm} z_k \sim U[0,1]$$
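A minimal NumPy sketch of this stratified Monte Carlo rule, assuming the interval edges and the number of samples per interval `K` are chosen by the user; the integral $\int_0^\pi \sin(t)\,dt = 2$ is used only as a sanity check.

```python
import numpy as np

def stratified_mc_integral(f, t_edges, K=32, rng=None):
    """Approximate the integral of f by averaging K uniform samples per sub-interval."""
    rng = np.random.default_rng() if rng is None else rng
    t0, dt = t_edges[:-1], np.diff(t_edges)          # left edges t_n and widths Delta t_n
    z = rng.uniform(size=(t0.size, K))               # z_k ~ U[0, 1] for every interval
    samples = f(t0[:, None] + dt[:, None] * z)       # f(t_n + Delta t_n z_k)
    return np.sum(dt * samples.mean(axis=1))

edges = np.linspace(0.0, np.pi, 21)
print(stratified_mc_integral(np.sin, edges))         # should be close to 2
```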

ODE Time Steppers

We can write an equation of motion (EoM) describing the state transitions.

$$\partial_t \lambda = \boldsymbol{f_\theta}(\lambda, t) \hspace{10mm} \boldsymbol{f}: \mathbb{R}^{D_\lambda}\times\mathbb{R}^+ \rightarrow \mathbb{R}^{D_\lambda}$$

Now, we can integrate this in time using the fundamental theorem of calculus.

$$\lambda(t) = \lambda_0 + \int_0^t\boldsymbol{f_\theta}(\lambda(\tau), \tau)\,d\tau$$

In practice, we know that we have to do some numerical approximations, e.g., Euler (Taylor Expansion) or RK4 (Quadrature). This leaves us with a time stepper function

$$\lambda_{t_1} := \boldsymbol{\phi}_t(\boldsymbol{\lambda}) = \lambda_{t_0} + \int_{t_0}^{t_1}\boldsymbol{f_\theta}(\lambda_{t_0}, \tau)\,d\tau$$

which we can apply in an autoregressive fashion

$$\boldsymbol{\lambda}_T = \boldsymbol{\phi}_{t_T} \circ \boldsymbol{\phi}_{t_{T-1}} \circ \ldots \circ \boldsymbol{\phi}_{t_1} \circ \boldsymbol{\phi}_{t_0}(\boldsymbol{\lambda}_0)$$

This is more commonly written by collecting the time points we wish to receive from the solver into a vector, $\mathbf{t}=[t_0, t_1, \ldots, t_T]$, and passing this through the ODESolver

$$\boldsymbol{\lambda} = \text{ODESolver} \left(\boldsymbol{f},\lambda_0, \mathbf{t},\boldsymbol{\theta}\right)$$

where $\boldsymbol{\lambda}=[\lambda_0, \lambda_1, \ldots, \lambda_T]$. This is a nice abstraction which allows the user to choose an arbitrary solver, e.g., Euler, Heun, or Runge-Kutta.
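A minimal sketch of this abstraction using a forward-Euler stepper; the solver interface (function name, argument order) mirrors the ODESolver call above but is otherwise an assumption, and the decay dynamics are only a toy example.

```python
import numpy as np

def euler_ode_solver(f, lam0, ts, theta):
    """Apply forward-Euler steps between consecutive entries of ts and stack the trajectory."""
    lam = np.asarray(lam0, dtype=float)
    trajectory = [lam]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        lam = lam + (t1 - t0) * f(lam, t0, theta)    # one Euler step, i.e. phi_t
        trajectory.append(lam)
    return np.stack(trajectory)                      # shape (len(ts), D_lambda)

# toy dynamics: d lambda / dt = -theta * lambda
f = lambda lam, t, theta: -theta * lam
ts = np.linspace(0.0, 5.0, 51)
lam_traj = euler_ode_solver(f, lam0=np.array([1.0]), ts=ts, theta=0.5)
```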


Quadrature

$$\int_{\Omega}\boldsymbol{\lambda}(\mathbf{s})\,d\mathbf{s} \approx \sum_{n=1}^{N_s}\mathbf{w}_n\boldsymbol{\lambda}(\mathbf{s}_n)$$

where $w_n>0$, $\sum_{n=1}^{N_s}w_n=|\Omega|$, and $\mathbf{s}_n$, $n=1,2,\ldots,N_s$, are all of the points in the domain, $\Omega$. This yields an approximation to the log likelihood as

$$\log p(\Omega) \approx \sum_{n=1}^N\log \boldsymbol{\lambda}(\mathbf{s}_n) -\sum_{n=1}^{N_s}\mathbf{w}_n\boldsymbol{\lambda}(\mathbf{s}_n)$$

We have 3 sets of points:

  • $N_s$ - the points within the discretized domain
  • $N(\mathbf{s}_n)$ - the points within the discretized domain where we have events.
  • $\bar{N}_s = N_s - N(\mathbf{s}_n)$ - the points in the discretized domain where there are no events.

Berman & Turner (1992) used the names design points, data points, and dummy points, respectively.

Let’s create a mask vector which indicates whether or not we observe an event within an element.

$$\mathbf{m}_n = \begin{cases} 1, & N(\mathbf{s}_n) > 0 \\ 0, & N(\mathbf{s}_n) = 0 \end{cases}$$

Now, we can rewrite the log-likelihood term to be

$$\log p(\Omega) \approx \sum_{n=1}^{N_s} \mathbf{w}_n \left( \frac{\mathbf{m}_n}{\mathbf{w}_n} \log\boldsymbol{\lambda}(\mathbf{s}_n) - \boldsymbol{\lambda}(\mathbf{s}_n) \right)$$
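A small sketch that evaluates this masked quadrature expression; the intensity values, weights, and counts at the quadrature points are made up, and in practice $\boldsymbol{\lambda}$ would come from the chosen parametric model.

```python
import numpy as np

def quadrature_log_likelihood(lam, weights, counts):
    """Masked quadrature approximation of the log-likelihood, following the expression above."""
    m = (counts > 0).astype(float)                    # m_n: 1 if events at point n, else 0
    return np.sum(weights * (m / weights * np.log(lam) - lam))

lam = np.array([0.5, 1.2, 0.8, 2.0])       # lambda(s_n) at the quadrature points
weights = np.array([0.3, 0.3, 0.2, 0.2])   # quadrature weights, summing to |Omega|
counts = np.array([0, 1, 0, 2])            # N(s_n): events observed at each point
print(quadrature_log_likelihood(lam, weights, counts))
```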

Pixel Counts

We can divide the domain, $\Omega$, into small pixels, each of area $\omega_n$. Then the integral over the domain, $\Omega$, is approximated by summing over all of the pixels.

$$\int_{\Omega}\lambda(\mathbf{s})\,d\mathbf{s} \approx \sum_{n=1}^{N_s}\omega_n\lambda(\mathbf{s}_n)$$

where $\mathbf{s}_n$ is the center of the $n$th pixel. In this case, we discard the exact locations of the data points and assign each data point to the center of its pixel. To approximate the sum over data points, we can take the sum over pixels.

$$\sum_{n=1}^N\log\lambda(\mathbf{s}_n) \approx \sum_{n=1}^{N_s}N(\mathbf{s}_n)\log \lambda(\mathbf{s}_n)$$

where $N(\mathbf{s}_n)$ is the number of data points falling into the $n$th pixel. So we can take these two quantities and put them together to get

$$\log p(\Omega) \approx \sum_{n=1}^{N_s} \left( N(\mathbf{s}_n)\log \lambda(\mathbf{s}_n) - \omega_n\lambda(\mathbf{s}_n) \right)$$
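A rough 1-D sketch of the pixel-count approximation: events are binned into pixels, counts and pixel areas are formed, and the sum above is evaluated; the event locations and the intensity surface are illustrative.

```python
import numpy as np

def pixel_count_log_likelihood(lam, counts, areas):
    """Pixel-count approximation of the log-likelihood, following the expression above."""
    return np.sum(counts * np.log(lam) - areas * lam)

events = np.array([0.12, 0.35, 0.36, 0.80])           # illustrative event locations
edges = np.linspace(0.0, 1.0, 11)                     # 10 pixels of width 0.1
counts, _ = np.histogram(events, bins=edges)          # N(s_n) per pixel
centres = 0.5 * (edges[:-1] + edges[1:])              # pixel centres s_n
areas = np.diff(edges)                                # pixel areas omega_n
lam = 2.0 + np.sin(2.0 * np.pi * centres)             # an illustrative positive intensity
print(pixel_count_log_likelihood(lam, counts, areas))
```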

Masks

We can make this a bit neater by introducing a masking variable, $\boldsymbol{m}$. This variable acts as an indicator variable

$$m_n = \begin{cases} 1, & y_n > y_0 \\ 0, & y_n \leq y_0 \end{cases}$$

Now, we can use this indicator variable to zero out contributions to the likelihood depending on whether the observed value at the temporal resolution is above or below the threshold, $y_0$.

$$\begin{aligned} \log p(y_n|\boldsymbol{\theta}) &= \sum_{n=1}^{N(A)}m_n\log \boldsymbol{f}_{GEVD}(y_n;\boldsymbol{\theta}) \\ &\quad- T_{years}\,\mathbf{S}_{GEVD}(y_0;\boldsymbol{\theta}) \\ &\quad+\log \sum_{n=1}^{N(A)}\boldsymbol{f}_{GPD}(y_n;\boldsymbol{\theta}) \end{aligned}$$
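As a sketch of how this masked objective might be computed, the snippet below uses scipy's `genextreme` and `genpareto` as stand-ins for $\boldsymbol{f}_{GEVD}$ and $\boldsymbol{f}_{GPD}$, with $\mathbf{S}_{GEVD}$ taken to be the GEV survival function; the threshold, observation period, and parameter tuples are all illustrative assumptions rather than fitted values.

```python
import numpy as np
from scipy.stats import genextreme, genpareto

def masked_log_likelihood(y, y0, T_years, gev_params, gpd_params):
    """Evaluate the masked GEVD/GPD expression above with illustrative parameters."""
    exceed = y > y0                                                # m_n as a boolean mask
    term_gev = np.sum(genextreme.logpdf(y[exceed], *gev_params))   # sum of m_n log f_GEVD
    term_surv = T_years * genextreme.sf(y0, *gev_params)           # T_years * S_GEVD(y0)
    term_gpd = np.log(np.sum(genpareto.pdf(y, *gpd_params)))       # log of sum of f_GPD
    return term_gev - term_surv + term_gpd

y = np.array([1.1, 2.4, 0.6, 3.3, 1.9])
print(masked_log_likelihood(y, y0=1.0, T_years=5.0,
                            gev_params=(-0.1, 1.5, 0.7),   # (shape c, loc, scale)
                            gpd_params=(0.2, 1.0, 0.5)))   # (shape c, loc, scale)
```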

Demo Code

For the spatiotemporal case:

```python
# initialize the event times
T: Array["T"] = ...
# initialize the spatial event points
S: Array["T S"] = ...
# initialize the marks at the events in space and time
Y: Array["T S Dy"] = ...
# initialize the mask at the marks
M: Array["T S Dy"] = ...
```
References
  1. Berman, M., & Turner, T. R. (1992). Approximating Point Process Likelihoods with GLIM. Applied Statistics, 41(1), 31. https://doi.org/10.2307/2347614