
Covariate Effects


TLDR: We will incorporate covariates to try to make the data more IID. We will look at covariates such as the spatial coordinates, the temporal coordinates, and related variables of interest, e.g., temperature and precipitation. We can also explore local and global covariates like “warming”. We will start with simple statistical models like linear functions and then move to more complex ones like basis functions or nonlinear functions. We will use Bayesian Hierarchical Models (BHMs) to separate the parameter uncertainty of the process from the uncertainty of the hyperparameters of the parameterization. To select the best parameterization, we can use traditional model selection techniques like AIC or BIC.

Conditional Models

A second attempt would be to construct a conditionally parameterized distribution. In this case, we introduce some dependencies on the variable itself.

$$u \sim p\left(\theta(u)\right)$$

This changes the function from a generative parametric model to a likelihood model, i.e., a model whose parameters are generated with respect to some external variable like time.

In general, most of the distributions mentioned above have a mean and a standard deviation parameter. One could easily parameterize those parameters with functions that depend upon some space and time variables. For example, the mean function could depend upon space, time, and some control vector, and likewise the standard deviation.
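Written out, each parameter becomes its own function of the coordinates and covariates (the split of $\boldsymbol{\theta}$ into $\boldsymbol{\theta}_\mu$ and $\boldsymbol{\theta}_\sigma$ is just our labeling):

$$\begin{aligned}
\mu &= \mu(\mathbf{x}, t, \boldsymbol{u};\boldsymbol{\theta}_\mu), &
\sigma &= \sigma(\mathbf{x}, t, \boldsymbol{u};\boldsymbol{\theta}_\sigma)
\end{aligned}$$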

Data

To fit this model, we assume that the data form a sequential set of observations

$$\mathcal{D} = \{ t, u_t\}_{t=1}^{N_t}$$

where $u\in\mathbb{R}$.

Conditional Parametric Model

We could also include some conditioning variables, $\boldsymbol{u}$. The parameters, $\boldsymbol{\theta}$, will be functions of the conditioning variable and some hyperparameters, $\boldsymbol{\alpha}$.

$$\boldsymbol{y}|\boldsymbol{u} \sim p(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta},\boldsymbol{\alpha})$$

Some ideas for conditioning variables, $\boldsymbol{u}$, are listed below; a code sketch of these encodings follows the list.

Standard Variables

  • $\boldsymbol{u}$ - standard variables, e.g., temperature, precipitation, ENSO, etc., with a link function such as $\boldsymbol{\theta}_\mu = \exp\left(\mathbf{W}^\top\mathbf{u} + \mathbf{b}\right)$
  • $t$ - temporal coordinate encoding, e.g., a seasonal harmonic $\boldsymbol{\theta}_\mu = a\sin\left(2\pi t/\omega L\right)$
  • $\mathbf{x}$ - spatial coordinate encoding
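A minimal sketch of the first two encodings, assuming JAX; the function names and argument layout are our own illustrative choices:

import jax.numpy as jnp

def standard_encoding(u, W, b):
    # theta_mu = exp(W^T u + b); the exponential link keeps the parameter positive
    return jnp.exp(W.T @ u + b)

def temporal_encoding(t, a, omega, L):
    # seasonal harmonic, theta_mu = a * sin(2 pi t / (omega L))
    return a * jnp.sin(2 * jnp.pi * t / (omega * L))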

Temporal Encoding

Here, we assume that the parameters of the data are processes which are parameterized functions of time.

$$\boldsymbol{\theta} \sim p(\boldsymbol{\theta}|t,\boldsymbol{\alpha})$$

where $t$ is the time and $\boldsymbol{\alpha}$ are the parameters of the distribution. For example, the Gaussian distribution could have a parameterized mean and scale. Similarly, the GEVD could have a parameterized mean, scale, and shape. Again, this function can be as simple or as complicated as necessary, e.g., a linear model, a basis function, a nonlinear function, or a stochastic process. So, we have a conditional data likelihood term.

$$\boldsymbol{y}\sim p(\boldsymbol{y}|\boldsymbol{\theta},t,\boldsymbol{\alpha})$$

The posterior is given by

$$p(\boldsymbol{\theta},\boldsymbol{\alpha}|\boldsymbol{y}) = \frac{1}{Z}\, p(\boldsymbol{y}|\boldsymbol{\theta},t,\boldsymbol{\alpha})\, p(\boldsymbol{\theta}|t,\boldsymbol{\alpha})\, p(\boldsymbol{\alpha})$$

This is known as a Bayesian Hierarchical Model (BHM) because we have hierarchical processes and priors which condition the data likelihood.
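As a minimal NumPyro sketch of this temporal case, assume a Gaussian likelihood whose mean drifts linearly in time; the priors and site names below are illustrative choices, not from the original text.

import numpyro
import numpyro.distributions as dist

def temporal_model(t, y=None):
    # hyperparameters alpha: priors over the trend coefficients
    mu0 = numpyro.sample("mu0", dist.Normal(0.0, 1.0))
    mu1 = numpyro.sample("mu1", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # process parameterization: theta(t) = mu0 + mu1 * t
    mu = mu0 + mu1 * t
    # conditional data likelihood, y ~ N(mu(t), sigma)
    numpyro.sample("y", dist.Normal(mu, sigma), obs=y)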



Spatial Encoding

$$\boldsymbol{\theta} \sim p(\boldsymbol{\theta}|\mathbf{x},\boldsymbol{\alpha})$$

where $\mathbf{x}$ are the spatial coordinates and $\boldsymbol{\alpha}$ are the parameters of the distribution. As before, the Gaussian distribution could have a parameterized mean and scale, and the GEVD a parameterized mean, scale, and shape, with the parameterization as simple or as complicated as necessary. So, we have a conditional data likelihood term.

$$\boldsymbol{y}\sim p(\boldsymbol{y}|\boldsymbol{\theta},\mathbf{x},\boldsymbol{\alpha})$$

The posterior is given by

$$p(\boldsymbol{\theta},\boldsymbol{\alpha}|\boldsymbol{y}) = \frac{1}{Z}\, p(\boldsymbol{y}|\boldsymbol{\theta},\mathbf{x},\boldsymbol{\alpha})\, p(\boldsymbol{\theta}|\mathbf{x},\boldsymbol{\alpha})\, p(\boldsymbol{\alpha})$$

This is also a BHM because we have hierarchical processes and priors which condition the data likelihood.
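A spatial analogue, sketched under the assumption of a simple linear (plane) trend over the coordinates, with the same imports as the temporal sketch above:

import numpyro
import numpyro.distributions as dist

def spatial_model(x, y=None):
    # x: (N, 2) array of, e.g., (longitude, latitude) pairs
    w = numpyro.sample("w", dist.Normal(0.0, 1.0).expand([2]).to_event(1))
    b = numpyro.sample("b", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # process parameterization: a plane over the spatial coordinates
    mu = x @ w + b
    # conditional data likelihood
    numpyro.sample("y", dist.Normal(mu, sigma), obs=y)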

Inference

We use the posterior distribution to fit the parameters of the distribution given the data.

$$p(\theta|\mathcal{D}) = \frac{1}{Z}\,p(\mathcal{D}|\theta)\,p(\theta)$$

There are many inference methods to choose from. For example, we can use simple methods like Maximum Likelihood Estimation (MLE) or Maximum A Posteriori (MAP) estimation. We can use approximate inference methods like the Laplace approximation or Variational Inference (VI). We could even use sampling methods like Markov Chain Monte Carlo (MCMC). A variational-inference sketch follows.
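For example, a variational fit in NumPyro could look like the sketch below, assuming the `model(x, y)` defined in the pseudocode later in this section; the guide and optimizer choices are illustrative.

from jax import random
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal
from numpyro.optim import Adam

# mean-field Gaussian approximation over all latent sites
guide = AutoNormal(model)
svi = SVI(model, guide, Adam(step_size=1e-2), Trace_ELBO())
# optimize the ELBO for a fixed number of steps
svi_result = svi.run(random.PRNGKey(0), 5_000, x=x, y=y)
params = svi_result.params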

Literature Review

PseudoCode


Model

Recall, we are using a Gaussian likelihood that describes our observations.

$$\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{y}|\boldsymbol{\mu},\boldsymbol{\sigma})$$

However, we recognize that a single parameter, $\boldsymbol{\mu}$, is too simple to describe the full complexity of the system. Let’s assume we have a covariate of interest, $\boldsymbol{x}$, which we think is related to our observations, $\boldsymbol{y}$. We can write a linear function which relates a parameter of our data likelihood to this covariate.

$$\boldsymbol{\mu}(\boldsymbol{x};\boldsymbol{\theta}) = \mathbf{w}_1 x_1 + \mathbf{w}_2 x_2 + \ldots + \mathbf{w}_{D_x} x_{D_x} + \mathbf{b} = \mathbf{W}\boldsymbol{x} + \mathbf{b}$$

where $\boldsymbol{\theta}=\{\mathbf{W},\mathbf{b}\}$ are the learnable parameters. This mean parameter now varies with the covariate instead of being constant.

Note: There are many ways to improve this. For example, we can increase the model complexity, e.g., a basis function or a nonlinear model, or we could add more covariates; a basis-function sketch follows the model code below.

# keep the Array[...] shape hints as documentation only
from __future__ import annotations
from typing import Dict

from jax import random
import numpyro.distributions as dist
from numpyro import sample
from numpyro.infer import MCMC, NUTS, Predictive

rng_key = random.PRNGKey(0)

def mean_function(
    x: Array["Dx"], weight: Array["Dy Dx"], bias: Array["Dy"]
) -> Array["Dy"]:
    # linear mean function, mu(x) = Wx + b
    return weight @ x + bias

def model(x: Array["Dx"], y: Array["Dy"] = None):
    # prior parameters (assuming a scalar output, Dy = 1)
    Dy, Dx = 1, x.shape[-1]
    weight: Array["Dy Dx"] = sample("weight", dist.Normal(0.0, 1.0).expand([Dy, Dx]).to_event(2))
    bias: Array["Dy"] = sample("bias", dist.Normal(0.0, 1.0).expand([Dy]).to_event(1))
    # process parameterizations
    mu: Array["Dy"] = mean_function(x, weight, bias)
    # the scale must be positive, so a half-normal prior replaces the normal
    sigma: Array["Dy"] = sample("sigma", dist.HalfNormal(10.0))
    # data likelihood
    return sample("y", dist.Normal(mu, sigma), obs=y)
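As the note above suggests, the linear mean can be swapped for a basis expansion; here is a hypothetical polynomial-basis variant of `mean_function` (the basis choice is ours).

import jax.numpy as jnp

def basis_mean_function(
    x: Array["Dx"], weight: Array["Dy 3*Dx"], bias: Array["Dy"]
) -> Array["Dy"]:
    # hypothetical polynomial basis, phi(x) = [x, x^2, x^3]
    phi = jnp.concatenate([x, x**2, x**3])
    return weight @ phi + bias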

Prior Predictive

# initialize predictive utility
N = 1_000
predictive = Predictive(model, num_samples=N)
# make predictions
y_pred: Array["N Dy"] = predictive(rng_key, x)["y"]

Inference

$$p(\boldsymbol{\theta}|\boldsymbol{y},\boldsymbol{x},\boldsymbol{\alpha}) \propto p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{x},\boldsymbol{\alpha})\,p(\boldsymbol{\theta}|\boldsymbol{x},\boldsymbol{\alpha})\,p(\boldsymbol{\alpha})$$
kernel = NUTS(model)
num_samples = 2_000
mcmc = MCMC(kernel, num_warmup=1_000, num_samples=num_samples)
mcmc.run(rng_key, x=x, y=y)
mcmc.print_summary()

Posterior Samples

$$\begin{aligned}
\text{Hyperparameters}: && \boldsymbol{\alpha}_n &\sim p(\boldsymbol{\alpha}|\boldsymbol{y},\boldsymbol{\theta},\boldsymbol{x}) = \int_\theta p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{x},\boldsymbol{\alpha})\, p(\boldsymbol{\theta}|\boldsymbol{x},\boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\,d\boldsymbol{\theta} \\
\text{Process Parameters}: && \boldsymbol{\theta}_n &\sim p(\boldsymbol{\theta}|\boldsymbol{y},\boldsymbol{\alpha},\boldsymbol{x}) = \int_\alpha p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{x},\boldsymbol{\alpha})\, p(\boldsymbol{\theta}|\boldsymbol{x},\boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}
\end{aligned}$$
# extract posterior samples
posterior_samples: Dict = mcmc.get_samples()
# extract individual parameter samples
# (in this model, theta = {weight, bias} plus the scale sigma; there is no "alpha" site)
weight: Array["Np Dy Dx"] = posterior_samples["weight"]
bias: Array["Np Dy"] = posterior_samples["bias"]
sigma: Array["Np Dy"] = posterior_samples["sigma"]

Posterior Predictive


# initialize predictive utility
predictive = Predictive(model, posterior_samples=posterior_samples, return_sites=["y"])
# make predictions
X_pred: Array["Nx"] = ...
y_pred: Array["Np"] = predictive(rng_key, X_pred)["y"]
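The sample axis can then be summarized into point estimates and credible intervals; a small sketch:

import jax.numpy as jnp

# posterior predictive median and a 90% credible interval
y_med = jnp.quantile(y_pred, 0.5, axis=0)
y_lo, y_hi = jnp.quantile(y_pred, jnp.array([0.05, 0.95]), axis=0)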
References
  1. Philip, S., Kew, S., van Oldenborgh, G. J., Otto, F., Vautard, R., van der Wiel, K., King, A., Lott, F., Arrighi, J., Singh, R., & van Aalst, M. (2020). A protocol for probabilistic extreme event attribution analyses. Advances in Statistical Climatology, Meteorology and Oceanography, 6(2), 177–203. 10.5194/ascmo-6-177-2020
  2. Sartini, L., Cassola, F., & Besio, G. (2015). Extreme waves seasonality analysis: An application in the Mediterranean Sea. Journal of Geophysical Research: Oceans, 120(9), 6266–6288. 10.1002/2015jc011061
  3. Jonathan, P., & Ewans, K. (2013). Statistical modelling of extreme ocean environments for marine design: A review. Ocean Engineering, 62, 91–109. 10.1016/j.oceaneng.2013.01.004