TLDR: We will incorporate covariates to try to make the data more IID. We will look at covariates such as the spatial coordinates, the temporal coordinates, and related variables of interest, e.g., temperature and precipitation. We can also explore local and global covariates like “warming”. We will explore simple statistical models like linear functions and then more complex ones like basis functions or nonlinear functions. We will use Bayesian Hierarchical Models (BHMs) to separate the uncertainty of the process parameters from that of the hyperparameters of the process parameterization. To select the best parameterization, we can use traditional model selection techniques like AIC or BIC.
Conditional Models¶
A second attempt would be to construct a conditionally parameterized distribution. In this case, we introduce dependencies of the parameters on other variables. This changes the function from a purely generative parametric model to a conditional likelihood model, i.e., a model whose parameters are generated with respect to some external variable like time.
In general, most of the distributions mentioned above have a mean and a standard deviation parameter. One could easily parameterize those parameters with functions that depend upon space and time. For example, the mean function could depend upon space, time, and some control vector, $\mu = \mu(\mathbf{s}, t, \mathbf{u})$, and likewise the standard deviation, $\sigma = \sigma(\mathbf{s}, t, \mathbf{u})$.
Data¶
For fitting this data, we assume that all elements within the time series are a sequential set of observations,

$$\mathcal{D} = \{(t_n, y_n)\}_{n=1}^{N},$$

where $y_n \in \mathbb{R}$.
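To ground the pseudocode later in this section, here is a minimal sketch of such a dataset; the linear trend and noise level are our own synthetic choices, not from the original text.

import jax.numpy as jnp
from jax import random

# synthetic time series: temporal coordinate plus a noisy linear trend
rng_key = random.PRNGKey(0)
num_obs = 200
t = jnp.linspace(0.0, 10.0, num_obs)
y = 0.5 * t + random.normal(rng_key, (num_obs,))
x = t  # in the pseudocode below, the covariate x is simply the time coordinate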
Case Study: Temperature Anomaly Tendency & Scale¶
In the paper of Philip et al., 2020, the authors parameterize the location and scale functions using linear functions of a temporal covariate. Rather than assuming that temperature is unconditionally independent, they fit a conditional parametric distribution to explain the temperature anomalies through their dependence on time. So the variable of interest is temperature, and we assume it is IID only after conditioning on time.
They assume a conditional likelihood function for temperature given time, using a GEV distribution for the likelihood function,

$$y(t) \sim \text{GEV}\left(\mu(t), \sigma(t), \xi\right).$$
The authors first fit a linear location function with respect to time and a constant scale parameter. This is given by the equations

$$\mu(t) = \mu_0 + \mu_1 t, \qquad \sigma(t) = \sigma_0,$$

where the parameters of the location function are $\{\mu_0, \mu_1\}$ and the scale has no time dependence beyond the constant $\sigma_0$. They showcase another example whereby they also fit a function for the scale,
$$\mu(t) = \mu_0 \exp\left(\frac{\mu_1 t}{\mu_0}\right), \qquad \sigma(t) = \sigma_0 \exp\left(\frac{\mu_1 t}{\mu_0}\right),$$

where the parameters of the location function are $\{\mu_0, \mu_1\}$ and the parameter of the scale function is $\{\sigma_0\}$; the dispersion $\sigma/\mu$ is held fixed. The origin of this functional form is Clausius–Clapeyron scaling arguments for precipitation-like variables (Philip et al., 2020).
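As a minimal code sketch of the shift fit: numpyro does not ship a GEV distribution to our knowledge, so we stand in the Gumbel distribution (the $\xi \to 0$ limit of the GEV); the site names and priors below are our own choices, not from the paper.

import numpyro.distributions as dist
from numpyro import sample

def shift_fit_model(t, y=None):
    # location parameters: intercept and linear time trend (shift fit)
    mu0 = sample("mu0", dist.Normal(0.0, 10.0))
    mu1 = sample("mu1", dist.Normal(0.0, 1.0))
    # constant (time-independent) scale; half-normal keeps it positive
    sigma0 = sample("sigma0", dist.HalfNormal(5.0))
    # Gumbel stand-in for GEV(mu(t), sigma0, xi=0)
    return sample("y", dist.Gumbel(mu0 + mu1 * t, sigma0), obs=y)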
Conditional Parametric Model¶
We could also include some conditioning variables, $\boldsymbol{u}$. The parameters, $\boldsymbol{\theta}$, will be functions of the conditioning variable and some hyperparameters, $\boldsymbol{\alpha}$.
Some ideas for conditioning variables, $\boldsymbol{u}$:

- $u$ - standard variables, e.g., temperature, precipitation, ENSO, etc.
- $t$ - temporal coordinate encoding, e.g., $t \in \mathbb{R}^+$
- $\mathbf{s}$ - spatial coordinates encoding, e.g., $\mathbf{s} \in \mathbb{R}^2$
Temporal Encoding¶
Here, we assume that the parameters of the data likelihood are processes which are parameterized functions of time,

$$\boldsymbol{\theta}(t) = f(t; \boldsymbol{\alpha}),$$

where $t$ is the time and $\boldsymbol{\theta}$ are the parameters of the distribution. For example, the Gaussian distribution could have a parameterized mean and scale. Similarly, the GEVD could have a parameterized mean, scale, and shape. Again, this function can be as simple or as complicated as necessary, e.g., a linear model, a basis function, a non-linear function, or a stochastic process. So, we have a conditional data likelihood term,

$$y \sim p(y \mid \boldsymbol{\theta}(t)).$$

The posterior is given by

$$p(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid y) \propto p(y \mid \boldsymbol{\theta}(t))\, p(\boldsymbol{\theta} \mid \boldsymbol{\alpha})\, p(\boldsymbol{\alpha}).$$

This is known as a Bayesian Hierarchical Model (BHM) because we have hierarchical processes and priors which condition the data likelihood.
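To make the hierarchy concrete, here is a minimal numpyro sketch in which a hyperparameter (here, a scale $\tau$, playing the role of $\boldsymbol{\alpha}$) governs the priors of the process parameters; the site names and priors are our own choices.

import numpyro.distributions as dist
from numpyro import sample

def bhm_model(t, y=None):
    # hyperprior: tau plays the role of the hyperparameters alpha
    tau = sample("tau", dist.HalfNormal(1.0))
    # process parameters: a linear function of time, theta(t) = theta0 + theta1 * t
    theta0 = sample("theta0", dist.Normal(0.0, tau))
    theta1 = sample("theta1", dist.Normal(0.0, tau))
    sigma = sample("sigma", dist.HalfNormal(1.0))
    # conditional data likelihood, y ~ p(y | theta(t))
    return sample("y", dist.Normal(theta0 + theta1 * t, sigma), obs=y)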
Spatial Encoding¶
Here, we assume that the parameters of the data likelihood are parameterized functions of the spatial coordinates,

$$\boldsymbol{\theta}(\mathbf{s}) = f(\mathbf{s}; \boldsymbol{\alpha}),$$

where $\mathbf{s}$ are the spatial coordinates and $\boldsymbol{\theta}$ are the parameters of the distribution. As above, the Gaussian distribution could have a parameterized mean and scale, and the GEVD a parameterized mean, scale, and shape; the function can be a linear model, a basis function, a non-linear function, or a stochastic process. So, we have a conditional data likelihood term,

$$y \sim p(y \mid \boldsymbol{\theta}(\mathbf{s})).$$

The posterior is given by

$$p(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid y) \propto p(y \mid \boldsymbol{\theta}(\mathbf{s}))\, p(\boldsymbol{\theta} \mid \boldsymbol{\alpha})\, p(\boldsymbol{\alpha}).$$

This is also a BHM because we have hierarchical processes and priors which condition the data likelihood.
Inference¶
We can use the posterior distribution to fit the parameters of the distribution given the data. There are many inference methods to choose from. For example, we can use simple point-estimate methods like Maximum Likelihood Estimation (MLE) or Maximum A Posteriori (MAP) estimation. We can use approximate inference methods like the Laplace approximation or Variational Inference (VI). We could even use sampling methods like Markov chain Monte Carlo (MCMC).
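For instance, here is a minimal sketch of variational inference in numpyro with an automatic guide, assuming the linear-Gaussian `model` from the PseudoCode section below and the data `(x, y)` from above; the optimizer, step size, and number of steps are our own choices.

from jax import random
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal
import numpyro.optim as optim

# mean-field Gaussian guide over the model's latent sites
guide = AutoNormal(model)
svi = SVI(model, guide, optim.Adam(1e-2), Trace_ELBO())
# fit the variational parameters by maximizing the ELBO
svi_result = svi.run(random.PRNGKey(0), 5_000, x=x, y=y)
variational_params = svi_result.params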
Literature Review¶
Applications

- Temperature - Philip et al., 2020
- Wave Height - Wang et al., 2004 | Sartini et al., 2015
- Ocean Environments - Jonathan & Ewans, 2013
PseudoCode¶
Model¶
Recall, we are using a Gaussian likelihood that describes our observations,

$$y \sim \mathcal{N}(\mu, \sigma).$$

However, we recognize that a single parameter, $\mu$, is too simple to describe the complete complexity of the system. Let’s assume we have our covariate of interest, $x$, which we think is related to our observations, $y$. We can write a linear function which relates a parameter of our data likelihood to this covariate,

$$\mu(x) = w x + b,$$

where $\{w, b\}$ are the learnable parameters. This mean parameter then enters the data likelihood as $y \sim \mathcal{N}(\mu(x), \sigma)$.

Note: There are many things we can improve. For example, we can increase the model complexity, e.g., a basis function or a nonlinear model (see the basis-function sketch below). We could also add more covariates.
from __future__ import annotations  # keep shape-string annotations like Array["N"] unevaluated

import numpyro.distributions as dist
from numpyro import sample

def mean_function(x: Array["N"], weight: float, bias: float) -> Array["N"]:
    # linear mean function: mu = w * x + b, broadcast over the N observations
    return weight * x + bias
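As noted above, a drop-in way to increase the model complexity is a basis-function mean; this polynomial variant is a hypothetical sketch, not from the original.

import jax.numpy as jnp

def basis_mean_function(x: Array["N"], weights: Array["K"], bias: float) -> Array["N"]:
    # polynomial basis expansion: mu = sum_k w_k * x**(k+1) + b
    k = weights.shape[0]
    phi: Array["N K"] = jnp.stack([x ** (i + 1) for i in range(k)], axis=-1)
    return phi @ weights + bias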
def model(x: Array["N"], y: Array["N"] = None):
    # priors over the linear-function parameters
    weight = sample("weight", dist.Normal(0.0, 1.0))
    bias = sample("bias", dist.Normal(0.0, 1.0))
    # process parameterization: the mean depends on the covariate
    mu: Array["N"] = mean_function(x, weight, bias)
    # the scale must be positive, so use a half-normal prior
    sigma = sample("sigma", dist.HalfNormal(10.0))
    # data likelihood
    return sample("y", dist.Normal(mu, sigma), obs=y)
Prior Predictive¶
from numpyro.infer import Predictive

# initialize predictive utility
N = 1_000
predictive = Predictive(model, num_samples=N)
# draw predictions from the prior predictive distribution
y_pred: Array["N Nobs"] = predictive(rng_key, x=x)["y"]
Inference¶
from numpyro.infer import MCMC, NUTS

# run Hamiltonian Monte Carlo with the No-U-Turn Sampler
kernel = NUTS(model)
num_samples = 2_000
mcmc = MCMC(kernel, num_warmup=1_000, num_samples=num_samples)
mcmc.run(rng_key, x=x, y=y)
mcmc.print_summary()
Posterior Samples¶
from typing import Dict

# extract posterior samples
posterior_samples: Dict = mcmc.get_samples()
# extract individual parameter samples (site names match the model above)
weight_samples: Array["Np"] = posterior_samples["weight"]
bias_samples: Array["Np"] = posterior_samples["bias"]
sigma_samples: Array["Np"] = posterior_samples["sigma"]
Posterior Predictive¶
# initialize predictive utility with the posterior samples
predictive = Predictive(model, posterior_samples=posterior_samples, return_sites=["y"])
# make predictions at new covariate values
x_pred: Array["Nx"] = ...
y_pred: Array["Np Nx"] = predictive(rng_key, x=x_pred)["y"]
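From here, a common next step is to summarize the predictive samples, e.g., with the mean and a 95% credible interval; the quantile levels below are our own choice.

import jax.numpy as jnp

# posterior predictive mean and 95% credible interval, per covariate value
y_mean: Array["Nx"] = y_pred.mean(axis=0)
y_lo, y_hi = jnp.quantile(y_pred, jnp.array([0.025, 0.975]), axis=0)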
- Philip, S., Kew, S., van Oldenborgh, G. J., Otto, F., Vautard, R., van der Wiel, K., King, A., Lott, F., Arrighi, J., Singh, R., & van Aalst, M. (2020). A protocol for probabilistic extreme event attribution analyses. Advances in Statistical Climatology, Meteorology and Oceanography, 6(2), 177–203. 10.5194/ascmo-6-177-2020
- Sartini, L., Cassola, F., & Besio, G. (2015). Extreme waves seasonality analysis: An application in the Mediterranean Sea. Journal of Geophysical Research: Oceans, 120(9), 6266–6288. 10.1002/2015jc011061
- Jonathan, P., & Ewans, K. (2013). Statistical modelling of extreme ocean environments for marine design: A review. Ocean Engineering, 62, 91–109. 10.1016/j.oceaneng.2013.01.004