Context¶
We consider some observations, $\mathbf{y} \in \mathbb{R}^{D_y}$, which are multi-dimensional.
Let's say we have a group of observations:

$$\mathbf{Y} = \{\mathbf{y}_n\}_{n=1}^{N}, \qquad \mathbf{Y} \in \mathbb{R}^{N \times D_y}$$

We are interested in finding a mapping from a lower-dimensional latent variable, $\mathbf{z} \in \mathbb{R}^{D_z}$, to the observations:

$$\mathbf{y} = \mathbf{W}\mathbf{z} + \boldsymbol{\epsilon}, \qquad \mathbf{W} \in \mathbb{R}^{D_y \times D_z}$$

Joint Distribution

$$p(\mathbf{y}, \mathbf{z}) = p(\mathbf{y}|\mathbf{z})\,p(\mathbf{z}), \qquad p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I}_{D_z})$$
Model¶
- Probabilistic PCA: assumes an isotropic (scalar) noise covariance, $\sigma^2\mathbf{I}$
- Factor Analysis: assumes a diagonal noise covariance matrix, $\boldsymbol{\Psi}$
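Both models share the same linear-Gaussian likelihood and differ only in the noise covariance (the standard PPCA/FA formulation):

$$\text{PPCA:}\quad p(\mathbf{y}|\mathbf{z}) = \mathcal{N}\!\left(\mathbf{W}\mathbf{z},\; \sigma^2\mathbf{I}\right) \qquad\qquad \text{FA:}\quad p(\mathbf{y}|\mathbf{z}) = \mathcal{N}\!\left(\mathbf{W}\mathbf{z},\; \boldsymbol{\Psi}\right)$$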
Uniqueness¶
- Enforce that the rows of $\mathbf{W}$ have a norm of 1
- Enforce that the rows of $\mathbf{W}$ are orthogonal
- Fit the rows of $\mathbf{W}$ sequentially
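Without such constraints, the factorization is only identified up to a rotation: for any orthogonal matrix $\mathbf{R}$,

$$(\mathbf{W}\mathbf{R})(\mathbf{W}\mathbf{R})^\top = \mathbf{W}\mathbf{R}\mathbf{R}^\top\mathbf{W}^\top = \mathbf{W}\mathbf{W}^\top,$$

so $\mathbf{W}$ and $\mathbf{W}\mathbf{R}$ produce the same marginal covariance.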
Minimization Problem¶
Least Squares
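A minimal statement of the objective, assuming the latent scores are stacked as $\mathbf{Z} \in \mathbb{R}^{N \times D_z}$:

$$\mathbf{W}^\star, \mathbf{Z}^\star = \operatorname*{arg\,min}_{\mathbf{W},\,\mathbf{Z}} \; \big\lVert \mathbf{Y} - \mathbf{Z}\mathbf{W}^\top \big\rVert_F^2$$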
PseudoCode I¶
Here, we will do this from scratch.
1 - Data¶
# get data
y: Dataset["lat lon time"] = ...
# stack dimensions
y: Dataset["space time"] = y.stack(sample=["lat" "lon"])
2 - Temporal vs Spatial EOFs¶
Recall that T-Mode maximizes the spatial variance and S-Mode maximizes the temporal variance.
# choose sample vs dimensions
y: Array["N D"] = y.rename({"space": "N", "time": "D"})
3 - SVD Decomposition¶
Using the Singular Value Decomposition (SVD), we can decompose the matrix, $\mathbf{Y}$, like so:

$$\mathbf{Y} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$$

where $\mathbf{U}$ contains the spatial EOFs, $\mathbf{V}$ contains the temporal EOFs, and $\boldsymbol{\Sigma}$ is the diagonal matrix of singular values.
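A minimal numpy sketch of this step, assuming `Y` is the stacked space × time array from the data step above and has already been mean-centered (anomalies):

import numpy as np

# thin SVD of the (space, time) anomaly matrix
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

spatial_eofs = U                            # columns are spatial patterns
temporal_eofs = Vt.T                        # columns are temporal patterns
explained_variance = s**2 / np.sum(s**2)    # fraction of variance per mode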
4 - Eigenvalue Decomposition¶
Alternatively, we can decompose the matrix, $\mathbf{Y}$, using its covariance matrices:
# covariance in space
cov_space: Array["N N"] = cov(Y, rowvar=True)
# covariance in time
cov_time: Array["D D"] = cov(Y, rowvar=False)
Once this is done, we can do the eigenvalue decomposition of the covariance matrix:

$$\mathbf{C} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^\top$$
We can do this with either covariance matrix.
However, we must be careful because these matrices can become gigantic.
For example, a spatiotemporal field of size 200×200×10 (lat × lon × time)
will result in a stacked data matrix of size 40,000 × 10.
Subsequently, we get a covariance matrix of size 40,000 × 40,000 for the spatial components and 10 × 10 for the temporal components.
To combat this, we use the eigenvalue decomposition on the cheapest covariance, i.e., the temporal covariance matrix, $\mathbf{Y}^\top\mathbf{Y} \in \mathbb{R}^{T \times T}$. From this, we can calculate the temporal EOFs, $\mathbf{V}$, and the eigenvalues, $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^2$, and then recover the spatial EOFs, $\mathbf{U}$. This works because we can rearrange the SVD to isolate the spatial EOFs.
# Y = U D V^T        : (SxT) = (Sxd)(dxd)(dxT)
# Y^T Y = V D^2 V^T  : (TxT) eigendecomposition -- the cheap one
# U = Y V D^{-1}     : (Sxd) = (SxT)(Txd)(dxd)
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
# eigendecomposition of the (small) temporal covariance
evals, V_ = np.linalg.eigh(Y.T @ Y)
# the eigenvalues of Y^T Y are the squared singular values of Y
np.testing.assert_allclose(np.sort(evals), np.sort(S**2), rtol=1e-5)
# recover the spatial EOFs from the temporal eigenvectors
U_ = Y @ V_ / np.sqrt(evals)
PseudoCode II¶
Prior¶
# parameters
loc: Array["Dx"] = ...
scale: Array["Dx"] = ...
W: Array["Dx Dz"] = sample("w", dist.Normal(loc, scale))
noise: Array[""] = sample("noise", dist.Normal(0.0, 1.0))
Likelihood¶
We will use a low-rank multivariate likelihood.
D: Array["Dz"] = noise * eye_like(W)
# likelihood
obs = sample("obs", dist.LowRankMVN(loc=0.0, cov_factor=W, cov_diag=D), obs="y")
PseudoCode III¶
- Resource - jupyter nb | Edward | TFP
Data¶
# get data
y: Dataset["lat lon time"] = ...
# stack dimensions
y: Dataset["space time"] = y.stack(sample=["lat" "lon"])
# choose sample vs dimensions
y: Array["N D"] = y.rename({"space": "N", "time": "D"})
Model¶
Dz = 10

def model(obs: Array["N Dy"]):
    N, Dy = obs.shape
    # prior for the per-component weight scales, alpha
    rate: Array["Dz"] = ones(shape=(Dz,))
    alpha: Array["Dz"] = sample("alpha", dist.Exponential(rate=rate).to_event(1))
    # prior for the weights, W
    loc: Array["Dy Dz"] = zeros(shape=(Dy, Dz))
    scale: Array["Dy Dz"] = alpha * ones(shape=(Dy, Dz))  # broadcast alpha over Dy
    W: Array["Dy Dz"] = sample("W", dist.Normal(loc, scale).to_event(2))
    # observation noise scale
    sigma: Array[""] = sample("sigma", dist.HalfCauchy(scale=1.0))
    # TODO: add bias...
    # one latent variable per sample
    with numpyro.plate("samples", N):
        # latent variable, z ~ N(0, I)
        loc: Array["Dz"] = zeros(shape=(Dz,))
        scale: Array["Dz"] = ones(shape=(Dz,))
        z: Array["N Dz"] = sample("z", dist.Normal(loc, scale).to_event(1))
        # mean of the observations, z W^T
        loc: Array["N Dy"] = einx.dot("N Dz, Dy Dz -> N Dy", z, W)
        # diagonal scale of the observations
        scale: Array["Dy"] = sigma * ones(shape=(Dy,))
        y = sample("y", dist.Normal(loc=loc, scale=scale).to_event(1), obs=obs)
Inference¶
MCMC Sampling¶
kernel = numpyro.infer.NUTS(model)
mcmc = numpyro.infer.MCMC(kernel, num_warmup=1000, num_samples=10000, thinning=5, num_chains=1)
mcmc.run(jax.random.PRNGKey(1), obs=y)
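After sampling, the draws can be summarized and pulled out as arrays:

# posterior diagnostics and samples
mcmc.print_summary()
posterior_samples = mcmc.get_samples()
W_samples = posterior_samples["W"]  # (num posterior draws, Dy, Dz)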
Variational Inference¶
guide = AutoNormal(model)
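The guide on its own does nothing; a minimal SVI loop (a sketch, assuming the `model` and data `y` above) might look like:

import numpyro
from numpyro.infer import SVI, Trace_ELBO

# stochastic variational inference with the AutoNormal guide
optimizer = numpyro.optim.Adam(step_size=1e-2)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())
svi_result = svi.run(jax.random.PRNGKey(0), 5_000, obs=y)
# variational parameters and approximate posterior samples
params = svi_result.params
posterior = guide.sample_posterior(jax.random.PRNGKey(1), params, sample_shape=(1000,))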