
Ensemble Kalman Filter


Notes on the Ensemble Kalman Filter (EnKF).


Model

$$
\begin{aligned}
\mathbf{x}(t) &= \mathbf{Mx}(t-1) + \boldsymbol{\eta}(t), \qquad &\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}) \\
\mathbf{y}(t) &= \mathbf{Hx}(t) + \boldsymbol{\epsilon}(t), \qquad &\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})
\end{aligned}
$$
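As a quick illustration, here is a minimal sketch that simulates data from this linear-Gaussian state-space model; the dimensions and the specific choices of $\mathbf{M}$, $\mathbf{H}$, $\mathbf{Q}$, $\mathbf{R}$ are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder dimensions and model matrices (assumptions for illustration)
Dx, Dy, T = 3, 2, 100
M = 0.95 * np.eye(Dx)            # transition matrix
H = rng.normal(size=(Dy, Dx))    # emission matrix
Q = 0.01 * np.eye(Dx)            # transition noise covariance
R = 0.10 * np.eye(Dy)            # emission noise covariance

x = np.zeros(Dx)
xs, ys = [], []
for t in range(T):
    x = M @ x + rng.multivariate_normal(np.zeros(Dx), Q)   # x(t) = Mx(t-1) + eta(t)
    y = H @ x + rng.multivariate_normal(np.zeros(Dy), R)   # y(t) = Hx(t) + eps(t)
    xs.append(x)
    ys.append(y)
```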

Forecast Step

$$
\begin{aligned}
\mathbf{x}^f(t) &= \mathbf{Mx}^a(t-1) \\
\mathbf{P}^f(t) &= \mathbf{MP}^a(t-1)\mathbf{M}^\top + \mathbf{Q}
\end{aligned}
$$

Analysis Step (Forward)

$$
\begin{aligned}
\mathbf{x}^a(t) &= \mathbf{x}^f(t) + \mathbf{K}^a[\mathbf{y}(t) - \mathbf{H}\mathbf{x}^f(t)] \\
\mathbf{P}^a(t) &= [\mathbf{I} - \mathbf{K}^a\mathbf{H}]\mathbf{P}^f(t)
\end{aligned}
$$

where:

$$
\mathbf{K}^a = \mathbf{P}^f(t)\mathbf{H}^\top[\mathbf{HP}^f(t)\mathbf{H}^\top + \mathbf{R}]^{-1}
$$
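A minimal NumPy sketch of one forecast/analysis cycle, assuming the matrices $\mathbf{M}$, $\mathbf{H}$, $\mathbf{Q}$, $\mathbf{R}$ are given (the function name is ours):

```python
import numpy as np

def kalman_step(x_a, P_a, y, M, H, Q, R):
    """One forecast + analysis cycle of the linear Kalman filter."""
    # forecast step
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    # analysis step
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)   # Kalman gain
    x_a = x_f + K @ (y - H @ x_f)
    P_a = (np.eye(len(x_f)) - K @ H) @ P_f
    return x_f, P_f, x_a, P_a
```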

Analysis Step (Backward)

$$
\begin{aligned}
\mathbf{x}^s(t) &= \mathbf{x}^a(t) + \mathbf{K}^s[\mathbf{x}^s(t+1) - \mathbf{x}^f(t+1)] \\
\mathbf{P}^s(t) &= \mathbf{P}^a(t) + \mathbf{K}^s[\mathbf{P}^s(t+1) - \mathbf{P}^f(t+1)](\mathbf{K}^s)^\top
\end{aligned}
$$

where:

$$
\mathbf{K}^s = \mathbf{P}^a(t)\mathbf{M}^\top[\mathbf{P}^f(t+1)]^{-1}
$$
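This is the Rauch-Tung-Striebel smoother recursion. A corresponding sketch of one backward step, consuming the quantities stored during the forward pass (function name ours):

```python
import numpy as np

def smoother_step(x_a, P_a, x_s_next, P_s_next, x_f_next, P_f_next, M):
    """One backward smoothing step of the RTS recursion."""
    K_s = P_a @ M.T @ np.linalg.inv(P_f_next)    # smoother gain
    x_s = x_a + K_s @ (x_s_next - x_f_next)
    P_s = P_a + K_s @ (P_s_next - P_f_next) @ K_s.T
    return x_s, P_s
```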

Non-Linear Model

In many applications, we cannot write the transition model and/or the emission model as a simple matrix operation. Instead, we use non-linear functions.

$$
\begin{aligned}
\mathbf{x}(t) &= \boldsymbol{m}(\mathbf{x}(t-1)) + \boldsymbol{\eta}(t) \\
\mathbf{y}(t) &= \boldsymbol{h}(\mathbf{x}(t)) + \boldsymbol{\epsilon}(t)
\end{aligned}
$$

This added flexibility allows us to model more of the complexity found in the data. However, we can no longer use the linear Kalman filter equations for inference.

Solutions: In general, there are three classes of algorithms to deal with this: Extended Kalman Filters (EKF), Ensemble Kalman Filters (EnKF), and Particle Filters (PF).


Ensemble Kalman Filter

In the EnKF, we use an ensemble to track the state of the system: $\mathbf{x}(t)$ is approximated by $N_e$ members, which are propagated to track the evolution of the state. The empirical mean and covariance are then calculated, followed by the Kalman filter equations to update the state based on observations. This provides a robust and fast solution.

$$
\mathbf{x}(t) = (\mathbf{x}_1(t), \mathbf{x}_2(t), \ldots, \mathbf{x}_{N_e}(t))
$$
$$
\mathbf{x}_{(i)}^f(t) = \boldsymbol{m}(\mathbf{x}_{(i)}^a(t-1)) + \boldsymbol{\eta}_{(i)}(t), \quad \forall i = 1, 2, \ldots, N_e
$$
$$
\mathbf{K} = \mathbf{P}^f(t)\mathbf{H}^\top(\mathbf{H}\mathbf{P}^f(t)\mathbf{H}^\top + \mathbf{R})^{-1}
$$

Forecast Step

$$
\begin{aligned}
\mathbf{x}_{(i)}^f(t) &= \boldsymbol{m}(\mathbf{x}_{(i)}^a(t-1)) + \boldsymbol{\eta}_{(i)}(t), \qquad &\forall i = 1, 2, \ldots, N_e \\
\bar{\mathbf{x}}^f(t) &= \frac{1}{N_e}\sum_{i=1}^{N_e} \mathbf{x}_{(i)}^f(t) \\
\mathbf{P}^f(t) &= \frac{1}{N_e - 1} \sum_{i=1}^{N_e} \left(\mathbf{x}_{(i)}^f(t) - \bar{\mathbf{x}}^f(t)\right)\left(\mathbf{x}_{(i)}^f(t) - \bar{\mathbf{x}}^f(t)\right)^\top \\
\mathbf{K} &= \mathbf{P}^f(t)\mathbf{H}^\top\left(\mathbf{HP}^f(t)\mathbf{H}^\top + \mathbf{R}\right)^{-1}
\end{aligned}
$$
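A minimal sketch of this forecast step, assuming a generic Python callable `m` for the transition function (the helper name is ours):

```python
import numpy as np

def enkf_forecast(X_a, m, Q, rng):
    """Propagate each member through m and form the empirical moments.

    X_a is the (Ne, Dx) analysis ensemble from the previous step.
    """
    Ne, Dx = X_a.shape
    eta = rng.multivariate_normal(np.zeros(Dx), Q, size=Ne)    # transition noise
    X_f = np.array([m(x) for x in X_a]) + eta                  # member-wise forecast
    x_mu = X_f.mean(axis=0)                                    # empirical mean
    P_f = np.cov(X_f, rowvar=False, ddof=1)                    # empirical covariance
    return X_f, x_mu, P_f
```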

Deterministic Ensemble Kalman Filter
$$
\begin{aligned}
\mathbf{x}^a(t) &= \mathbf{x}^f(t) + \mathbf{K}(\mathbf{y}(t) - \boldsymbol{h}(\mathbf{x}^f(t))) \\
\mathbf{P}^a(t) &= \left(\mathbf{I} - \mathbf{KH}\right)\mathbf{P}^f(t)
\end{aligned}
$$

Stochastic Ensemble Kalman Filter
$$
\begin{aligned}
\mathbf{y}^f_{(i)}(t) &= \boldsymbol{h}(\mathbf{x}_{(i)}^f(t)) + \boldsymbol{\epsilon}_{(i)}(t) \\
\mathbf{x}^a_{(i)}(t) &= \mathbf{x}_{(i)}^f(t) + \mathbf{K}\left(\mathbf{y}(t) - \mathbf{y}^f_{(i)}(t)\right) \\
\bar{\mathbf{x}}^a(t) &= \frac{1}{N_e}\sum_{i=1}^{N_e} \mathbf{x}_{(i)}^a(t) \\
\mathbf{P}^a(t) &= \frac{1}{N_e - 1} \sum_{i=1}^{N_e} \left(\mathbf{x}_{(i)}^a(t) - \bar{\mathbf{x}}^a(t)\right)\left(\mathbf{x}_{(i)}^a(t) - \bar{\mathbf{x}}^a(t)\right)^\top
\end{aligned}
$$
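A sketch of the stochastic analysis step under the same assumptions (callable `h` for the emission function; names ours). The deterministic variant simply omits the $\boldsymbol{\epsilon}_{(i)}$ perturbations:

```python
import numpy as np

def enkf_analysis_stochastic(X_f, y, h, H, R, P_f, rng):
    """Stochastic EnKF analysis: each member sees its own perturbed observation."""
    Ne, Dy = X_f.shape[0], len(y)
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)         # Kalman gain
    eps = rng.multivariate_normal(np.zeros(Dy), R, size=Ne)  # observation noise draws
    Y_f = np.array([h(x) for x in X_f]) + eps                # perturbed predicted obs
    X_a = X_f + (y - Y_f) @ K.T                              # member-wise update
    x_mu = X_a.mean(axis=0)
    P_a = np.cov(X_a, rowvar=False, ddof=1)
    return X_a, x_mu, P_a
```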

Computational Issues

Sherman-Morrison-Woodbury

$$
\mathbf{AB}^\top(\mathbf{C} + \mathbf{BAB}^\top)^{-1} = (\mathbf{A}^{-1} + \mathbf{B}^\top\mathbf{C}^{-1}\mathbf{B})^{-1}\mathbf{B}^\top\mathbf{C}^{-1}
$$
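A quick numerical check of this identity on small random matrices; the shapes and stand-ins are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
Dx, Dy = 4, 2
A = np.eye(Dx) + 0.1 * np.ones((Dx, Dx))   # SPD stand-in for P
B = rng.normal(size=(Dy, Dx))              # stand-in for H
C = np.eye(Dy)                             # SPD stand-in for R

lhs = A @ B.T @ np.linalg.inv(C + B @ A @ B.T)
rhs = np.linalg.inv(np.linalg.inv(A) + B.T @ np.linalg.inv(C) @ B) @ B.T @ np.linalg.inv(C)
assert np.allclose(lhs, rhs)   # both forms give the same Kalman gain
```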

Kalman Gain (Observation Space)

$$
\mathbf{K} = \mathbf{PH}^\top(\mathbf{HPH}^\top + \mathbf{R})^{-1}
$$

Reduced Space

$$
\mathbf{K} = \mathbf{S}(\mathbf{I} + (\mathbf{HS})^\top\mathbf{R}^{-1}\mathbf{HS})^{-1}(\mathbf{HS})^\top\mathbf{R}^{-1}
$$

where $\mathbf{S} = \frac{\mathbf{x} - \bar{\mathbf{x}}}{\sqrt{N-1}}$ is the matrix of ensemble anomalies, so that $\mathbf{P} = \mathbf{SS}^\top$.

Inverting Symmetric Positive-Definite Matrices

Standard

```python
from numpy import linalg
C_inv = linalg.pinv(C, hermitian=True)   # pseudo-inverse, exploiting symmetry
```

SVD

$$
\begin{aligned}
\mathbf{P} &= \mathbf{U\Lambda V}^\top \\
\mathbf{P}^{-1} &= \mathbf{V\Lambda}^{-1}\mathbf{U}^\top
\end{aligned}
$$
```python
from numpy import diag, linalg
U, D, VH = linalg.svd(C, full_matrices=False, hermitian=True)
C_inv = VH.T @ diag(1 / D) @ U.T   # invert by reciprocating the singular values
```

Note: This is useful for square-root data assimilation schemes.

Cholesky

$$
\begin{aligned}
\mathbf{P} &= \mathbf{LL}^\top \\
\mathbf{LU} &= \mathbf{I} \\
\mathbf{L}^\top\mathbf{P}^{-1} &= \mathbf{U}
\end{aligned}
$$
```python
from numpy import eye
from scipy.linalg import cholesky, solve_triangular
L = cholesky(C, lower=True)                            # C = L L^T
U = solve_triangular(L, eye(L.shape[0]), lower=True)   # U = L^{-1}, forward substitution
C_inv = solve_triangular(L.T, U, lower=False)          # C^{-1} = L^{-T} L^{-1}, back substitution
```

Note: This is useful for optimal interpolation.

Analysis Step (SVD)

Forecast Ensemble Anomalies

$$
\mathbf{S}^f = \frac{1}{\sqrt{N-1}}\left(\mathbf{x}^f - \bar{\mathbf{x}}^f\right)
$$

Forecast Ensemble Covariance

$$
\mathbf{P}^f = \mathbf{S}^f\mathbf{S}^{f\top}
$$

Kalman Gain (Ensemble Space)

$$
\mathbf{K} = \mathbf{S}^f\left(\mathbf{I} + (\mathbf{HS}^f)^\top\mathbf{R}^{-1}\mathbf{HS}^f\right)^{-1}(\mathbf{HS}^f)^\top\mathbf{R}^{-1}
$$
```python
from numpy import eye
from numpy.linalg import inv
# [Dx x Dy] = [Dx x Ne] @ [Ne x Ne] @ [Ne x Dy]
K = S @ inv(eye(Ne) + (H @ S).T @ inv(R) @ H @ S) @ (H @ S).T @ inv(R)
```

SVD of Signal/Noise

$$
(\mathbf{HS})^\top\mathbf{R}^{-1}\mathbf{HS} = \mathbf{U\Lambda U}^\top
$$

Kalman Gain with SVD

Substituting the eigendecomposition into the ensemble-space gain, $(\mathbf{I} + \mathbf{U\Lambda U}^\top)^{-1} = \mathbf{U}(\mathbf{I} + \mathbf{\Lambda})^{-1}\mathbf{U}^\top$, so

$$
\mathbf{K} = \mathbf{S}^f\mathbf{U}(\mathbf{I} + \mathbf{\Lambda})^{-1}\mathbf{U}^\top(\mathbf{HS}^f)^\top\mathbf{R}^{-1}
$$

Analysis Ensemble of Anomalies

$$
\mathbf{S}^a = \mathbf{S}^f\mathbf{U}(\mathbf{I} + \mathbf{\Lambda})^{-1/2}\mathbf{U}^\top
$$

Analysis Ensemble of Covariances

$$
\mathbf{P}^a = \mathbf{P}^f - \mathbf{KHP}^f = \mathbf{S}^a\mathbf{S}^{a\top}
$$

Analysis Mean

$$
\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^f + \mathbf{K}(\mathbf{y}_\text{obs} - \mathbf{H}\bar{\mathbf{x}}^f)
$$

Analysis Ensemble

$$
\mathbf{x}^a = \bar{\mathbf{x}}^a + \sqrt{N-1}\,\mathbf{S}^a
$$
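Putting the pieces together, here is a hedged sketch of the full analysis step above; we use `numpy.linalg.eigh` rather than an explicit SVD since the signal/noise matrix is symmetric, and all function and variable names are our own.

```python
import numpy as np

def enkf_analysis_svd(X_f, y_obs, H, R_inv):
    """Ensemble-space (square-root) analysis step via eigendecomposition.

    X_f: (Ne, Dx) forecast ensemble; R_inv: precomputed inverse of R.
    """
    Ne = X_f.shape[0]
    x_mu = X_f.mean(axis=0)
    S_f = (X_f - x_mu).T / np.sqrt(Ne - 1)                 # (Dx, Ne) anomalies
    HS = H @ S_f                                           # (Dy, Ne)
    # (HS)^T R^{-1} HS = U Lam U^T (symmetric, so eigh suffices)
    Lam, U = np.linalg.eigh(HS.T @ R_inv @ HS)
    K = S_f @ U @ np.diag(1 / (1 + Lam)) @ U.T @ HS.T @ R_inv   # Kalman gain
    S_a = S_f @ U @ np.diag(1 / np.sqrt(1 + Lam)) @ U.T         # analysis anomalies
    x_a_mu = x_mu + K @ (y_obs - H @ x_mu)                      # analysis mean
    X_a = x_a_mu + np.sqrt(Ne - 1) * S_a.T                      # analysis ensemble
    return X_a, x_a_mu, S_a
```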

Perturbed Observations

  • Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics - Evensen (1994)
  • Data assimilation using an ensemble Kalman filter technique - Houtekamer & Mitchell (1998)

Inputs

We have some ensemble members.

  • $\mathbf{X}_e \in \mathbb{R}^{N_e \times D_x}$
  • $\mathbf{y} \in \mathbb{R}^{D_y}$

Sample the Observations given the state

Here, we propagate these samples through our emission function, $\boldsymbol{H}$, to give us some predictions for the observations.

$$
\mathbf{Y}_e = \boldsymbol{H}(\mathbf{X}_e; \boldsymbol{\theta})
$$

where $\mathbf{Y}_e \in \mathbb{R}^{N_e \times D_y}$.

```python
# emission function applied to the ensemble
Y_ens = H(X_ens, t, rng)
# [Ne x Dy]
```


Calculate 1st Moment

Now we need to calculate the 1st moment (the mean) to approximate the Gaussian distribution. We do this for both the predicted state, $\mathbf{X}_e$, and the predicted observations, $\mathbf{Y}_e$.

$$
\begin{aligned}
\bar{\mathbf{x}} &= \frac{1}{N_e}\sum_{i}^{N_e} \mathbf{X}_{e\,(i)} \\
\bar{\mathbf{y}} &= \frac{1}{N_e}\sum_{i}^{N_e} \mathbf{Y}_{e\,(i)}
\end{aligned}
$$

where $\bar{\mathbf{x}} \in \mathbb{R}^{D_x}$ and $\bar{\mathbf{y}} \in \mathbb{R}^{D_y}$.


```python
from numpy import mean
# mean of state ensemble members (Dx,)
x_mu = mean(X_ens, axis=0)
# mean of obs ensemble members (Dy,)
y_mu = mean(Y_ens, axis=0)
```

Calculate Residuals

We calculate the errors (residuals) for our ensemble of predictions, $\mathbf{Y}_e$, with respect to the true observation value, $\mathbf{y}_\text{obs}$, as sketched below.
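To finish the walkthrough, a minimal sketch of the perturbed-observation residuals; the toy shapes and the linear stand-in for the emission function are assumptions, not from the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy shapes and a linear stand-in for the emission function (assumed)
Ne, Dx, Dy = 50, 3, 2
X_ens = rng.normal(size=(Ne, Dx))           # forecast ensemble
H_mat = rng.normal(size=(Dy, Dx))
Y_ens = X_ens @ H_mat.T                     # predicted observations, (Ne, Dy)
y_obs = rng.normal(size=Dy)                 # the actual observation
R = 0.1 * np.eye(Dy)

# perturbed observations: each member sees its own noisy copy of y_obs
eps = rng.multivariate_normal(np.zeros(Dy), R, size=Ne)
innovations = (y_obs + eps) - Y_ens         # residuals, (Ne, Dy)
```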


Resources

  • Auto-differentiable Ensemble Kalman Filters - Chen et al (2021) - arXiv | PyTorch