## Abstract

In computer science, we can often "bin" a problem into a series of sub-problems that we iterate over until convergence.
The same can be said for data-driven geoscience!
In fact, I claim that most problems fall into one (or more) of four categories: 1) data acquisition, 2) learning, 3) estimation, and/or 4) prediction.
Keywords: tasks, learning, estimation, prediction

## Overview

1. Data Acquisition
2. Learn
3. Estimate
4. Predict
## Data

$$
\mathcal{D} = \left\{ \mathbf{x}_n, \mathbf{y}_n, \mathbf{z}_n^* \right\}_{n=1}^N
$$

$$
\begin{aligned}
\text{Measurements}: && &&
\mathbf{y}_n &\in\mathcal{Y}\subseteq\mathbb{R}^{D_y} \\
\text{Covariates}: && &&
\mathbf{x}_n &\in\mathcal{X}\subseteq\mathbb{R}^{D_x} \\
\text{Simulated States}: && &&
\mathbf{z}_n^{sim} &\in\mathcal{Z}^{sim}\subseteq\mathbb{R}^{D_z} \\
\text{Reanalysis States}: && &&
\mathbf{z}_n^{*} &\in\mathcal{Z}^{*}\subseteq\mathbb{R}^{D_z}
\end{aligned}
$$
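In code, $\mathcal{D}$ is simply a collection of aligned arrays. Below is a minimal sketch of such a container; the class name, field names, and shapes are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Dataset:
    """A toy container for D = {x_n, y_n, z_n^*}_{n=1}^N (field names and shapes are assumed)."""
    x: np.ndarray       # covariates,        shape (N, D_x)
    y: np.ndarray       # measurements,      shape (N, D_y)
    z_star: np.ndarray  # reanalysis states, shape (N, D_z)


# example with N = 100 samples and assumed dimensions D_x = 3, D_y = 2, D_z = 5
rng = np.random.default_rng(0)
D = Dataset(
    x=rng.normal(size=(100, 3)),
    y=rng.normal(size=(100, 2)),
    z_star=rng.normal(size=(100, 5)),
)
```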
## Learning

I have data, $\mathcal{D}$, which captures the phenomena that I want to learn.
I want to learn a model, $f$, with its associated parameters, $\boldsymbol{\theta}$, given the data, $\mathcal{D}$.
$$
\boldsymbol{\theta}^* = \underset{\boldsymbol{\theta}}{\argmin}
\hspace{2mm}
\boldsymbol{L}(\boldsymbol{\theta};\mathcal{D})
$$

where $\boldsymbol{L}(\cdot)$ is our loss function:
$$
\begin{aligned}
\boldsymbol{L} : \mathbb{R}^{D_\theta} \times \mathcal{D} \rightarrow \mathbb{R}
\end{aligned}
$$
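As a concrete (and heavily simplified) sketch of the learning step, the snippet below fits a linear model to toy data by minimizing a mean-squared-error loss with `scipy.optimize.minimize`; the model, the loss, and all shapes are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# toy stand-in for D (the shapes and the data-generating process are assumed)
rng = np.random.default_rng(0)
N, D_x, D_y = 100, 3, 2
x = rng.normal(size=(N, D_x))                     # covariates
W_true = rng.normal(size=(D_x, D_y))
y = x @ W_true + 0.1 * rng.normal(size=(N, D_y))  # measurements


def f(x, theta):
    """Assumed toy model: a linear map from covariates to measurements."""
    return x @ theta.reshape(D_x, D_y)


def L(theta, x, y):
    """Loss L(theta; D): mean squared error between f(x; theta) and y."""
    return np.mean((f(x, theta) - y) ** 2)


# theta* = argmin_theta L(theta; D)
res = minimize(L, x0=np.zeros(D_x * D_y), args=(x, y), method="L-BFGS-B")
theta_star = res.x
```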
## Estimation

I have a model, $f$, and parameters, $\boldsymbol{\theta}$.
I have some measurements, $\mathbf{y}$.
I want to estimate a state, $\mathbf{z}$.
$$
\mathbf{z}^*(\boldsymbol{\theta}) = \underset{\mathbf{z}}{\argmin}
\hspace{2mm}
\boldsymbol{J}(\mathbf{z};\boldsymbol{\theta},\mathcal{D})
$$

where $\boldsymbol{J}(\cdot)$ is our objective function, defined as:
$$
\begin{aligned}
\boldsymbol{J} : \mathbb{R}^{D_z} \times \mathbb{R}^{D_\theta}\times\mathcal{D} \rightarrow \mathbb{R}
\end{aligned}
$$
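The estimation step can also be written as a small optimization problem. The sketch below recovers a state from noisy measurements by minimizing a 3D-Var-like objective (data misfit plus a background penalty); the observation operator `H`, the background state, and the weighting are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# assumed toy setup: a known linear observation operator and a background state
rng = np.random.default_rng(1)
D_z, D_y = 5, 3
H = rng.normal(size=(D_y, D_z))                            # observation operator
z_b = np.zeros(D_z)                                        # background (prior) state
y = H @ rng.normal(size=D_z) + 0.1 * rng.normal(size=D_y)  # measurements


def J(z, y, H, z_b, lam=1.0):
    """Objective J(z; theta, D): data misfit plus a background penalty (a 3D-Var-like sketch)."""
    data_misfit = 0.5 * np.sum((H @ z - y) ** 2)
    background = 0.5 * lam * np.sum((z - z_b) ** 2)
    return data_misfit + background


# z*(theta) = argmin_z J(z; theta, D)
res = minimize(J, x0=np.zeros(D_z), args=(y, H, z_b), method="L-BFGS-B")
z_star = res.x
```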
## Parameter & State Estimation

$$
\begin{aligned}
\text{Parameter Estimation}: && &&
\boldsymbol{\theta}^* = \underset{\boldsymbol{\theta}}{\argmin}
\hspace{2mm}
\boldsymbol{L}(\boldsymbol{\theta};\mathcal{D}) \\
\text{State Estimation}: && &&
\mathbf{z}^*(\boldsymbol{\theta}) = \underset{\mathbf{z}}{\argmin}
\hspace{2mm}
\boldsymbol{J}(\mathbf{z};\boldsymbol{\theta},\mathcal{D})
\end{aligned}
$$
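One way the two estimation problems interact is that the state estimate depends on the parameters, so parameter estimation can be wrapped around state estimation. The snippet below is a rough bi-level sketch; the toy objective, the coupling, and the use of a reanalysis state as the reference are all assumptions, not something prescribed above.

```python
import numpy as np
from scipy.optimize import minimize

# assumed toy setup: noisy measurements y of a reference (reanalysis) state z_ref
rng = np.random.default_rng(2)
D_z = 4
z_ref = rng.normal(size=D_z)            # plays the role of a reanalysis state z_n^*
y = z_ref + 0.5 * rng.normal(size=D_z)  # measurements


def J(z, theta, y):
    """Inner objective J(z; theta, D): data misfit plus a theta-weighted penalty (assumed form)."""
    return 0.5 * np.sum((z - y) ** 2) + 0.5 * np.exp(theta[0]) * np.sum(z**2)


def z_star(theta, y):
    """State estimation: z*(theta) = argmin_z J(z; theta, D)."""
    return minimize(J, x0=np.zeros(D_z), args=(theta, y)).x


def L(theta, y, z_ref):
    """Outer loss L(theta; D): score z*(theta) against the reference state (assumed choice)."""
    return np.sum((z_star(theta, y) - z_ref) ** 2)


# parameter estimation wrapped around state estimation: a bi-level structure
res = minimize(L, x0=np.array([0.0]), args=(y, z_ref), method="Nelder-Mead")
theta_star = res.x
```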
This is akin to the:
## Prediction

I have my model, parameters, and state estimate.
I want to make a prediction for my quantity of interest (QoI), $u$.

$$
u^* = \boldsymbol{f}(\mathbf{z}^*, \boldsymbol{\theta})
$$

In this case, we never have access to any validation data, $u$.
We are simply making a prediction.
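The prediction step is then a pure forward evaluation of the learned model at the estimated state, with nothing to compare against. A minimal sketch, assuming a linear toy model and placeholder values for $\mathbf{z}^*$ and $\boldsymbol{\theta}^*$:

```python
import numpy as np


def f(z, theta):
    """Assumed toy model: a linear map from the state to the quantity of interest."""
    return theta.reshape(-1, z.shape[0]) @ z


# placeholder stand-ins for the state estimate z* and learned parameters theta*
z_star = np.array([0.2, -1.0, 0.5])
theta_star = np.arange(6, dtype=float)

# u* = f(z*, theta*): a forward evaluation only; there is no u to validate against
u_star = f(z_star, theta_star)
```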