## Spaces

| Notation | Description |
|---|---|
| $N \in \mathbb{N}$ | the number of samples (a natural number) |
| $D \in \mathbb{N}$ | the number of features/covariates (a natural number) |
## Variables

| Notation | Description |
|---|---|
| $x, y \in \mathbb{R}$ | scalars (real numbers) |
| $\mathbf{x} \in \mathbb{R}^{D_\mathbf{x}}$ | a $D_\mathbf{x}$-dimensional column vector, usually the input |
| $\mathbf{y} \in \mathbb{R}^{D_\mathbf{y}}$ | a $D_\mathbf{y}$-dimensional column vector, usually the output |
| $x^j \in \mathbb{R}$ | the $j$-th feature of a vector, $\mathbf{x} \in \mathbb{R}^{D}$, where $(x^j)_{1\leq j \leq D}$ |
| $x_i \in \mathbb{R}$ | the $i$-th sample of a vector, $\mathbf{x} \in \mathbb{R}^{N}$, where $(x_i)_{1\leq i \leq N}$ |
| $\mathbf{X} \in \mathbb{R}^{N \times D}$ | a collection of $N$ input vectors, $\mathbf{X}=[\mathbf{x}_1, \ldots, \mathbf{x}_N]^\top$, where $\mathbf{x} \in \mathbb{R}^{D}$ |
| $\mathbf{Y} \in \mathbb{R}^{N \times P}$ | a collection of $N$ output vectors, $\mathbf{Y}=[\mathbf{y}_1, \ldots, \mathbf{y}_N]^\top$, where $\mathbf{y} \in \mathbb{R}^{P}$ |
| $\mathbf{x}^{j} \in \mathbb{R}^{N}$ | the $j$-th feature (column) of a collection of vectors, $\mathbf{X}$, where $(\mathbf{x}^{j})_{1\leq j \leq D}$ |
| $\mathbf{x}_{i} \in \mathbb{R}^{D}$ | the $i$-th sample (row) of a collection of vectors, $\mathbf{X}$, where $(\mathbf{x}_{i})_{1\leq i \leq N}$ |
| $x_{i}^j \in \mathbb{R}$ | the $i$-th sample and $j$-th feature of a collection of vectors, $\mathbf{X}$, where $(x_{i}^{j})_{1\leq i \leq N,\,1\leq j \leq D}$ |
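To make the sample/feature indexing concrete, here is a minimal sketch in JAX; the shapes ($N=5$, $D=3$) and values are invented purely for illustration.

```python
# A minimal sketch of the indexing conventions above, using jax.numpy
# (hypothetical shapes: N = 5 samples, D = 3 features).
import jax.numpy as jnp

N, D = 5, 3
X = jnp.arange(N * D, dtype=jnp.float32).reshape(N, D)  # X in R^{N x D}

x_i = X[0]      # i-th sample (row),     x_i in R^D  -> shape (3,)
x_j = X[:, 0]   # j-th feature (column), x^j in R^N  -> shape (5,)
x_ij = X[0, 0]  # i-th sample, j-th feature, x_i^j in R (scalar)

print(x_i.shape, x_j.shape, x_ij.shape)  # (3,) (5,) ()
```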
## Functions

| Notation | Description |
|---|---|
| $f : \mathcal{X} \rightarrow \mathcal{Y}$ | a latent function that operates on a scalar and maps a space $\mathcal{X}$ to a space $\mathcal{Y}$ |
| $\boldsymbol{f} : \mathcal{X} \rightarrow \mathcal{Y}$ | a latent function that operates on a vector and maps a space $\mathcal{X}$ to a space $\mathcal{Y}$ |
| $\boldsymbol{f}(\,\cdot\,;\boldsymbol{\theta})$ | a latent function parameterized by $\boldsymbol{\theta}$ |
| $\boldsymbol{f}_{\boldsymbol{\theta}}(\cdot)$ | a latent function parameterized by $\boldsymbol{\theta}$ (succinct version) |
| $\boldsymbol{k}(\cdot, \cdot)$ | a kernel or covariance function |
Below are some specific cases of these functions and how they translate to real situations.
**Scalar Input, Scalar Output**

$$f: \mathbb{R} \rightarrow \mathbb{R}$$

**Vector Input, Scalar Output**

$$\boldsymbol{f}: \mathbb{R}^D \rightarrow \mathbb{R}$$

**Example**: 1D Spatio-Temporal Scalar Field

$$y = \boldsymbol{f}(x_\phi, t)$$

**Example**: 2D Spatial Scalar Field
We have a 2-dimensional scalar field. The coordinates, $\mathbf{x} \in \mathbb{R}^{D_\phi}$, are 2D, e.g. (lat, lon) coordinates, $D_\phi = [\phi, \psi]$. Each of these coordinates is represented by a scalar value, $y \in \mathbb{R}$. So we have a function, $\boldsymbol{f}$, that maps each coordinate, $\mathbf{x}$, of the field to a scalar value, $y$, i.e. $\boldsymbol{f}: \mathbb{R}^{D_\phi} \rightarrow \mathbb{R}$. More explicitly, we can write this function as:
$$y = \boldsymbol{f}(\mathbf{x}_\phi)$$

If we stack a lot of samples together, $\mathcal{D} = \left\{ \mathbf{x}_n, y_n\right\}_{n=1}^N$, we get a matrix for the coordinates, $\mathbf{X}$, and a vector for the scalar values, $\mathbf{y}$. So we have $\mathcal{D} = \left\{ \mathbf{X}, \mathbf{y}\right\}$.
**Note**: For more consistent and aesthetically pleasing notation, we can treat $\mathbf{y}$ as a single-column matrix, $\mathbf{Y} \in \mathbb{R}^{N \times 1}$, so that we can write the dataset as $\mathcal{D} = \left\{ \mathbf{X}, \mathbf{Y}\right\}$.
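As a toy illustration, here is how such a dataset $\mathcal{D} = \{\mathbf{X}, \mathbf{y}\}$ might be assembled; the grid resolution and the field itself are invented for this sketch.

```python
# A hedged sketch of stacking a 2D spatial scalar field into D = {X, y}.
# The grid sizes and the field values are made up purely for illustration.
import jax.numpy as jnp

lat = jnp.linspace(-10.0, 10.0, 20)   # phi
lon = jnp.linspace(30.0, 50.0, 30)    # psi
LAT, LON = jnp.meshgrid(lat, lon, indexing="ij")

X = jnp.stack([LAT.ravel(), LON.ravel()], axis=-1)  # (N, D_phi) with N = 600
y = jnp.sin(X[:, 0]) * jnp.cos(X[:, 1])             # (N,) scalar field values

print(X.shape, y.shape)  # (600, 2) (600,)
```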
**Example**: 2D Spatio-Temporal Scalar Field

$$y = \boldsymbol{f}(\mathbf{x}_\phi, t)$$

**Vector Input, Vector Output**

$$\boldsymbol{f}: \mathbb{R}^D \rightarrow \mathbb{R}^P$$

**Example**: 2D Vector Field
We have a 2-dimensional vector field (similar to the above example). The coordinates, $\mathbf{x} \in \mathbb{R}^{D_\phi}$, are 2D, e.g. (lat, lon) coordinates, $D_\phi = [\phi, \psi]$. Each of these coordinates is represented by a vector value, $\mathbf{y} \in \mathbb{R}^{P}$. In this case, let the dimensions be the (u, v) fields, i.e. $P = [u, v]$. So we have a function, $\boldsymbol{f}$, that maps each coordinate, $\mathbf{x}$, of the field to a vector value, $\mathbf{y}$, i.e. $\boldsymbol{f}: \mathbb{R}^{D_\phi} \rightarrow \mathbb{R}^{P}$. More explicitly, we can write this function as:
$$\mathbf{y} = \boldsymbol{f}(\mathbf{x})$$

Again, if we stack a lot of samples together, $\mathcal{D} = \left\{ \mathbf{x}_n, \mathbf{y}_n\right\}_{n=1}^N$, we get a pair of matrices, $\mathcal{D} = \left\{ \mathbf{X}, \mathbf{Y}\right\}$.
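A quick sketch of evaluating such a vector field at stacked coordinates; the $(u, v)$ expressions are arbitrary stand-ins for a real velocity field.

```python
# A toy 2D vector field, f: R^2 -> R^2, evaluated at stacked coordinates.
import jax
import jax.numpy as jnp

def f(x):
    """Map a coordinate (phi, psi) to a vector (u, v)."""
    u = jnp.sin(x[0])
    v = jnp.cos(x[1])
    return jnp.array([u, v])

X = jnp.ones((100, 2))   # (N, D_phi) stacked coordinates
Y = jax.vmap(f)(X)       # (N, P) stacked outputs, P = 2

print(Y.shape)  # (100, 2)
```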
**Special Case**: $D = P$

$$\boldsymbol{f}:\mathbb{R}^2 \rightarrow \mathbb{R}^2$$

Here the function takes in a 2D vector, $(x,y)$, and outputs a 2D vector, $(u, v)$. This is analogous to the scalar fields for $u$ and $v$ which appear in physics. So:
$$\begin{aligned}
f_1(x,y) &= u \\
f_2(x,y) &= v
\end{aligned}$$

We have our functional form given by:

$$\boldsymbol{f}\left(
\begin{bmatrix}
x \\ y
\end{bmatrix}
\right) =
\begin{bmatrix}
f_1(x,y) \\ f_2(x,y)
\end{bmatrix} =
\begin{bmatrix}
u \\ v
\end{bmatrix}$$

## Common Terms

| Notation | Description |
|---|---|
| $\theta$ | a parameter |
| $\theta_\alpha$ | a hyperparameter |
| $\boldsymbol{\theta}$ | a collection of parameters, $\boldsymbol{\theta}=[\theta_1, \theta_2, \ldots, \theta_p]$ |
| $\boldsymbol{\theta}_\alpha$ | a collection of hyperparameters, $\boldsymbol{\theta}_\alpha=[\theta_{\alpha,1}, \theta_{\alpha,2}, \ldots, \theta_{\alpha,p}]$ |
## Probability

| Notation | Description |
|---|---|
| $\mathcal{X}, \mathcal{Y}$ | the space of data |
| $P, Q$ | the probability space of data |
| $f_\mathcal{X}(\mathbf{x})$ | the probability density function (PDF) of $\mathbf{x}$ |
| $F_\mathcal{X}(\mathbf{x})$ | the cumulative distribution function (CDF) of $\mathbf{x}$ |
| $F_\mathcal{X}^{-1}(\mathbf{x})$ | the quantile or percent-point function (PPF), i.e. the inverse CDF, of $\mathbf{x}$ |
| $p(x;\theta)$ | a probability distribution, $p$, of the variable $x$, parameterized by $\theta$ |
| $p_\theta(x)$ | a probability distribution, $p$, of the variable $x$, parameterized by $\theta$ (succinct version) |
| $p(\mathbf{x};\boldsymbol{\theta})$ | a probability distribution, $p$, of the multidimensional variable $\mathbf{x}$, parameterized by $\boldsymbol{\theta}$ |
| $p_{\boldsymbol{\theta}}(\mathbf{x})$ | a probability distribution, $p$, of the multidimensional variable $\mathbf{x}$, parameterized by $\boldsymbol{\theta}$ (succinct version) |
| $\mathcal{N}(x; \mu, \sigma)$ | a normal distribution for $x$, parameterized by $\mu$ and $\sigma$ |
| $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ | a multivariate normal distribution for $\mathbf{x}$, parameterized by $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ |
| $\mathcal{N}(\mathbf{0}, \mathbf{I}_D)$ | a multivariate normal distribution with zero mean and unit (identity) covariance |
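A small sketch of the PDF/CDF/inverse-CDF notation for a 1D normal using `jax.scipy`; the parameters $\mu$ and $\sigma$ are arbitrary.

```python
# PDF, CDF, and inverse CDF for a 1D normal distribution.
import jax.numpy as jnp
from jax.scipy.stats import norm
from jax.scipy.special import ndtri  # inverse CDF of the standard normal

mu, sigma = 0.0, 1.0
x = jnp.array(0.5)

pdf = norm.pdf(x, loc=mu, scale=sigma)  # f_X(x)
cdf = norm.cdf(x, loc=mu, scale=sigma)  # F_X(x)
ppf = ndtri(cdf)                        # F_X^{-1}(F_X(x)) == x for the standard normal

print(pdf, cdf, ppf)
```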
## Information Theory

| Notation | Description |
|---|---|
| $I(X)$ | self-information of a random variable (rv) $X$ |
| $H(X)$ | entropy of a rv $X$ |
| $TC(X)$ | total correlation (multi-information) of a rv $X$ |
| $H(X,Y)$ | joint entropy of rvs $X$ and $Y$ |
| $I(X,Y)$ | mutual information between two rvs $X$ and $Y$ |
| $\text{D}_{\text{KL}}(X,Y)$ | Kullback-Leibler divergence between $X$ and $Y$ |
## Gaussian Processes

| Notation | Description |
|---|---|
| $\boldsymbol{m}$ | mean function for a Gaussian process |
| $\mathbf{K}$ | kernel (covariance) matrix for a Gaussian process |
| $\mathcal{GP}(\boldsymbol{m}, \mathbf{K})$ | Gaussian process distribution parameterized by a mean function, $\boldsymbol{m}$, and a kernel matrix, $\mathbf{K}$ |
| $\boldsymbol{\mu}_\mathcal{GP}$ | GP predictive mean function |
| $\boldsymbol{\sigma}^2_\mathcal{GP}$ | GP predictive variance function |
| $\boldsymbol{\Sigma}_\mathcal{GP}$ | GP predictive covariance function |
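Below is a minimal sketch of the GP predictive quantities $\boldsymbol{\mu}_\mathcal{GP}$, $\boldsymbol{\Sigma}_\mathcal{GP}$, and $\boldsymbol{\sigma}^2_\mathcal{GP}$ under an assumed RBF kernel; the data, lengthscale, and noise level are invented for illustration.

```python
# A minimal GP regression sketch: predictive mean and covariance.
import jax.numpy as jnp

def rbf_kernel(A, B, lengthscale=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 l^2)) for all pairs of rows."""
    sq_dists = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * sq_dists / lengthscale**2)

X = jnp.linspace(-3, 3, 20).reshape(-1, 1)    # training inputs
y = jnp.sin(X[:, 0])                          # training targets
Xs = jnp.linspace(-3, 3, 50).reshape(-1, 1)   # test inputs
noise = 1e-3

K = rbf_kernel(X, X) + noise * jnp.eye(X.shape[0])  # [K]_ij = k(x_i, x_j)
Ks = rbf_kernel(X, Xs)                               # cross covariance
Kss = rbf_kernel(Xs, Xs)

alpha = jnp.linalg.solve(K, y)
mu_gp = Ks.T @ alpha                                 # predictive mean
Sigma_gp = Kss - Ks.T @ jnp.linalg.solve(K, Ks)      # predictive covariance
var_gp = jnp.diag(Sigma_gp)                          # predictive variance
```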
## Field Space

In the first case, we have:
$$\mathbf{y} = \boldsymbol{H}(\mathbf{x}) + \boldsymbol{\epsilon}$$

This represents the state, $\mathbf{x}$, as a representation of the field.
### State

- $\mathbf{x} \in \mathbb{R}^{D_x}$ - state
- $\boldsymbol{\mu}_{\mathbf{x}} \in \mathbb{R}^{D_x}$ - mean prediction for the state vector
- $\boldsymbol{\sigma}^2_{\mathbf{x}} \in \mathbb{R}^{D_x}$ - variance prediction for the state vector
- $\mathbf{X}_{\boldsymbol{\Sigma}} \in \mathbb{R}^{D_x \times D_x}$ - covariance prediction for the state vector
- $\mathbf{X}_{\boldsymbol{\mu}} \in \mathbb{R}^{N \times D_x}$ - mean prediction for the state matrix

### State (Coordinates)

- $\boldsymbol{x} \in \mathbb{R}^{D_\phi}$ - the coordinate vector
- $\boldsymbol{\mu}_{\boldsymbol{x}} \in \mathbb{R}^{D_\phi}$ - mean prediction for the coordinate vector
- $\boldsymbol{\sigma}^2_{\boldsymbol{x}} \in \mathbb{R}^{D_\phi}$ - variance prediction for the coordinate vector
- $\boldsymbol{X}_{\boldsymbol{\Sigma}} \in \mathbb{R}^{D_\phi \times D_\phi}$ - covariance prediction for the coordinate vector
- $\boldsymbol{X}_{\boldsymbol{\mu}} \in \mathbb{R}^{N \times D_\phi}$ - mean prediction for the coordinate matrix

### Observations

- $\mathbf{z} \in \mathbb{R}^{D_z}$ - latent domain
- $\mathbf{y} \in \mathbb{R}^{D_y}$ - observations

### Matrices

- $\mathbf{Z} \in \mathbb{R}^{N \times D_z}$ - latent domain
- $\mathbf{X} \in \mathbb{R}^{N \times D_x}$ - state
- $\mathbf{Y} \in \mathbb{R}^{N \times D_y}$ - observations
## Functions

### Coordinates

In this case, we assume that the state, $\mathbf{x} \in \mathbb{R}^{D_\phi}$, is the vector of coordinates, $[\text{lat, lon, time}]$, and the output is the value of the variable of interest, $\mathbf{y}$, at that point in space and time.
- $[\mathbf{K}]_{ij} = \boldsymbol{k}(\mathbf{x}_i, \mathbf{x}_j)$ - covariance matrix for the coordinates
- $\boldsymbol{k}_{\mathbf{X}}(\mathbf{x}_i) = \boldsymbol{k}(\mathbf{X}, \mathbf{x}_i)$ - cross covariance for the data
- $\boldsymbol{k}(\mathbf{x}_i, \mathbf{x}_j) : \mathbb{R}^{D_\phi} \times \mathbb{R}^{D_\phi} \rightarrow \mathbb{R}$ - the kernel function applied to two vectors
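A sketch of how the covariance objects above might be built with `jax.vmap`; the RBF form of $\boldsymbol{k}$ is an assumption made for the example.

```python
# Building [K]_ij = k(x_i, x_j) and k_X(x_i) = k(X, x_i) with vmap.
import jax
import jax.numpy as jnp

def k(xi, xj, lengthscale=1.0):
    return jnp.exp(-0.5 * jnp.sum((xi - xj) ** 2) / lengthscale**2)

X = jnp.ones((10, 3))  # N = 10 coordinate vectors, D_phi = 3

# [K]_ij = k(x_i, x_j): map over rows twice to get the full Gram matrix.
K = jax.vmap(lambda xi: jax.vmap(lambda xj: k(xi, xj))(X))(X)  # (10, 10)

# k_X(x_i) = k(X, x_i): cross covariance between the dataset and one point.
k_X = jax.vmap(lambda xj: k(X[0], xj))(X)                      # (10,)
```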
### Data Field

In this case, we assume that the state, $\mathbf{x}$, is the input.

- $[\mathbf{C}]_{ij} = \boldsymbol{c}(\mathbf{x}_i, \mathbf{x}_j)$ - covariance matrix for the data field

## Operators

### Jacobian

So here, we're talking about gradients and how they operate on functions.
**Scalar Input, Scalar Output**

$$f: \mathbb{R} \rightarrow \mathbb{R}$$

There are no vectors in this operation, so this is simply the derivative.

$$\begin{aligned}
J_f &: \mathbb{R} \rightarrow \mathbb{R} \\
J_f(x) &= \frac{df}{dx}
\end{aligned}$$

**Vector Input, Scalar Output**
$$\boldsymbol{f} : \mathbb{R}^D \rightarrow \mathbb{R}$$

This has vector inputs, so the Jacobian will have the same dimensionality as the input vector.
$$\begin{aligned}
\boldsymbol{J}[\boldsymbol{f}](\mathbf{x}) &: \mathbb{R}^{D} \rightarrow \mathbb{R}^D \\
\mathbf{J}_{\boldsymbol{f}}(\mathbf{x}) &=
\begin{bmatrix}
\frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_D}
\end{bmatrix}
\end{aligned}$$
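For instance, a quick check with `jax.grad` on a toy quadratic; the function and $D = 4$ are arbitrary choices.

```python
# For f: R^D -> R, the Jacobian is the (row) gradient.
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x ** 2)  # f: R^D -> R

x = jnp.arange(4.0)
J_f = jax.grad(f)(x)        # shape (D,) = (4,); here J_f = 2x

print(J_f)  # [0. 2. 4. 6.]
```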
**Vector Input, Vector Output**

$$\boldsymbol{f} : \mathbb{R}^D \rightarrow \mathbb{R}^P$$

The inputs are a vector, $\mathbf{x} \in \mathbb{R}^D$, and the outputs are a vector, $\mathbf{y} \in \mathbb{R}^P$. So the Jacobian operator will produce a matrix of size $\mathbf{J} \in \mathbb{R}^{P \times D}$.
$$\begin{aligned}
\boldsymbol{J}[\boldsymbol{f}](\mathbf{x}) &: \mathbb{R}^{D} \rightarrow \mathbb{R}^{P \times D} \\
\mathbf{J}[\boldsymbol{f}](\mathbf{x}) &=
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_D} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_P}{\partial x_1} & \cdots & \frac{\partial f_P}{\partial x_D}
\end{bmatrix}
\end{aligned}$$
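A sketch with `jax.jacfwd`, which returns the full $P \times D$ matrix; the toy function and sizes ($D = 3$, $P = 2$) are invented.

```python
# For f: R^D -> R^P, jax.jacfwd returns the full (P x D) Jacobian matrix.
import jax
import jax.numpy as jnp

def f(x):                      # f: R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
J = jax.jacfwd(f)(x)           # shape (P, D) = (2, 3)

print(J.shape)  # (2, 3)
```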
**Alternative Forms**

I've also seen alternative forms, which depend on whether the authors want to highlight the inputs or the outputs.
**Form I**: Highlight the input vectors
$$\mathbf{J}_{\boldsymbol{f}}(\mathbf{x}) =
\begin{bmatrix}
\frac{\partial \boldsymbol{f}}{\partial x_1} & \cdots & \frac{\partial \boldsymbol{f}}{\partial x_D}
\end{bmatrix} =
\begin{bmatrix}
\frac{\nabla \boldsymbol{f}}{\partial x_1} & \cdots & \frac{\nabla \boldsymbol{f}}{\partial x_D}
\end{bmatrix}$$

**Form II**: Highlight the output vectors
$$\mathbf{J}_{\boldsymbol{f}}(\mathbf{x}) =
\begin{bmatrix}
\frac{\partial f_1}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial f_P}{\partial \mathbf{x}}
\end{bmatrix} =
\begin{bmatrix}
\boldsymbol{\nabla}^\top f_1 \\ \vdots \\ \boldsymbol{\nabla}^\top f_P
\end{bmatrix}$$
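Form II says each row of the Jacobian is the (transposed) gradient of one output component; here is a small sketch verifying that on a toy function.

```python
# Verifying Form II: row i of the Jacobian equals grad of x -> f(x)[i].
import jax
import jax.numpy as jnp

def f(x):                      # f: R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
J = jax.jacrev(f)(x)           # full (2, 3) Jacobian

rows = jnp.stack([jax.grad(lambda x, i=i: f(x)[i])(x) for i in range(2)])
print(jnp.allclose(J, rows))   # True
```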
#### Special Cases

There are probably many special cases where we have closed-form operators, but I will highlight one here which comes up in physics a lot.

**2D Vector Input, 2D Vector Output**

Recall the special case from the vectors above, where the dimensionality of the input vector, $\mathbf{x} \in \mathbb{R}^2$, is the same as the dimensionality of the output vector, $\mathbf{y} \in \mathbb{R}^2$.

$$\boldsymbol{f}:\mathbb{R}^2 \rightarrow \mathbb{R}^2$$

The functional form was:
$$\boldsymbol{f}\left(
\begin{bmatrix}
x \\ y
\end{bmatrix}
\right) =
\begin{bmatrix}
f_1(x,y) \\ f_2(x,y)
\end{bmatrix} =
\begin{bmatrix}
u \\ v
\end{bmatrix}$$

So in this special case, our Jacobian matrix, $\mathbf{J}$, will be:
$$\mathbf{J}_{\boldsymbol{f}}(x,y) =
\begin{bmatrix}
\frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\
\frac{\partial v}{\partial x} & \frac{\partial v}{\partial y}
\end{bmatrix}$$

**Note**: This is a square matrix because the dimension of the input vector, $(x,y)$, matches the dimension of the output vector, $(u,v)$.
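A sketch of this $2 \times 2$ Jacobian via `jax.jacfwd`; the $(u, v)$ field is an arbitrary stand-in.

```python
# The 2x2 Jacobian of a toy (u, v) field.
import jax
import jax.numpy as jnp

def f(xy):
    x, y = xy
    u = x ** 2 * y
    v = jnp.sin(x) + y
    return jnp.array([u, v])

J = jax.jacfwd(f)(jnp.array([1.0, 2.0]))  # [[du/dx, du/dy], [dv/dx, dv/dy]]
print(J.shape)  # (2, 2)
```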
### Determinant Jacobian

The determinant of the Jacobian measures the amount of (volumetric) change induced by the function. It is given by:
$$\det \boldsymbol{J}_{\boldsymbol{f}}(\mathbf{x}): \mathbb{R}^D \rightarrow \mathbb{R}$$

Notice how we input a vector, $\mathbf{x}$, and the result is a scalar in $\mathbb{R}$.
**Note**: This can be a very expensive operation, especially with high-dimensional data. Even for a naive linear function, $\boldsymbol{f}(\mathbf{x}) = \mathbf{Ax}$, computing the determinant costs $\mathcal{O}(D^3)$. So the name of the game is to look at the Jacobian structure and find tricks to reduce the expense of the calculation.
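A sketch of that naive case: for $\boldsymbol{f}(\mathbf{x}) = \mathbf{Ax}$ the Jacobian is $\mathbf{A}$ itself, and the dense determinant (here via the numerically stabler `slogdet`) scales as $\mathcal{O}(D^3)$; the matrix $\mathbf{A}$ is invented.

```python
# Determinant of the Jacobian of a linear map f(x) = Ax.
import jax
import jax.numpy as jnp

D = 100
A = jnp.eye(D) * 2.0

def f(x):
    return A @ x

x = jnp.ones(D)
J = jax.jacfwd(f)(x)                  # (D, D); equals A
sign, logdet = jnp.linalg.slogdet(J)  # O(D^3) for a dense matrix
print(sign, logdet)                   # 1.0, D * log(2)
```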
**Special Case**: 2D Input Vector, 2D Output Vector
Again, let's go back to the special case where we have a 2D input vector, $\mathbf{x} \in \mathbb{R}^2$, and a 2D output vector, $\mathbf{y} \in \mathbb{R}^2$. Recall that the Jacobian matrix for the function, $\boldsymbol{f}$, is a $2 \times 2$ square matrix. More generally, we can write this as:
$$\boldsymbol{J}
\begin{bmatrix}
A(x,y) \\
B(x,y)
\end{bmatrix} =
\begin{bmatrix}
\frac{\partial A}{\partial x} & \frac{\partial A}{\partial y} \\
\frac{\partial B}{\partial x} & \frac{\partial B}{\partial y}
\end{bmatrix}$$

To calculate the determinant of this Jacobian matrix, we have a closed-form expression.
Recall that for a generic $2 \times 2$ matrix, the determinant is:

$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

So if we apply this to our notation:
$$\det \mathbf{J}_{\boldsymbol{f}}(x,y) = \frac{\partial f_1}{\partial x}\frac{\partial f_2}{\partial y} - \frac{\partial f_1}{\partial y}\frac{\partial f_2}{\partial x}$$

This is probably the easiest determinant Jacobian to calculate (apart from the scalar case, which is simply the derivative), and it comes up from time to time in physics.
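A quick numerical check of this closed form against autodiff; the toy function is invented.

```python
# Checking the 2D closed-form determinant against autodiff.
import jax
import jax.numpy as jnp

def f(xy):
    x, y = xy
    return jnp.array([x * y, x + jnp.sin(y)])

p = jnp.array([1.5, -0.5])
J = jax.jacfwd(f)(p)

# df1/dx * df2/dy - df1/dy * df2/dx
closed_form = J[0, 0] * J[1, 1] - J[0, 1] * J[1, 0]
print(jnp.allclose(jnp.linalg.det(J), closed_form))  # True
```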
**Note**: I have seen an alternative form in the geoscience literature, $\boldsymbol{J}(\boldsymbol{f}_1, \boldsymbol{f}_2)$. I personally don't like this notation because in no way does it specify the *determinant*. I propose a better, clearer notation: $\det \boldsymbol{J}(\boldsymbol{f}_1, \boldsymbol{f}_2)$. Now we at least have the determinant explicit in the notation.
**Example**: This appears in the quasi-geostrophic (QG) PDE. It is given by:

$$\partial_t q + \boldsymbol{J}(\psi, q) = 0$$

where the Jacobian operator is given by:
$$\boldsymbol{J}(\psi, q) = \partial_x \psi \, \partial_y q - \partial_y \psi \, \partial_x q$$

With my updated notation, this would now be:
$$\partial_t q + \det\boldsymbol{J}(\psi, q) = 0$$

where the determinant Jacobian operator is given by:
$$\det\boldsymbol{J}(\psi, q) = \partial_x \psi \, \partial_y q - \partial_y \psi \, \partial_x q$$

In my eyes, this is clearer, especially in papers where people recycle the equations without explicitly defining the operators and their meaning.
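A hedged sketch of this term for gridded fields $\psi$ and $q$, using finite differences via `jnp.gradient`; the grid and the fields are invented for illustration.

```python
# det J(psi, q) = dpsi/dx * dq/dy - dpsi/dy * dq/dx on a regular grid.
import jax.numpy as jnp

nx, ny = 64, 64
dx = float(2 * jnp.pi / (nx - 1))
dy = float(2 * jnp.pi / (ny - 1))
x = jnp.linspace(0, 2 * jnp.pi, nx)
y = jnp.linspace(0, 2 * jnp.pi, ny)
XX, YY = jnp.meshgrid(x, y, indexing="ij")

psi = jnp.sin(XX) * jnp.cos(YY)  # toy streamfunction
q = jnp.cos(XX) * jnp.sin(YY)    # toy potential vorticity

dpsi_dx, dpsi_dy = jnp.gradient(psi, dx, dy)
dq_dx, dq_dy = jnp.gradient(q, dx, dy)

det_J = dpsi_dx * dq_dy - dpsi_dy * dq_dx  # det J(psi, q) on the grid
print(det_J.shape)  # (64, 64)
```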