Differential Operators ¶

Difference ¶

This is the case where we have a function $\boldsymbol{f}:\mathbb{R}^D \rightarrow \mathbb{R}$ which maps an input vector $\vec{\mathbf{x}}$ to a scalar value. We denote this operation as

$$
\begin{aligned}
\text{Difference}
&:= \partial_i \boldsymbol{f} \\
&= \nabla_i \boldsymbol{f}
\end{aligned}
$$

So this operator is

$$
\partial_i \boldsymbol{f}: \mathbb{R} \rightarrow \mathbb{R}
$$

where $i$ is the index of the input vector, $\vec{\mathbf{x}}$, of the function $\boldsymbol{f}$. We can also write the functional transformation version

$$
\partial_i [\boldsymbol{f}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}
$$

Pseudo-Code
u: Array["P Dx"] = ...
du_dx: Array["P"] = derivative(u, step_size=dx, axis=0, order=1, accuracy=2)
du_dy: Array["P"] = derivative(u, step_size=dy, axis=1, order=1, accuracy=2)
# second partial derivative
d2u_dx2: Array["P"] = derivative(u, step_size=dx, axis=0, order=2, accuracy=2)
d2u_dy2: Array["P"] = derivative(u, step_size=dy, axis=1, order=2, accuracy=2)
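A minimal NumPy sketch of the pseudo-code above. The `derivative` helper here is a hypothetical stand-in built from `np.gradient` (central differences, second-order accurate in the interior); only first and second order are implemented:

```python
import numpy as np

def derivative(u, step_size, axis=0, order=1):
    """Finite-difference derivative along one axis via np.gradient.

    A hypothetical stand-in for the `derivative` helper in the
    pseudo-code; supports order 1 and 2 only.
    """
    if order == 1:
        return np.gradient(u, step_size, axis=axis)
    if order == 2:
        du = np.gradient(u, step_size, axis=axis)
        return np.gradient(du, step_size, axis=axis)
    raise NotImplementedError(order)

# sample u(x, y) = x**2 * y on a uniform grid
dx = dy = 0.01
x = np.arange(0.0, 1.0, dx)
y = np.arange(0.0, 1.0, dy)
X, Y = np.meshgrid(x, y, indexing="ij")
u = X**2 * Y

du_dx = derivative(u, dx, axis=0, order=1)    # interior: exactly 2*x*y
d2u_dx2 = derivative(u, dx, axis=0, order=2)  # interior: exactly 2*y
```

Because `u` is quadratic in `x`, the central differences are exact away from the grid boundary, which makes this an easy sanity check.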
Resources
Gradient ¶

The direction of fastest change and the directional derivative. It tells us locally where something is increasing or decreasing the fastest, i.e., the rate of change at every point (a vector pointing in the direction of change).
Turns a scalar field into a vector field!
$$
\begin{aligned}
\text{Gradient}
&:=\text{grad}(\boldsymbol{f}) \\
&= \boldsymbol{\nabla} \boldsymbol{f}\\
&=
\begin{bmatrix}
\frac{\partial}{\partial x} \\
\frac{\partial}{\partial y} \\
\frac{\partial}{\partial z}
\end{bmatrix} \boldsymbol{f} \\
&=
\begin{bmatrix}
\frac{\partial f}{\partial x} \\
\frac{\partial f}{\partial y} \\
\frac{\partial f}{\partial z}
\end{bmatrix}\\
&= \boldsymbol{J}(\boldsymbol{f})^\top
\end{aligned}
$$

So the operation is:

$$
\text{grad}(\boldsymbol{f}) = \boldsymbol{\nabla}\boldsymbol{f}: \mathbb{R} \rightarrow \mathbb{R}^D
$$

where $D$ is the size of the input vector, $\vec{\mathbf{x}}$. Let's take a scalar field with vector-valued inputs.
$$
f=\boldsymbol{f}(x,y,z)=\boldsymbol{f}(\vec{\mathbf{x}}) \hspace{10mm} f:\mathbb{R}^3\rightarrow \mathbb{R}
$$

Then the gradient is
$$
\text{grad}(\boldsymbol{f})
=
\begin{bmatrix}
\frac{\partial \boldsymbol{f}}{\partial x} \\
\frac{\partial \boldsymbol{f}}{\partial y} \\
\frac{\partial \boldsymbol{f}}{\partial z}
\end{bmatrix}
$$

We can also write the functional transformation version:

$$
\text{grad}[\boldsymbol{f}](\vec{\mathbf{x}}) = \boldsymbol{\nabla}\boldsymbol{f}: \mathbb{R}^D \rightarrow \mathbb{R}^D
$$

Pseudo-Code
# scalar value
x: Array[""] = ...
y: Array[""] = ...
output: Array[""] = f(x,y)
# vectorized
x: Array["N"] = ...
y: Array["N"] = ...
output: Array["N"] = vmap(f, args=(0,1))(x,y)
# meshgrid
x: Array["N"] = ...
y: Array["M"] = ...
X: Array["N M"], Y: Array["N M"] = meshgrid(x,y, indexing="ij")
x: Array["NM"] = flatten(X)
y: Array["NM"] = flatten(Y)
output: Array["NM"] = vmap(f, args=(0,1))(x,y)
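A concrete instance of the meshgrid pattern above, assuming NumPy. `np.gradient` plays the role of the hypothetical `derivative` helper and returns one partial derivative per input axis, which we stack into a vector field:

```python
import numpy as np

# f(x, y) = x**2 + y**2 sampled on a grid; grad(f) = [2x, 2y]
dx = dy = 0.01
x = np.arange(-1.0, 1.0, dx)
y = np.arange(-1.0, 1.0, dy)
X, Y = np.meshgrid(x, y, indexing="ij")
F = X**2 + Y**2

# the gradient turns the scalar field ("N M") into a
# vector field ("N M 2"): one partial derivative per input axis
dF_dx, dF_dy = np.gradient(F, dx, dy)
F_grad = np.stack([dF_dx, dF_dy], axis=-1)
```

Away from the grid boundary the central differences recover $[2x, 2y]$ exactly, since $f$ is quadratic.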
Example

Let's take the function

$$
f(x,y) = x^2 + y^2 \hspace{10mm} f:\mathbb{R}\times\mathbb{R}\rightarrow \mathbb{R}
$$

Then the gradient would be

$$
\nabla f =
\begin{bmatrix}
2x \\
2y
\end{bmatrix}
$$

Vectorized over samples $\mathbf{x},\mathbf{y}$ on a domain $\Omega$:

$$
\begin{aligned}
\mathbf{x},\mathbf{y}&\in\Omega && && \Omega_x\in\mathbb{R}\\
\boldsymbol{f}(\mathbf{x},\mathbf{y}) &= \mathbf{x}^2 + \mathbf{y}^2 && &&\boldsymbol{f}:\mathbb{R}^N\times\mathbb{R}^N\rightarrow \mathbb{R}^{N} \\
\text{grad}[\boldsymbol{f}](\mathbf{x},\mathbf{y}) &=
\begin{bmatrix}
2\mathbf{x} \\
2\mathbf{y}
\end{bmatrix} && && \text{grad}[\boldsymbol{f}]: \mathbb{R}^{N\times 2}\rightarrow\mathbb{R}^{N\times 2}
\end{aligned}
$$

Vector Fields ¶

Pseudo-Code
# scalar value
x: Array[""] = ...
y: Array[""] = ...
output: Array[""] = f(x,y)
# vectorized
x: Array["N"] = ...
y: Array["N"] = ...
output: Array["N"] = vmap(f, args=(0,1))(x,y)
# meshgrid
x: Array["N"] = ...
y: Array["M"] = ...
X: Array["N M"], Y: Array["N M"] = meshgrid(x,y, indexing="ij")
x: Array["NM"] = flatten(X)
y: Array["NM"] = flatten(Y)
output: Array["NM"] = vmap(f, args=(0,1))(x,y)
Jacobian ¶

Resources

Mathemaniac Visualization of Jacobian - Video
Serpentine Integral Visualization of the Change of Variables and the Jacobian - Video

Divergence ¶

Turns a vector field into a scalar field. It measures locally how much stuff is flowing towards or away from a single point in space, i.e., how much the vector field is expanding out of or contracting into that point!
How we measure sources and sinks!
Let’s take a vector valued function:
$$
\vec{\boldsymbol{f}}:\mathbb{R}^{D}\rightarrow\mathbb{R}^D
$$

The divergence operator does the following transformation:

$$
\text{div}(\vec{\boldsymbol{f}}): \mathbb{R}^D \rightarrow \mathbb{R}
$$

Then the divergence operator is the following:
$$
\begin{aligned}
\text{Divergence}
&:= \text{div}(\vec{\boldsymbol{f}}) \\
&= \vec{\nabla}\cdot \vec{\boldsymbol{f}} \\
&= \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right)\cdot \left(f_1, f_2, f_3\right) \\
&= \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right)\cdot \left(f_1\hat{i} + f_2\hat{j} + f_3\hat{k}\right) \\
&= \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}
\end{aligned}
$$

We can also write the functional transformation version that maps a vector input, $\vec{\mathbf{x}}$, through the transformation $\boldsymbol{f}(\cdot)$ to the output of the divergence operator $\text{div}(\cdot)$. We have the following:
$$
\text{div}\left[\vec{\boldsymbol{f}}\right](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}
$$

Pseudo-Code
u: Array["P D"] = ...
# from scratch (differences)
du_dx: Array["P"] = difference(u, step_size=dx, axis=0, order=1, accuracy=4)
du_dy: Array["P"] = difference(u, step_size=dy, axis=1, order=1, accuracy=4)
u_div: Array["P"] = du_dx + du_dy
# from scratch (gradient)
u_grad: Array["P D"] = gradient(u, step_size=(dx,dy), order=1, accuracy=4)
u_div: Array["P"] = np.sum(u_grad, axis=1)
# divergence operator
u_div: Array["P"] = divergence(u, step_size=(dx,dy), accuracy=4)
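A NumPy sketch of the "from scratch (differences)" variant above, assuming a uniform grid. `np.gradient` stands in for the hypothetical `difference` helper, and the test field $\vec{f}(x,y) = (x, y)$ has constant divergence $2$ (a pure source):

```python
import numpy as np

# divergence of f(x, y) = (x, y) is identically 2
dx = dy = 0.01
x = np.arange(-1.0, 1.0, dx)
y = np.arange(-1.0, 1.0, dy)
X, Y = np.meshgrid(x, y, indexing="ij")
f1, f2 = X, Y  # components of the vector field

# sum of the diagonal partial derivatives: vector field in, scalar field out
df1_dx = np.gradient(f1, dx, axis=0)
df2_dy = np.gradient(f2, dy, axis=1)
f_div = df1_dx + df2_dy
```

Since both components are linear, the finite differences are exact everywhere, including the boundary.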
Curl ¶ How we measure rotation!
$$
\begin{aligned}
\text{Curl}
&:= \text{curl}(\vec{\boldsymbol{f}}) \\
&= \nabla\times \vec{\boldsymbol{f}} \\
&= \det
\begin{vmatrix}
\hat{i} & \hat{j} & \hat{k} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
f_1 & f_2 & f_3
\end{vmatrix} \\
&= \begin{bmatrix}
\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\\
\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \\
\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}
\end{bmatrix} \\
&= \left(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\right)\hat{i}
+ \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x}\right)\hat{j}
+ \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\right)\hat{k}
\end{aligned}
$$

We can write this as
$$
\text{curl}(\vec{\boldsymbol{f}}): \mathbb{R}^D \rightarrow \mathbb{R}^D
$$

We can also write the functional transformation version

$$
\text{curl}[\vec{\boldsymbol{f}}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}^D
$$

Resources
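The curl components can be checked numerically with NumPy finite differences (a sketch, with `np.gradient` standing in for a hypothetical `derivative` helper). The rotating field $\vec{f} = (-y, x, 0)$ has constant curl $(0, 0, 2)$:

```python
import numpy as np

# curl of the rotating field f = (-y, x, 0) is (0, 0, 2) everywhere
d = 0.1
x = y = z = np.arange(-1.0, 1.0, d)
X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
f1, f2, f3 = -Y, X, np.zeros_like(X)

# curl components; axis 0 = x, axis 1 = y, axis 2 = z
c1 = np.gradient(f3, d, axis=1) - np.gradient(f2, d, axis=2)
c2 = np.gradient(f1, d, axis=2) - np.gradient(f3, d, axis=0)
c3 = np.gradient(f2, d, axis=0) - np.gradient(f1, d, axis=1)
```

The field is linear, so the differences recover the analytic curl exactly.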
Laplacian ¶

The sum of the (unmixed) second partial derivatives.

$$
\begin{aligned}
\text{Laplacian}
&:= \Delta u \\
&= \nabla^2 u \\
&= \text{div}(\nabla u) \\
&= \partial_{xx}u + \partial_{yy}u + \partial_{zz}u
\end{aligned}
$$

We can also write this as the functional transformation version

$$
\text{Laplacian}[\boldsymbol{f}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}
$$

Pseudo-Code
u: Array["P D"] = ...
# from scratch (partial derivatives)
d2u_dx2: Array["P"] = derivative(u, step_size=dx, axis=0, order=2, accuracy=4)
d2u_dy2: Array["P"] = derivative(u, step_size=dy, axis=1, order=2, accuracy=4)
u_lap: Array["P"] = d2u_dx2 + d2u_dy2
# from scratch (gradient + divergence)
u_grad: Array["P D"] = gradient(u, step_size=(dx,dy), order=1, accuracy=4)
u_lap: Array["P"] = divergence(u_grad, step_size=(dx,dy), accuracy=4)
# laplacian operator
u_lap: Array["P"] = laplacian(u, step_size=(dx,dy), accuracy=4)
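A NumPy sketch of the "from scratch (partial derivatives)" variant above, using repeated `np.gradient` calls in place of the hypothetical `derivative` helper. The field $u = x^2 + y^2$ has constant Laplacian $4$:

```python
import numpy as np

# Laplacian of u(x, y) = x**2 + y**2 is identically 4
dx = dy = 0.01
x = np.arange(-1.0, 1.0, dx)
y = np.arange(-1.0, 1.0, dy)
X, Y = np.meshgrid(x, y, indexing="ij")
u = X**2 + Y**2

# second partials as twice-applied central differences
d2u_dx2 = np.gradient(np.gradient(u, dx, axis=0), dx, axis=0)
d2u_dy2 = np.gradient(np.gradient(u, dy, axis=1), dy, axis=1)
u_lap = d2u_dx2 + d2u_dy2
```

The result is exact in the interior (the one-sided boundary stencils pollute the outermost two rows and columns).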
Material Derivative ¶

Scalar Field ¶

Given a scalar field:

$$
\phi:=\boldsymbol{\phi}(\vec{\mathbf{x}},t)=\boldsymbol{\phi}(x,y,z,t) \hspace{10mm} \phi:\mathbb{R}^3\times\mathbb{R}\rightarrow \mathbb{R}
$$

We can write the material derivative as

$$
\frac{D\phi}{Dt} := \frac{\partial \phi}{\partial t} + \vec{\mathbf{u}} \cdot \nabla \phi
$$

where

$$
\vec{\mathbf{u}} \cdot \nabla \phi =
u_1\frac{\partial \phi}{\partial x} +
u_2\frac{\partial \phi}{\partial y} +
u_3\frac{\partial \phi}{\partial z}
$$

Vector Field ¶

Given a vector-valued field:
$$
\vec{\boldsymbol{F}}:=\vec{\boldsymbol{F}}(\vec{\mathbf{x}},t)=
\vec{\boldsymbol{F}}(x,y,z,t) \hspace{10mm}
\vec{\boldsymbol{F}}:\mathbb{R}^3\times\mathbb{R}\rightarrow \mathbb{R}^{3}
$$

We can write the material derivative as

$$
\frac{D \vec{\boldsymbol{F}}}{Dt} := \frac{\partial \vec{\boldsymbol{F}}}{\partial t} + \vec{\mathbf{u}} \cdot \nabla \vec{\boldsymbol{F}}
$$

where

$$
\vec{\mathbf{u}} \cdot \nabla \vec{\boldsymbol{F}} =
u_1\frac{\partial \vec{\boldsymbol{F}}}{\partial x} +
u_2\frac{\partial \vec{\boldsymbol{F}}}{\partial y} +
u_3\frac{\partial \vec{\boldsymbol{F}}}{\partial z}
$$

Resources
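The scalar-field case can be sketched numerically with NumPy (a toy example under the assumption of a constant velocity field). For $\phi(x,y,t) = xt$ advected by $\vec{u} = (2, 0)$, the material derivative is $D\phi/Dt = x + 2t$:

```python
import numpy as np

# phi(x, y, t) = x * t advected by u = (2, 0):
# D phi / Dt = dphi/dt + u . grad(phi) = x + 2*t
d = 0.05
x = y = t = np.arange(0.0, 1.0, d)
X, Y, T = np.meshgrid(x, y, t, indexing="ij")
phi = X * T
u1, u2 = 2.0, 0.0  # constant velocity field

# local (Eulerian) rate of change plus the advective term
dphi_dt = np.gradient(phi, d, axis=2)
dphi_dx = np.gradient(phi, d, axis=0)
dphi_dy = np.gradient(phi, d, axis=1)
material = dphi_dt + u1 * dphi_dx + u2 * dphi_dy
```

All partials of the bilinear field are exact under finite differences, so the numerical result matches $x + 2t$ everywhere.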
Determinant Jacobian ¶

From a differential operator perspective, we have

$$
\begin{aligned}
\det\boldsymbol{J}(A,B) &= -\det\boldsymbol{J}(B,A)\\
&= \mathbf{k}\cdot\left(\nabla A\times \nabla B\right) \\
&= \mathbf{k}\cdot \nabla\times\left(A\nabla B\right) \\
&= - \mathbf{k}\cdot \nabla\times\left(B\nabla A\right) \\
&= -\mathbf{k}\cdot\text{curl}\left(B\nabla A\right)
\end{aligned}
$$

If we think of Cartesian coordinates, we have
$$
\begin{aligned}
\det \boldsymbol{J}(A,B) &= \frac{\partial A}{\partial x}\frac{\partial B}{\partial y} -\frac{\partial A}{\partial y}\frac{\partial B}{\partial x} \\
&= \frac{\partial }{\partial x}\left(A\frac{\partial B}{\partial y}\right) -\frac{\partial }{\partial y}\left(A\frac{\partial B}{\partial x}\right) \\
&= \frac{\partial }{\partial y}\left(B\frac{\partial A}{\partial x}\right) -\frac{\partial }{\partial x}\left(B\frac{\partial A}{\partial y}\right)
\end{aligned}
$$

We can write this transformation as

$$
\det\boldsymbol{J}(\boldsymbol{f}, \boldsymbol{g}): \mathbb{R}^D\times\mathbb{R}^{D} \rightarrow \mathbb{R}
$$

We can also write this as the functional transformation version

$$
\det\boldsymbol{J}[\boldsymbol{f}, \boldsymbol{g}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}
$$

Pseudo-Code
u: Array["P D"] = ...
v: Array["P D"] = ...
# det Jacobian operator
step_size = ((dx,dy),(dx,dy))
u_detj: Array["P"] = det_jacobian(u, v, step_size, accuracy)
# from scratch (partial derivatives)
du_dx: Array["P"] = derivative(u, step_size=dx, axis=0, order=1, accuracy=2)
du_dy: Array["P"] = derivative(u, step_size=dy, axis=1, order=1, accuracy=2)
dv_dx: Array["P"] = derivative(v, step_size=dx, axis=0, order=1, accuracy=2)
dv_dy: Array["P"] = derivative(v, step_size=dy, axis=1, order=1, accuracy=2)
u_detj: Array["P"] = du_dx * dv_dy - du_dy * dv_dx
# from scratch (partial derivatives, flux form)
dv_dx: Array["P"] = derivative(v, step_size=dx, axis=0, order=1, accuracy=2)
dv_dy: Array["P"] = derivative(v, step_size=dy, axis=1, order=1, accuracy=2)
udv_dx: Array["P"] = u * dv_dx
udv_dy: Array["P"] = u * dv_dy
u_detj: Array["P"] = derivative(udv_dy, step_size=dx, axis=0, order=1, accuracy=2) - derivative(udv_dx, step_size=dy, axis=1, order=1, accuracy=2)
# from scratch (gradient + curl)
u_grad: Array["P D"] = gradient(u, step_size=(dx,dy), order=1, accuracy=2)
vu_grad: Array["P D"] = v * u_grad
u_detj: Array["P"] = -curl(vu_grad, step_size=(dx,dy), accuracy=2)
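A NumPy sketch comparing the direct formula with the flux form above, assuming a uniform grid and using `np.gradient` in place of the hypothetical helpers. For $A = x^2$, $B = y^2$ the analytic value is $\det\boldsymbol{J}(A,B) = 4xy$:

```python
import numpy as np

# det J(A, B) for A = x**2, B = y**2 is 4*x*y
dx = dy = 0.01
x = np.arange(0.1, 1.0, dx)
y = np.arange(0.1, 1.0, dy)
X, Y = np.meshgrid(x, y, indexing="ij")
A, B = X**2, Y**2

# direct formula: dA/dx * dB/dy - dA/dy * dB/dx
dA_dx = np.gradient(A, dx, axis=0)
dA_dy = np.gradient(A, dy, axis=1)
dB_dx = np.gradient(B, dx, axis=0)
dB_dy = np.gradient(B, dy, axis=1)
detj_direct = dA_dx * dB_dy - dA_dy * dB_dx

# flux form: d/dx (A dB/dy) - d/dy (A dB/dx)
detj_flux = (np.gradient(A * dB_dy, dx, axis=0)
             - np.gradient(A * dB_dx, dy, axis=1))
```

Both routes agree with the analytic answer in the grid interior; the flux form is the discrete analogue of $\mathbf{k}\cdot\nabla\times(A\nabla B)$.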
Resources

Explanation of Jacobian from Flows Perspective - Slides
Serpentine Integral Visualization of the Change of Variables and the Jacobian - Video
Khan Academy Explanation of Determinant - Video
3Blue1Brown Visualization of Determinant Jacobian - Video

Helmholtz Equation ¶

$$
\begin{aligned}
\nabla^2 \boldsymbol{f}(\vec{\mathbf{x}}) - k^2 \boldsymbol{f}(\vec{\mathbf{x}}) &= 0 \\
\left( \nabla^2 - k^2 \right)\boldsymbol{f}(\vec{\mathbf{x}}) &= 0
\end{aligned}
$$

Helmholtz Decomposition ¶

$$
\vec{\boldsymbol{f}} = \underbrace{- \nabla\phi}_{\text{Curl-Free}}+ \underbrace{\nabla\times \mathbf{A}}_{\text{Div-Free}}
$$

Vector Jacobian Products ¶

Tangent Space + Primal Space -> Jacobian-Vector Product
Primal Space + Cotangent Space -> Vector-Jacobian Product

Resources:
jax documentation
Linearization is All You Need for an AutoDiff Library - Blog
The Adjoint Method in a Dozen Lines of JAX - Blog
Adjoint Sensitivities over nonlinear equations with JAX - YouTube
Using JAX Jacobians for Adjoint Sensitivities over Nonlinear Systems of Equations - YouTube
A Tutorial on Automatic Differentiation for Scientific Design - Slides

Linearization ¶

First, we linearize about the prior estimate, $\boldsymbol{\mu_z}$.
$$
\boldsymbol{f}(\boldsymbol{z}) \approx
\boldsymbol{f}(\boldsymbol{\mu_z}) +
\boldsymbol{\nabla_z}\boldsymbol{f}\,(\boldsymbol{z}-\boldsymbol{\mu_z}) + \mathcal{O}^2(\boldsymbol{f})
$$

Then, we approximate the gradient of the function with the gradient evaluated at the prior estimate
$$
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z}) \approx \boldsymbol{J_f}(\boldsymbol{z}) :=
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z})\big|_{\boldsymbol{z}=\boldsymbol{\mu_z}}
$$

Here, the operator $\boldsymbol{J_f}(\boldsymbol{z})$ is the tangent-linear operator of the function $\boldsymbol{f}(\cdot)$ evaluated at $\boldsymbol{\mu_z}$, and $\boldsymbol{J_f}^\top(\boldsymbol{z})$ is the adjoint.
$$
\begin{aligned}
\text{Dynamical Tangent-Linear Operator}: && &&
\boldsymbol{J_f}(\boldsymbol{z}) &=
\boldsymbol{F_z} =
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z})|_{\boldsymbol{z}=\boldsymbol{\mu_z}},
&& &&
\mathbb{R}^{D}\rightarrow\mathbb{R}^{D_x\times D_x}\\
\text{Dynamical Adjoint Operator}: && &&
\boldsymbol{J_f}^\top(\boldsymbol{z}) &=
\boldsymbol{F_z}^\top =
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z})^\top|_{\boldsymbol{z}=\boldsymbol{\mu_z}},
&& &&
\mathbb{R}^{D}\rightarrow\mathbb{R}^{D_x\times D_x}\\
\text{Observation Tangent-Linear Operator}: && &&
\boldsymbol{J_h}(\boldsymbol{z}) &=
\boldsymbol{H_z} =
\boldsymbol{\nabla_z}\boldsymbol{h}(\boldsymbol{z})|_{\boldsymbol{z}=\boldsymbol{\mu_z}},
&& &&
\mathbb{R}^{D}\rightarrow\mathbb{R}^{D_y\times D_x}\\
\text{Observation Adjoint Operator}: && &&
\boldsymbol{J_h}^\top(\boldsymbol{z}) &=
\boldsymbol{H_z}^\top =
\boldsymbol{\nabla_z}\boldsymbol{h}(\boldsymbol{z})^\top|_{\boldsymbol{z}=\boldsymbol{\mu_z}},
&& &&
\mathbb{R}^{D_y}\rightarrow\mathbb{R}^{D_x\times D_y}
\end{aligned}
$$

Tangent-Linear Model ¶

For $\boldsymbol{f}: \mathbb{R}^D \rightarrow \mathbb{R}^M$:

$$
\begin{aligned}
\text{Tangent-Linear Model}: && &&
(\boldsymbol{u,v}) &\rightarrow
\partial_{\boldsymbol{z}}\boldsymbol{f}(\boldsymbol{z})\boldsymbol{v} \\
\text{Input Vector}: && &&
\boldsymbol{u} &\in \mathbb{R}^{D} \\
\text{Tangent Vector}: && &&
\boldsymbol{v} &\in \mathbb{R}^{D} \\
\text{Jacobian-Vector Product}: && &&
\text{jvp} &: \mathbb{R}^{D}\times\mathbb{R}^{D}\rightarrow\mathbb{R}^{M}
\end{aligned}
$$

Adjoint Model ¶

$$
\begin{aligned}
\text{Adjoint Model}: && &&
(\boldsymbol{u,v}) &\rightarrow
\boldsymbol{v}^\top\partial_{\boldsymbol{z}}\boldsymbol{f}(\boldsymbol{z}) \\
\text{Input Vector}: && &&
\boldsymbol{u} &\in \mathbb{R}^{D} \\
\text{Cotangent Vector}: && &&
\boldsymbol{v} &\in \mathbb{R}^{M} \\
\text{Vector-Jacobian Product}: && &&
\text{vjp} &: \mathbb{R}^{D}\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{D}
\end{aligned}
$$
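The shape rules above can be sketched with an explicit Jacobian (a toy NumPy example with a hypothetical $\boldsymbol{f}:\mathbb{R}^2\rightarrow\mathbb{R}^3$; in practice `jax.jvp`/`jax.vjp` compute these products without materializing the Jacobian):

```python
import numpy as np

# f : R^2 -> R^3, so its Jacobian at z is a (3, 2) matrix, and
#   jvp: (R^2 primal, R^2 tangent)   -> R^3
#   vjp: (R^2 primal, R^3 cotangent) -> R^2
def f(z):
    x, y = z
    return np.array([x * y, x + y, x**2])

def jacobian(z):
    x, y = z
    return np.array([[y, x], [1.0, 1.0], [2 * x, 0.0]])

z = np.array([2.0, 3.0])        # primal point
v = np.array([1.0, 0.0])        # tangent vector, lives with the input
w = np.array([1.0, 1.0, 1.0])   # cotangent vector, lives with the output

primal = f(z)
jvp = jacobian(z) @ v           # tangent-linear model: J v, shape (3,)
vjp = w @ jacobian(z)           # adjoint model: w^T J, shape (2,)
```

The jvp pushes a tangent forward through the Jacobian; the vjp pulls a cotangent back through its transpose, which is why the output shapes swap.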