Notation - Differential Operators
Differential Operators

Difference

This is the case where we have a function $\boldsymbol{f}:\mathbb{R}^D \rightarrow \mathbb{R}$ which maps an input vector $\vec{\mathbf{x}}$ to a scalar value. We denote this operation as
$$\begin{aligned}
\text{Difference}
&:= \partial_i \boldsymbol{f} \\
&= \nabla_i \boldsymbol{f}
\end{aligned}$$

So this operator is
$$\partial_i \boldsymbol{f}: \mathbb{R} \rightarrow \mathbb{R}$$

where $i$ is the index of the input vector, $\vec{\mathbf{x}}$, of the function $\boldsymbol{f}$. We can also write the functional transformation version
$$\partial_i [\boldsymbol{f}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}$$
u: Array["P Dx"] = ...
du_dx: Array["P"] = derivative(u, step_size=dx, axis=0, order=1, accuracy=2)
du_dx: Array["P"] = derivative(u, step_size=dy, axis=1, order=1, accuracy=2)
# second partial derivative
d2u_dx2: Array["P"] = derivative(u, step_size=dx, axis=0, order=2, accuracy=2)
d2u_dy2: Array["P"] = derivative(u, step_size=dy, axis=1, order=2, accuracy=2)
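The derivative helper above is pseudocode. As a minimal sketch (assuming a uniformly spaced grid and numpy's second-order central differences, so the accuracy argument is fixed), such a helper could look like:

import numpy as np

def derivative(u, step_size, axis=0, order=1):
    # stand-in for the hypothetical `derivative` helper above:
    # repeated 2nd-order central differences (one-sided at the boundaries)
    du = u
    for _ in range(order):
        du = np.gradient(du, step_size, axis=axis)
    return du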
Gradient

The direction of fastest change and the directional derivative. It tells us locally where something is increasing or decreasing the fastest, i.e. the rate of change at every point (a vector direction of change).
Turns a scalar field into a vector field!
$$\begin{aligned}
\text{Gradient}
&:=\text{grad}(\boldsymbol{f}) \\
&= \boldsymbol{\nabla} \boldsymbol{f}\\
&=
\begin{bmatrix}
\frac{\partial}{\partial x} \\
\frac{\partial}{\partial y} \\
\frac{\partial}{\partial z}
\end{bmatrix} \boldsymbol{f} \\
&=
\begin{bmatrix}
\frac{\partial f}{\partial x} \\
\frac{\partial f}{\partial y} \\
\frac{\partial f}{\partial z}
\end{bmatrix}\\
&= \boldsymbol{J}^\top(\boldsymbol{f})
\end{aligned}$$

So the operation is:
$$\text{grad}(\boldsymbol{f}) = \boldsymbol{\nabla}\boldsymbol{f}: \mathbb{R} \rightarrow \mathbb{R}^D$$

where $D$ is the size of the input vector, $\vec{\mathbf{x}}$. Let's take a scalar field with vector-valued inputs.
$$f=\boldsymbol{f}(x,y,z)=\boldsymbol{f}(\vec{\mathbf{x}}) \hspace{10mm} f:\mathbb{R}^3\rightarrow \mathbb{R}$$

Then the gradient is
$$\text{grad}(\boldsymbol{f})
=
\begin{bmatrix}
\frac{\partial \boldsymbol{f}}{\partial x} \\
\frac{\partial \boldsymbol{f}}{\partial y} \\
\frac{\partial \boldsymbol{f}}{\partial z}
\end{bmatrix}$$

We can also write the functional transformation version:
$$\text{grad}[\boldsymbol{f}](\vec{\mathbf{x}}) = \boldsymbol{\nabla}\boldsymbol{f}: \mathbb{R}^D \rightarrow \mathbb{R}^D$$
# scalar value
x: Array[""] = ...
y: Array[""] = ...
output: Array[""] = f(x,y)
# vectorized
x: Array["N"] = ...
y: Array["N"] = ...
output: Array["N"] = vmap(f, args=(0,1))(x,y)
# meshgrid
x: Array["N"] = ...
y: Array["M"] = ...
X: Array["N M"], Y: Array["N M"] = meshgrid(x,y, indexing="ij")
x: Array["NM"] = flatten(X)
y: Array["NM"] = flatten(Y)
output: Array["NM"] = vmap(f, args=(0,1))(x,y)
Let’s take the function
$$f(x,y) = x^2 + y^2 \hspace{10mm} f:\mathbb{R}\times\mathbb{R}\rightarrow \mathbb{R}$$

Then the gradient would be
$$\nabla f =
\begin{bmatrix}
2x \\
2y
\end{bmatrix}$$

For the vectorized (batched) version over $N$ points we have

$$\begin{aligned}
\mathbf{x},\mathbf{y}&\in\Omega && && \Omega\subset\mathbb{R}\\
\boldsymbol{f}(\mathbf{x},\mathbf{y}) &= \mathbf{x}^2 + \mathbf{y}^2 && &&\boldsymbol{f}:\mathbb{R}^N\times\mathbb{R}^N\rightarrow \mathbb{R}^{N} \\
\text{grad}[\boldsymbol{f}](\mathbf{x},\mathbf{y}) &=
\begin{bmatrix}
2\mathbf{x} \\
2\mathbf{y}
\end{bmatrix} && && \text{grad}[\boldsymbol{f}]: \mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}^{N\times 2}
\end{aligned}$$

Vector Fields
# scalar value
x: Array[""] = ...
y: Array[""] = ...
output: Array[""] = f(x,y)
# vectorized
x: Array["N"] = ...
y: Array["N"] = ...
output: Array["N"] = vmap(f, args=(0,1))(x,y)
# meshgrid
x: Array["N"] = ...
y: Array["M"] = ...
X: Array["N M"], Y: Array["N M"] = meshgrid(x,y, indexing="ij")
x: Array["NM"] = flatten(X)
y: Array["NM"] = flatten(Y)
output: Array["NM"] = vmap(f, args=(0,1))(x,y)
Jacobian
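For a vector-valued function $\boldsymbol{f}:\mathbb{R}^D\rightarrow\mathbb{R}^P$, the Jacobian stacks the partial derivatives $J_{ij} = \partial f_i / \partial x_j$ into a $P\times D$ matrix. A minimal sketch of computing it with automatic differentiation (the function below is illustrative):

import jax
import jax.numpy as jnp

def f(x):
    # illustrative vector-valued function f: R^2 -> R^2
    return jnp.array([x[0]**2 + x[1], jnp.sin(x[0]) * x[1]])

x0 = jnp.array([1.0, 2.0])
# Jacobian matrix J_ij = df_i/dx_j, shape (2, 2)
J = jax.jacfwd(f)(x0)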
Divergence

Turns a vector field into a scalar field. It measures locally how much stuff is flowing towards or away from a single point in space, i.e. how much the vector field is expanding out of, or contracting into, a point in space!
How we measure sources and sinks!
Let’s take a vector valued function:
$$\vec{\boldsymbol{f}}:\mathbb{R}^{D}\rightarrow\mathbb{R}^D$$

The divergence operator does the following transformation:
$$\text{div}(\vec{\boldsymbol{f}}): \mathbb{R}^D \rightarrow \mathbb{R}$$

Then the divergence operator is the following:
$$\begin{aligned}
\text{Divergence}
&:= \text{div}(\vec{\boldsymbol{f}}) \\
&= \vec{\nabla}\cdot \vec{\boldsymbol{f}} \\
&= \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right)\cdot \left(f_1, f_2, f_3\right) \\
&= \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right)\cdot \left(f_1\hat{i} + f_2\hat{j} + f_3\hat{k}\right) \\
&= \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}
\end{aligned}$$

We can also write the functional transformation version that maps a vector input, $\vec{\mathbf{x}}$, through the transformation $\boldsymbol{f}(\cdot)$ to the output of the divergence operator $\text{div}(\cdot)$. We have the following:
$$\text{div}\left[\vec{\boldsymbol{f}}\right](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}$$
u: Array["P D"]
# from scratch (differences)
du_dx: Array["P"] = difference(u, step_size=dx, axis=0, order=1, accuracy=4)
du_dy: Array["P"] = difference(u, step_size=dy, axis=1, order=1, accuracy=4)
u_div: Array["P"] = du_dx + du_dy
# from scratch (gradient)
u_grad: Array["P D"] = gradient(u, step_size=(dx,dy,dz), order=1, accuracy=4)
u_div: Array["P"] = np.sum(u_grad, axis=1)
# divergence operator
u_div: Array["P"] = divergence(u, step_size=(dx,dy), accuracy=4)
Curl

How we measure rotation!
$$\begin{aligned}
\text{Curl}
&:= \text{curl}(\vec{\boldsymbol{f}}) \\
&= \nabla\times \vec{\boldsymbol{f}} \\
&= \det
\begin{vmatrix}
\hat{i} & \hat{j} & \hat{k} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
f_1 & f_2 & f_3
\end{vmatrix} \\
&= \begin{bmatrix}
\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\\
\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \\
\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}
\end{bmatrix} \\
&= \left(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\right)\hat{i} +
\left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x}\right)\hat{j} +
\left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\right)\hat{k}
\end{aligned}$$

We can write this as
$$\text{curl}(\vec{\boldsymbol{f}}): \mathbb{R}^D \rightarrow \mathbb{R}^D$$

We can also write the functional transformation version
$$\text{curl}[\vec{\boldsymbol{f}}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}^D$$
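In 2D the curl reduces to a scalar field, the $\hat{k}$-component $\partial f_2/\partial x - \partial f_1/\partial y$. A hedged pointwise autodiff sketch (with an illustrative rotational field) could be:

import jax
import jax.numpy as jnp

def f(x):
    # illustrative 2D vector field f: R^2 -> R^2 (rigid rotation)
    return jnp.array([-x[1], x[0]])

def curl_2d(f, x):
    # scalar (k-component) curl in 2D: df_2/dx - df_1/dy
    J = jax.jacfwd(f)(x)
    return J[1, 0] - J[0, 1]

x0 = jnp.array([0.5, -1.0])
curl_at_x0 = curl_2d(f, x0)     # 1 - (-1) = 2 everywhere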
Laplacian

The divergence of the gradient, i.e. the sum of the unmixed second partial derivatives.
$$\begin{aligned}
\text{Laplacian}
&:= \Delta u \\
&= \nabla^2 u \\
&= \text{div}(\nabla u) \\
&= \partial_{xx}u + \partial_{yy}u + \partial_{zz}u
\end{aligned}$$

We can also write this as the functional transformation version
$$\text{Laplacian}[\boldsymbol{f}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}$$
u: Array["P D"] = ...
# from scratch (partial derivatives)
d2u_dx2: Array["P"] = derivative(u, step_size=dx, axis=0, order=2, accuracy=4)
d2u_dy2: Array["P"] = derivative(u, step_size=dy, axis=1, order=2, accuracy=4)
u_lap: Array["P"] = d2u_dx2 + d2u_dy2
# from scratch (divergence)
u_grad: Array["P D"] = gradient(u, step_size=(dx,dy), order=1, accuracy=4)
u_lap: Array["P"] = divergence(u, step_size=(dx,dy), accuracy=4)
# laplacian operator
u_lap: Array["P"] = laplacian(u, step_size=(dx,dy), accuracy=4)
Material Derivative

Scalar Field

Given a scalar field:

$$\phi:=\boldsymbol{\phi}(\vec{\mathbf{x}},t)=\boldsymbol{\phi}(x,y,z,t) \hspace{10mm} \phi:\mathbb{R}^3\times\mathbb{R}\rightarrow \mathbb{R}$$

We can write the material derivative as
$$\frac{D\phi}{Dt} := \frac{\partial \phi}{\partial t} + \vec{\mathbf{u}} \cdot \nabla \phi$$

where
$$\vec{\mathbf{u}} \cdot \nabla \phi =
u_1\frac{\partial \phi}{\partial x} +
u_2\frac{\partial \phi}{\partial y} +
u_3\frac{\partial \phi}{\partial z}$$

Vector Field

Given a vector-valued field:
$$\vec{\boldsymbol{F}}:=\vec{\boldsymbol{F}}(\vec{\mathbf{x}},t)=
\vec{\boldsymbol{F}}(x,y,z,t) \hspace{10mm}
\vec{\boldsymbol{F}}:\mathbb{R}^3\times\mathbb{R}\rightarrow \mathbb{R}^{3}$$

We can write the material derivative as
$$\frac{D \vec{\boldsymbol{F}}}{Dt} := \frac{\partial \vec{\boldsymbol{F}}}{\partial t} + \vec{\mathbf{u}} \cdot \nabla \vec{\boldsymbol{F}}$$

where
$$\vec{\mathbf{u}} \cdot \nabla \vec{\boldsymbol{F}} =
u_1\frac{\partial \vec{\boldsymbol{F}}}{\partial x} +
u_2\frac{\partial \vec{\boldsymbol{F}}}{\partial y} +
u_3\frac{\partial \vec{\boldsymbol{F}}}{\partial z}$$
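A hedged pointwise sketch of the scalar-field material derivative, $D\phi/Dt = \partial_t\phi + \vec{\mathbf{u}}\cdot\nabla\phi$, using autodiff (the field, time, and velocity below are illustrative):

import jax
import jax.numpy as jnp

def phi(x, t):
    # illustrative scalar field phi(x, y, t)
    return jnp.exp(-t) * (x[0]**2 + x[1]**2)

def material_derivative(phi, x, t, velocity):
    # D(phi)/Dt = d(phi)/dt + u . grad(phi)
    dphi_dt = jax.grad(phi, argnums=1)(x, t)
    grad_phi = jax.grad(phi, argnums=0)(x, t)
    return dphi_dt + jnp.dot(velocity, grad_phi)

x0 = jnp.array([1.0, 0.5])
u0 = jnp.array([0.2, -0.1])     # local velocity vector
Dphi_Dt = material_derivative(phi, x0, 0.0, u0)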
Determinant Jacobian

From a differential operator perspective, we have
$$\begin{aligned}
\det\boldsymbol{J}(A,B) &= -\det\boldsymbol{J}(B,A)\\
&= \mathbf{k}\cdot\left(\nabla A\times \nabla B\right) \\
&= \mathbf{k}\cdot \nabla\times\left(A\nabla B\right) \\
&= - \mathbf{k}\cdot \nabla\times\left(B\nabla A\right) \\
&= -\mathbf{k}\cdot\text{curl}\left(B\nabla A\right)
\end{aligned}$$

If we think of Cartesian coordinates, we have
$$\begin{aligned}
\det \boldsymbol{J}(A,B) &= \frac{\partial A}{\partial x}\frac{\partial B}{\partial y} -\frac{\partial A}{\partial y}\frac{\partial B}{\partial x} \\
&= \frac{\partial }{\partial x}\left(A\frac{\partial B}{\partial y}\right) -\frac{\partial }{\partial y}\left(A\frac{\partial B}{\partial x}\right) \\
&= \frac{\partial }{\partial y}\left(B\frac{\partial A}{\partial x}\right) -\frac{\partial }{\partial x}\left(B\frac{\partial A}{\partial y}\right)
\end{aligned}$$

We can write this transformation as
$$\det\boldsymbol{J}(\boldsymbol{f}, \boldsymbol{g}): \mathbb{R}^D\times\mathbb{R}^{D} \rightarrow \mathbb{R}$$

We can also write this as the functional transformation version
$$\det\boldsymbol{J}[\boldsymbol{f}, \boldsymbol{g}](\vec{\mathbf{x}}): \mathbb{R}^D \rightarrow \mathbb{R}$$
u: Array["P D"] = ...
v: Array["P D"] = ...
# det Jacobian operator
step_size = ((dx,dy),(dx,dy))
u_detj: Array[""] = det_jacobian(u, v, step_size, accuracy)
# from scratch (partial derivatives)
du_dx: Array["P"] = derivative(u, step_size=dx, axis=0, order=1, accuracy=2)
du_dy: Array["P"] = derivative(u, step_size=dy, axis=1, order=1, accuracy=2)
dv_dx: Array["P"] = derivative(v, step_size=dx, axis=0, order=1, accuracy=2)
dv_dy: Array["P"] = derivative(v, step_size=dy, axis=1, order=1, accuracy=2)
u_detj: Array["P"] = du_dx * dv_dy - du_dy * dv_dx
# from scratch (partial derivatives + divergence)
du_dx: Array["P"] = derivative(u, step_size=dx, axis=0, order=1, accuracy=2)
du_dy: Array["P"] = derivative(u, step_size=dy, axis=1, order=1, accuracy=2)
vdu_dx: Array["P D"] = v * du_dx
vdu_dy: Array["P D"] = v * du_dy
# from scratch (gradient + divergence)
u_grad: Array["P D"] = gradient(u, step_size=(dx,dy), order=1, accuracy=2)
vu_grad: Array["P D"] = v * u_grad
u_detj: Array["P"] = curl(vu_grad, step_size=(dx,dy), accuracy=2)
Explanation of Jacobian from Flows Perspective - Slides
Serpentine Integral Visualization of the Change of Variables and the Jacobian - Video
Khan Academy Explanation of Determinant - Video
3Blue3Brown Visualization of Determinant Jacobian - Video
Helmholtz Equation

$$\begin{aligned}
\nabla^2 \boldsymbol{f}(\vec{\mathbf{x}}) + k^2 \boldsymbol{f}(\vec{\mathbf{x}}) &= 0 \\
\left( \nabla^2 + k^2 \right)\boldsymbol{f}(\vec{\mathbf{x}}) &= 0
\end{aligned}$$

Helmholtz Decomposition

$$\vec{\boldsymbol{f}} = \underbrace{- \nabla\phi}_{\text{Curl-Free}}+ \underbrace{\nabla\times \mathbf{A}}_{\text{Div-Free}}$$

Vector Jacobian Products

Resources:
jax documentation
Linearization is All You Need for an AutoDiff Library - Blog
The Adjoint Method in a Dozen Lines of JAX - Blog
Adjoint Sensitivities over nonlinear equations with JAX - YouTube
Using JAX Jacobians for Adjoint Sensitivities over Nonlinear Systems of Equations - YouTube
A Tutorial on Automatic Differentiation for Scientific Design - Slides
Linearization

First, we linearize about the prior estimate, $\boldsymbol{\mu_z}$.
$$\boldsymbol{f}(\boldsymbol{z}) \approx
\boldsymbol{f}(\boldsymbol{\mu_z}) +
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{\mu_z})(\boldsymbol{z}-\boldsymbol{\mu_z}) + \mathcal{O}\left(\|\boldsymbol{z}-\boldsymbol{\mu_z}\|^2\right)$$

Then, we approximate the gradient of the function with the gradient evaluated at the prior estimate
$$\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z}) \approx \boldsymbol{J_f}(\boldsymbol{z}) :=
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z})\big|_{\boldsymbol{z}=\boldsymbol{\mu_z}}$$

Here, the operator, $\boldsymbol{J_f}(\boldsymbol{z})$, is the tangent-linear operator of the function $\boldsymbol{f}(\cdot)$ evaluated at $\boldsymbol{\mu_z}$, and $\boldsymbol{J_f}^\top(\boldsymbol{z})$ is the adjoint.
$$\begin{aligned}
\text{Dynamical Tangent-Linear Operator}: && &&
\boldsymbol{J_f}(\boldsymbol{z}) &=
\boldsymbol{F_z} =
\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z})\big|_{\boldsymbol{z}=\boldsymbol{\mu_z}},
&& &&
\mathbb{R}^{D_x}\rightarrow\mathbb{R}^{D_x\times D_x}\\
\text{Dynamical Adjoint Operator}: && &&
\boldsymbol{J_f}^\top(\boldsymbol{z}) &=
\boldsymbol{F_z}^\top =
\left(\boldsymbol{\nabla_z}\boldsymbol{f}(\boldsymbol{z})\big|_{\boldsymbol{z}=\boldsymbol{\mu_z}}\right)^\top,
&& &&
\mathbb{R}^{D_x}\rightarrow\mathbb{R}^{D_x\times D_x}\\
\text{Observation Tangent-Linear Operator}: && &&
\boldsymbol{J_h}(\boldsymbol{z}) &=
\boldsymbol{H_z} =
\boldsymbol{\nabla_z}\boldsymbol{h}(\boldsymbol{z})\big|_{\boldsymbol{z}=\boldsymbol{\mu_z}},
&& &&
\mathbb{R}^{D_x}\rightarrow\mathbb{R}^{D_y\times D_x}\\
\text{Observation Adjoint Operator}: && &&
\boldsymbol{J_h}^\top(\boldsymbol{z}) &=
\boldsymbol{H_z}^\top =
\left(\boldsymbol{\nabla_z}\boldsymbol{h}(\boldsymbol{z})\big|_{\boldsymbol{z}=\boldsymbol{\mu_z}}\right)^\top,
&& &&
\mathbb{R}^{D_x}\rightarrow\mathbb{R}^{D_x\times D_y}
\end{aligned}$$

Tangent-Linear Model

For a function $\boldsymbol{f}:\mathbb{R}^D\rightarrow\mathbb{R}^M$:

$$\begin{aligned}
\text{Tangent-Linear Model}: && &&
(\boldsymbol{u,v}) &\rightarrow
\partial_{\boldsymbol{z}}\boldsymbol{f}(\boldsymbol{u})\boldsymbol{v} \\
\text{Input Vector}: && &&
\boldsymbol{u} &\in \mathbb{R}^{D} \\
\text{Tangent Vector}: && &&
\boldsymbol{v} &\in \mathbb{R}^{D} \\
\text{Jacobian-Vector Product}: && &&
\text{jvp} &: \mathbb{R}^{D}\times\mathbb{R}^{D}\rightarrow\mathbb{R}^{M}
\end{aligned}$$

Adjoint Model

$$\begin{aligned}
\text{Adjoint Model}: && &&
(\boldsymbol{u,v}) &\rightarrow
\boldsymbol{v}^\top\partial_{\boldsymbol{z}}\boldsymbol{f}(\boldsymbol{u}) \\
\text{Input Vector}: && &&
\boldsymbol{u} &\in \mathbb{R}^{D} \\
\text{Cotangent Vector}: && &&
\boldsymbol{v} &\in \mathbb{R}^{M} \\
\text{Vector-Jacobian Product}: && &&
\text{vjp} &: \mathbb{R}^{D}\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{D}
\end{aligned}$$
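A hedged sketch of the tangent-linear (jvp) and adjoint (vjp) models with JAX autodiff, which never form the full Jacobian (the nonlinear function and vectors below are illustrative):

import jax
import jax.numpy as jnp

def f(z):
    # illustrative nonlinear model f: R^3 -> R^2
    return jnp.array([jnp.sum(z**2), jnp.sin(z[0])])

z0 = jnp.array([1.0, 2.0, 3.0])     # linearization point, D = 3
v = jnp.array([0.1, 0.0, -0.2])     # tangent vector in R^D
w = jnp.array([1.0, 0.5])           # cotangent vector in R^M

# tangent-linear model: J_f(z0) @ v
f_z0, jvp_out = jax.jvp(f, (z0,), (v,))
# adjoint model: w^T J_f(z0)
f_z0, vjp_fn = jax.vjp(f, z0)
vjp_out, = vjp_fn(w)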