
Trade-offs

Pros

Mesh-Free

Lots of Data

Cons

Transfer Learning


Data

$$\mathbf{x}_\phi \in \mathbb{R}^{D_\phi}, \qquad \mathbf{u} \in \mathbb{R}$$

Model

$$\boldsymbol{f}_{\boldsymbol{\theta}}:\mathcal{X} \rightarrow \mathcal{U}$$

Architectures

We are interested in the case of regression, for which we use the following generalized architecture.

$$
\begin{aligned}
\mathbf{x}^{(1)} &= \boldsymbol{\phi}\left(\mathbf{x};\boldsymbol{\gamma}\right) \\
\mathbf{x}^{(\ell+1)} &= \text{NN}_\ell\left(\mathbf{x}^{(\ell)};\boldsymbol{\theta}_\ell\right) \\
\boldsymbol{f}(\mathbf{x};\boldsymbol{\theta},\boldsymbol{\gamma}) &= \mathbf{w}^{(L)}\mathbf{x}^{(L)} + \mathbf{b}^{(L)}
\end{aligned}
$$

where $\boldsymbol{\phi}$ is the basis transformation with hyperparameters $\boldsymbol{\gamma}$, $\text{NN}_\ell$ is the neural network layer parameterized by $\boldsymbol{\theta}_\ell$, and we have $L$ layers, $\ell \in \{1, 2, \ldots, L\}$.
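A minimal sketch of this generalized forward pass (in NumPy-style Python; the callables and shapes are illustrative assumptions, not a fixed API):

```python
def forward(x, phi, layers, w_out, b_out):
    """Generalized architecture: basis transform, hidden layers, linear head.

    phi    : callable, the basis transformation phi(x; gamma)
    layers : list of callables, the layers NN_l(.; theta_l)
    """
    z = phi(x)                  # x^(1) = phi(x; gamma)
    for layer in layers:
        z = layer(z)            # x^(l+1) = NN_l(x^(l); theta_l)
    return w_out @ z + b_out    # f(x) = w^(L) x^(L) + b^(L)
```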

Standard Neural Network

In the standard neural network, we typically have the following functions:

$$
\begin{aligned}
\boldsymbol{\phi}(\mathbf{x}) &= \mathbf{x} \\
\text{NN}_\ell\left(\mathbf{x}^{(\ell)};\boldsymbol{\theta}_\ell\right) &= \boldsymbol{\sigma}\left(\mathbf{w}^{(\ell)}\mathbf{x}^{(\ell)} + \mathbf{b}^{(\ell)}\right), \qquad \boldsymbol{\theta}_\ell = \left\{\mathbf{w}^{(\ell)}, \mathbf{b}^{(\ell)}\right\}
\end{aligned}
$$

So more explicitly, we can write it as:

$$
\begin{aligned}
\mathbf{x}^{(1)} &= \mathbf{x} \\
\boldsymbol{f}^{(\ell)}(\mathbf{x}^{(\ell)}) &= \boldsymbol{\sigma}\left(\mathbf{w}^{(\ell)}\mathbf{x}^{(\ell)} + \mathbf{b}^{(\ell)}\right) \\
\boldsymbol{f}^{(L)}(\mathbf{x}^{(L)}) &= \mathbf{w}^{(L)}\mathbf{x}^{(L)} + \mathbf{b}^{(L)}
\end{aligned}
$$

where $\ell \in \{1, 2, \ldots, L-1\}$. Notably, the final layer is affine, with no nonlinearity applied.
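As a concrete sketch, a minimal NumPy implementation of this standard network (assuming $\tanh$ for $\boldsymbol{\sigma}$; widths and initialization are illustrative):

```python
import numpy as np

def init_mlp(dims, rng):
    """Random weights and biases for layer widths dims = [D_in, ..., D_out]."""
    return [(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
            for n, m in zip(dims[:-1], dims[1:])]

def mlp(x, params, sigma=np.tanh):
    """Apply sigma on layers 1..L-1; the final layer is purely affine."""
    for w, b in params[:-1]:
        x = sigma(w @ x + b)
    w, b = params[-1]
    return w @ x + b

rng = np.random.default_rng(0)
params = init_mlp([3, 64, 64, 1], rng)
u = mlp(np.ones(3), params)   # f(x), shape (1,)
```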


Positional Encoding


Fourier Features

$$\boldsymbol{\phi}\left(\mathbf{x}\right) = \begin{bmatrix} \sin\left(\boldsymbol{\omega}\mathbf{x}\right) \\ \cos\left(\boldsymbol{\omega}\mathbf{x}\right) \end{bmatrix}, \qquad \boldsymbol{\omega} \sim p(\boldsymbol{\omega};\boldsymbol{\gamma})$$
| Kernel | Distribution $p(\boldsymbol{\omega})$ |
| --- | --- |
| Gaussian | $\mathcal{N}\left(\mathbf{0}, \frac{1}{\sigma^2}\mathbf{I}_r\right)$ |
| Laplacian | $\text{Cauchy}$ |
| Cauchy | $\text{Laplace}$ |
| Matérn | $\text{Bessel}$ |
| ArcCosine | |
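A sketch of the Gaussian case, where $\boldsymbol{\omega}$ is sampled once from a normal distribution whose scale is the hyperparameter (names and shapes are illustrative):

```python
import numpy as np

def fourier_features(x, omega):
    """phi(x) = [sin(omega x), cos(omega x)]."""
    proj = omega @ x
    return np.concatenate([np.sin(proj), np.cos(proj)])

rng = np.random.default_rng(0)
D_in, D_rff, sigma = 3, 128, 1.0
omega = rng.normal(0.0, 1.0 / sigma, size=(D_rff, D_in))  # omega ~ N(0, I/sigma^2)
z = fourier_features(np.ones(D_in), omega)                # shape (2 * D_rff,)
```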

Alternative Formulation

$$\boldsymbol{\phi}(\mathbf{x}) = \sqrt{\frac{2}{D_{rff}}}\cos\left(\boldsymbol{\omega}\mathbf{x} + \boldsymbol{b}\right)$$

where $\boldsymbol{\omega} \sim p(\boldsymbol{\omega})$ and $\boldsymbol{b} \sim \mathcal{U}(0, 2\pi)$.
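The corresponding single-cosine sketch, with the random phase $\boldsymbol{b}$:

```python
import numpy as np

def rff_cos(x, omega, b):
    """phi(x) = sqrt(2 / D_rff) * cos(omega x + b)."""
    return np.sqrt(2.0 / omega.shape[0]) * np.cos(omega @ x + b)

rng = np.random.default_rng(0)
D_in, D_rff = 3, 128
omega = rng.standard_normal((D_rff, D_in))     # omega ~ p(omega)
b = rng.uniform(0.0, 2.0 * np.pi, size=D_rff)  # b ~ U(0, 2*pi)
z = rff_cos(np.ones(D_in), omega, b)           # shape (D_rff,)
```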


SIREN

$$
\begin{aligned}
\boldsymbol{\sigma} &= \sin\left(\omega_0\left(\mathbf{w}\mathbf{x} + \mathbf{b}\right)\right) \\
\boldsymbol{\sigma} &= \boldsymbol{\alpha} \odot \sin\left(\mathbf{w}\mathbf{x} + \mathbf{b}\right) \\
\text{FiLM}(\mathbf{x}) &= \boldsymbol{\alpha} \odot \mathbf{x} + \boldsymbol{\beta}
\end{aligned}
$$

Extended

$$\boldsymbol{\sigma} = \sin\left(\boldsymbol{\gamma}\left(\mathbf{w}\mathbf{x} + \mathbf{b}\right) + \boldsymbol{\beta}\right)$$

where $\boldsymbol{\gamma}$ corresponds to the frequencies and $\boldsymbol{\beta}$ corresponds to the phase shifts.
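A minimal SIREN sketch. The scaling $\omega_0$ (commonly 30) and the uniform initialization bounds follow the commonly used recipe from the original paper; treat the exact constants as assumptions:

```python
import numpy as np

def siren_init(dims, omega_0=30.0, rng=None):
    """Uniform bounds chosen so pre-activations stay well-distributed
    across layers (the first layer is handled separately)."""
    rng = rng or np.random.default_rng(0)
    params = []
    for i, (n, m) in enumerate(zip(dims[:-1], dims[1:])):
        bound = 1.0 / n if i == 0 else np.sqrt(6.0 / n) / omega_0
        params.append((rng.uniform(-bound, bound, (m, n)), np.zeros(m)))
    return params

def siren(x, params, omega_0=30.0):
    for w, b in params[:-1]:
        x = np.sin(omega_0 * (w @ x + b))   # sigma = sin(omega_0 (wx + b))
    w, b = params[-1]
    return w @ x + b                        # linear output layer
```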


Modulation

Modulation conditions each layer of the network on a latent variable, $\mathbf{z}$:

$$\boldsymbol{f}^\ell(\mathbf{x},\mathbf{z};\boldsymbol{\theta}) := \boldsymbol{h}_M^\ell\left(\text{NN}(\mathbf{x};\boldsymbol{\theta}_{NN}),\; \text{M}(\mathbf{z};\boldsymbol{\theta}_{M})\right)$$

where $\text{NN}$ is the output of the neural network with respect to the input, $\mathbf{x}$, $\text{M}$ is the output of the modulation function with respect to the latent variable, $\mathbf{z}$, and $\boldsymbol{h}_M$ is an arbitrary combination operator (e.g., addition or elementwise multiplication).
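A sketch of one modulated layer with $\boldsymbol{h}_M$ chosen as FiLM-style affine conditioning (the linear maps producing $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ from $\mathbf{z}$ are illustrative assumptions):

```python
import numpy as np

def film_layer(x, z, w, b, w_alpha, w_beta):
    """The latent z produces a scale alpha and a shift beta that
    condition the layer's activations: h_M = FiLM."""
    alpha = w_alpha @ z        # M(z; theta_M), scale
    beta = w_beta @ z          # M(z; theta_M), shift
    h = np.sin(w @ x + b)      # NN(x; theta_NN)
    return alpha * h + beta    # FiLM(h) = alpha * h + beta
```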


FiLM, 2020

Mehta, 2021

Dupont, 2022

Neural Implicit Flows, Pan, 2022



Affine Modulation


$$
\begin{aligned}
\mathbf{z}^{(1)} &= \mathbf{x} \\
\mathbf{z}^{(k+1)} &= \boldsymbol{\sigma}\left(\left(\mathbf{w}^{(k)}\mathbf{z}^{(k)} + \mathbf{b}^{(k)}\right) \odot \boldsymbol{s}_m(\mathbf{z}) + \boldsymbol{a}_m(\mathbf{z})\right) \\
\boldsymbol{f}(\mathbf{x}) &= \mathbf{w}^{(K)}\mathbf{z}^{(K)} + \mathbf{b}^{(K)}
\end{aligned}
$$
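A sketch of the full affine-modulated forward pass (taking $\boldsymbol{s}_m$ and $\boldsymbol{a}_m$ to be linear maps of $\mathbf{z}$ is an assumption):

```python
import numpy as np

def affine_modulated_forward(x, z, params, mods, sigma=np.sin):
    """params: [(w, b), ...] per layer; mods: [(w_s, w_a), ...] producing
    the per-layer scale s_m(z) and shift a_m(z) from the latent z."""
    for (w, b), (w_s, w_a) in zip(params[:-1], mods):
        s, a = w_s @ z, w_a @ z
        x = sigma((w @ x + b) * s + a)   # affine modulation of the layer
    w, b = params[-1]
    return w @ x + b
```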

Shift Modulations
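Shift modulations are the special case $\boldsymbol{s}_m(\mathbf{z}) = \mathbf{1}$, so only the additive term $\boldsymbol{a}_m(\mathbf{z})$ is applied; this appears to be the variant used in Dupont, 2022.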

Neural Implicit Flows

This work uses a version of the modulated SIREN mentioned above; however, it separates the space and time neural networks.

$$\boldsymbol{f}(\mathbf{x}_\phi, t) = \text{NN}_{space}\left(\mathbf{x}_\phi;\, \text{NN}_{time}(t)\right)$$
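A sketch of this space/time separation, written as a hypernetwork-style coupling where the time network outputs a latent that shift-modulates the spatial network; the exact coupling used in the paper may differ:

```python
import numpy as np

def nif(x_phi, t, time_params, space_params):
    """NN_time(t) produces a latent z that conditions NN_space(x_phi; z)."""
    w_t, b_t = time_params
    z = np.tanh(w_t @ np.atleast_1d(t) + b_t)   # NN_time(t)
    h = x_phi
    for w, b, w_m in space_params[:-1]:
        h = np.sin(w @ h + b + w_m @ z)         # latent shifts each layer
    w, b, _ = space_params[-1]
    return w @ h + b
```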

Multiplicative Filter Networks

$$
\begin{aligned}
\mathbf{z}^{(1)} &= \boldsymbol{g}^{(1)}\left(\mathbf{x};\boldsymbol{\theta}^{(1)}\right) \\
\mathbf{z}^{(k+1)} &= \boldsymbol{g}^{(k+1)}\left(\mathbf{x};\boldsymbol{\theta}^{(k+1)}\right) \odot \left(\mathbf{w}^{(k)}\mathbf{z}^{(k)} + \mathbf{b}^{(k)}\right) \\
\boldsymbol{f}(\mathbf{x}) &= \mathbf{w}^{(K)}\mathbf{z}^{(K)} + \mathbf{b}^{(K)}
\end{aligned}
$$

where $k \in \{1, 2, \ldots, K-1\}$ and $\boldsymbol{g}^{(k)}$ is a non-linear filter applied directly to the input (see below; a code sketch follows the GaborNet section).

Non-Linear Functions

FOURIERNET

This method corresponds to the random Fourier Feature transformation.

$$\boldsymbol{g}^{(\ell)}(\mathbf{x};\boldsymbol{\theta}^{(\ell)}) = \sin\left(\mathbf{w}^{(\ell)}\mathbf{x} + \mathbf{b}^{(\ell)}\right)$$

where the parameters to be learned are:

$$\boldsymbol{\theta}^{(\ell)} = \left\{\mathbf{w}_d^{(\ell)},\; \mathbf{b}_d^{(\ell)}\right\}$$

GABORNET

This filter aims to improve upon the Fourier representation, which has global support and therefore has more difficulty representing local features. The Gabor filter (below) captures both a frequency component and a spatial-locality component.

$$\boldsymbol{g}^{(\ell)}(\mathbf{x};\boldsymbol{\theta}^{(\ell)}) = \exp\left(-\frac{\gamma_d^{(\ell)}}{2}\left\|\mathbf{x} - \boldsymbol{\mu}_d^{(\ell)}\right\|_2^2\right) \odot \sin\left(\mathbf{w}^{(\ell)}\mathbf{x} + \mathbf{b}^{(\ell)}\right)$$

where the parameters to be learned are:

$$\boldsymbol{\theta}^{(\ell)} = \left\{\gamma_d^{(\ell)} \in \mathbb{R},\; \boldsymbol{\mu}_d^{(\ell)},\; \mathbf{w}_d^{(\ell)},\; \mathbf{b}_d^{(\ell)}\right\}$$
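A sketch of the full multiplicative filter network with either filter choice (shapes and parameter handling are illustrative):

```python
import numpy as np

def fourier_filter(x, w, b):
    """FourierNet filter: g(x) = sin(w x + b)."""
    return np.sin(w @ x + b)

def gabor_filter(x, w, b, mu, gamma):
    """GaborNet filter: exp(-gamma_d/2 ||x - mu_d||^2) * sin(w x + b)."""
    env = np.exp(-0.5 * gamma * np.sum((mu - x) ** 2, axis=1))
    return env * np.sin(w @ x + b)

def mfn(x, filters, linears):
    """z^(1) = g^(1)(x); z^(k+1) = g^(k+1)(x) * (w^(k) z^(k) + b^(k));
    f(x) = w^(K) z^(K) + b^(K)."""
    z = filters[0](x)
    for g, (w, b) in zip(filters[1:], linears[:-1]):
        z = g(x) * (w @ z + b)
    w, b = linears[-1]
    return w @ z + b
```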



Probabilistic

Deterministic

$$\boldsymbol{\theta}^* = \underset{\boldsymbol{\theta}}{\text{argmin}}\; \lambda \sum_{n \in \mathcal{D}} \left\|\boldsymbol{f}(\mathbf{x}_n;\boldsymbol{\theta}) - \boldsymbol{u}_n\right\|_2^2 - \log p(\boldsymbol{\theta})$$
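For a zero-mean Gaussian prior, $-\log p(\boldsymbol{\theta})$ reduces to an L2 penalty up to a constant, giving the familiar weight-decay objective; a sketch (the parameter layout is an assumption):

```python
import numpy as np

def map_loss(params, f, xs, us, lam=1.0, prior_scale=1.0):
    """lam * sum_n ||f(x_n; theta) - u_n||^2 - log p(theta), Gaussian prior."""
    data = sum(np.sum((f(x, params) - u) ** 2) for x, u in zip(xs, us))
    neg_log_prior = sum(np.sum(w ** 2) + np.sum(b ** 2)
                        for w, b in params) / (2.0 * prior_scale ** 2)
    return lam * data + neg_log_prior
```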

Normalizing Flows

Bayesian


Physics Constraints

Mass

Momentum

QG Equations


Applications

Interpolation

Surrogate Modeling

Sampling


Feature Engineering

$$\mathbf{x} \in \mathbb{R}^{D_\phi}, \qquad D = \{\text{lat, lon, time}\}$$

Spatial Features

For the spatial features, we have spherical coordinates (i.e. longitude and latitude), which we convert to 3D Cartesian coordinates:

$$
\begin{aligned}
x &= r \cos(\lambda)\cos(\phi) \\
y &= r \cos(\lambda)\sin(\phi) \\
z &= r \sin(\lambda)
\end{aligned}
$$

where $\lambda$ is the latitude, $\phi$ is the longitude, and $r$ is the radius. With the radius normalized to $r = 1$, the coordinates $x, y, z$ are bounded between -1 and 1.
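A sketch of the conversion (inputs in degrees, radius normalized to 1):

```python
import numpy as np

def spherical_to_cartesian(lat_deg, lon_deg, r=1.0):
    """Map (lat, lon) on the sphere to Cartesian (x, y, z) in [-r, r]."""
    lam, phi = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    x = r * np.cos(lam) * np.cos(phi)
    y = r * np.cos(lam) * np.sin(phi)
    z = r * np.sin(lam)
    return np.stack([x, y, z], axis=-1)
```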


Temporal Features

Tanh

$$f(t) = \tanh(t)$$

Fourier Features

Sinusoidal Positional Encoding

$$\boldsymbol{\phi}(t) = \begin{bmatrix} \sin(\boldsymbol{\omega}_k t) \\ \cos(\boldsymbol{\omega}_k t) \end{bmatrix}$$

where

$$\boldsymbol{\omega}_k = \frac{1}{10{,}000^{2k/d}}$$
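A sketch of the encoding for a scalar time $t$ with $d/2$ frequency pairs:

```python
import numpy as np

def positional_encoding(t, d=64):
    """phi(t) = [sin(omega_k t), cos(omega_k t)], omega_k = 10000^(-2k/d)."""
    k = np.arange(d // 2)
    omega = 1.0 / (10_000.0 ** (2.0 * k / d))
    return np.concatenate([np.sin(omega * t), np.cos(omega * t)])
```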



Experiments

Initial Conditions

Training Time, Convergence

Iterative Schemes

Speed, Accuracy, PreTraining

Priors

The impact of the priors on the learning procedure.

Deterministic
Probabilistic