Physics Informed Loss Function

How we can add physics into our models

CNRS
MEOM

Model

The objective is to somehow find the true functa. In geosciences, we can write down partial differential equations (PDEs) that we believe describe how this field changes in space and time. In most cases, we typically denote this evolving field as $u$, which is some approximation to the true field $\boldsymbol{f}$.

$$
\boldsymbol{f}(\mathbf{x}_s,t)\approx\boldsymbol{u}(\mathbf{x}_s,t)\hspace{15mm}\boldsymbol{u}:\Omega\times\mathcal{T}\rightarrow\mathbb{R}^{D}
$$

where the field $u$ is a representation of the true field $f$. However, we need some way to keep this field, $u$, close to the true field $f$, and this is where the (partial) differential equations come into play. We state that the field needs to satisfy some constraints, which we define as space-time operations. The associated PDE constraint is defined by some arbitrary differential operators on the field that describe how it changes in space and time. The PDE can therefore be thought of as a set of equations that act as constraints on how the field behaves in space and time. So we can add the PDE constraint as:

$$
\begin{aligned}
u &= \boldsymbol{u}(\mathbf{x}_s,t) \\
\text{s.t. }\quad \partial_t u &= \mathcal{N}[u;\boldsymbol{\theta}](\mathbf{x}_s,t)
\end{aligned}
$$

where $\mathcal{N}$ is the differential operator on the field and $\boldsymbol{\theta}$ are the (hyper-)parameters of the PDE. These parameters don't actually exist in nature; they are artefacts introduced by the PDE which are often unknown and/or assumed based on some prior knowledge.
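To make this concrete, below is a minimal sketch (in JAX) of how the residual $\partial_t u - \mathcal{N}[u;\boldsymbol{\theta}]$ could be evaluated at a single space-time point with automatic differentiation. The pointwise network `u_fn(params, x, t) -> scalar`, and the choice of a simple 1D diffusion operator for $\mathcal{N}$, are illustrative assumptions rather than part of the model above.

```python
import jax

def pde_residual(u_fn, params, x, t, nu=0.01):
    """Pointwise PDE residual r(x, t) = du/dt - N[u; theta](x, t).

    For illustration, N[u] is taken to be a 1D diffusion operator
    nu * d^2u/dx^2; swap in the differential operator of your problem.
    """
    u = lambda x_, t_: u_fn(params, x_, t_)

    u_t = jax.grad(u, argnums=1)(x, t)                        # du/dt
    u_xx = jax.grad(jax.grad(u, argnums=0), argnums=0)(x, t)  # d^2u/dx^2

    return u_t - nu * u_xx
```

In practice this residual is evaluated over a batch of collocation points with `jax.vmap` and its mean squared value is penalised in the loss (see the PINNs formulation below).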

Example: Sea Surface Height

For our SSH variable, there are many approximate models we can use to describe the dynamics. One such model is the Quasi-Geostrophic equations given by

$$
\begin{aligned}
\eta &= \boldsymbol{\eta_\theta}(\mathbf{x}_s,t), && \boldsymbol{\eta_\theta}:\Omega\times\mathcal{T}\rightarrow\mathbb{R} \\
\text{s.t. }\quad \dot{\eta} &= -\frac{g}{f} \det\boldsymbol{J}(\eta,\nabla\eta)
\end{aligned}
$$

Through a series of assumptions, we arrive at this approximation. For this example, we know that this is a crude approximation of the actual dynamics; however, we can assume that it is "good enough".

Domain

Because our functa is often defined on a bounded domain, our PDE must also describe the field on that bounded domain. For the spatial domain, we need to describe what happens at the edges of the domain (e.g. the sides of a rectangle). For the temporal domain, we need to describe what happens at the beginning of the domain, e.g. $t=0$. We can define both of these as operators:

$$
\begin{aligned}
\mathcal{BC}[u;\boldsymbol{\theta}](\mathbf{x}_s,t) &= \boldsymbol{u}_b, && \mathbf{x}_s\in\partial\Omega, && t\in\mathcal{T} \\
\mathcal{IC}[u;\boldsymbol{\theta}](\mathbf{x}_s,0) &= \boldsymbol{u}_0, && \mathbf{x}_s\in\Omega
\end{aligned}
$$

where $\mathcal{BC}$ are the boundary conditions on the field and $\mathcal{IC}$ are the initial conditions on the field. The boundary conditions dictate the behaviour of the field on the spatial boundaries, and the initial conditions dictate the behaviour at the initial time, $t=0$. We encounter these a lot even in ML applications. For example, whenever we deal with convolutions on images, we need to think about what to do at the boundaries (the solution is almost always padding, a.k.a. ghost points). In toy problems in physics, we also often simplify these to make the problem easier and well-behaved. A common approach is to use periodic boundary conditions, which are very rare in nature but very convenient because they allow us to use simpler solvers like spectral and pseudo-spectral solvers. If we have access to observations, then we can use them as initial and boundary conditions. This is often done in data assimilation applications like gap-filling and reanalysis.
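To make these operators concrete, here is a minimal sketch of pointwise boundary- and initial-condition residuals, written in the same style as the PDE residual above. The function names, the periodic-in-$x$ choice, and the domain edges `x_left`/`x_right` are illustrative assumptions.

```python
def ic_residual(u_fn, params, x, u0):
    """Pointwise initial-condition residual: u(x, 0) - u0, for x in Omega."""
    return u_fn(params, x, 0.0) - u0

def bc_residual_periodic(u_fn, params, t, x_left=0.0, x_right=1.0):
    """Pointwise periodic boundary residual: u(x_left, t) - u(x_right, t)."""
    return u_fn(params, x_left, t) - u_fn(params, x_right, t)
```

As with the PDE residual, these are vmapped over batches of boundary/initial collocation points and their mean squared values are added to the loss.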


Parameterized Model

We could also assume that we don't know anything about the physical equations that govern the system, but that we can learn about them from data. Let's assume (by luck) that we have access to every pairwise spatio-temporal coordinate and field value of the functa that we are interested in. So we have a set of pairwise points, which we can call a dataset $\mathcal{D}$, defined as

$$
\mathcal{D} = \left\{(\mathbf{x}_{s,n},t_n),\boldsymbol{f}_n \right\}_{n=1}^{\infty}
$$

I say this is infinite because technically we can sample any continuous function infinitely many times without revisiting any previous samples (even on a bounded domain). The objective is to find some approximation of the actual function, $\boldsymbol{f}$, which maps each of these coordinates to the correct scalar/vector value. So we can define some arbitrary parameterized function, $\boldsymbol{f_\theta}$, which tries to approximate the functa. We can say that:

$$
\boldsymbol{f}(\mathbf{x}_s,t)\approx\boldsymbol{f_\theta}(\mathbf{x}_s,t)
$$

These parameters depend upon the architecture of the function we choose. Again, like the PDE, these parameters are artefacts introduced by the function. For a linear function, we may have just a few parameters (weights and a bias); for a basis function, we may have the same parameters plus some additional hyper-parameters for the basis; and neural networks have many weights and biases which we apply compositionally. Now, if we have a flexible enough model and infinite data, we should be able to find parameters that fit the functa. However, the problem becomes how to find those parameters. This is the learning problem. We assume the solution exists and that we can find it. But the questions become: 1) how do we find it, and 2) how do we know we have found it? However, there is an entire field dedicated to resolving these issues, e.g. optimization for finding the solution and defining the metrics for knowing whether we have found it. In addition, I stated that we assume the solution exists and that we have infinite data, which is never true, so that only adds more problems…
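As a minimal sketch of such a parameterized function, here is a small coordinate MLP in JAX, evaluated pointwise at an $(x, t)$ coordinate, together with the data-misfit loss used to fit it. The layer sizes, activation, and initialisation are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(2, 64, 64, 1)):
    """Randomly initialise a small MLP mapping an (x, t) coordinate to a field value."""
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (m, n)) * jnp.sqrt(2.0 / m)
        params.append((w, jnp.zeros(n)))
    return params

def u_fn(params, x, t):
    """Pointwise evaluation of f_theta at a single (x, t) coordinate."""
    h = jnp.stack([x, t])
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]  # scalar output

def data_loss(params, x_obs, t_obs, f_obs):
    """Mean squared misfit between f_theta and the observed field values."""
    preds = jax.vmap(u_fn, in_axes=(None, 0, 0))(params, x_obs, t_obs)
    return jnp.mean((preds - f_obs) ** 2)
```

Minimising `data_loss` with gradient descent is the purely data-driven version of the learning problem; the physics-informed formulation below augments it with the PDE, boundary, and initial-condition residuals.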

PINNs Formulation

PDE Formulation

  • Equation of Motion
  • Boundary Conditions
  • Initial Conditions

The idea is to add the governing equations (the equation of motion, boundary conditions, and initial conditions) as penalty terms in the loss function, alongside the data misfit. A sketch of such a composite loss is given below.
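A minimal sketch of a composite physics-informed loss, reusing the hypothetical `u_fn`, `pde_residual`, `bc_residual_periodic`, and `ic_residual` sketched above; the batch layout and the weights `lam_*` are assumptions and typically need problem-specific tuning.

```python
import jax
import jax.numpy as jnp

def pinn_loss(params, batch, u_fn,
              lam_data=1.0, lam_pde=1.0, lam_bc=1.0, lam_ic=1.0):
    """Composite loss: data misfit + PDE residual + boundary + initial conditions.

    `batch` is assumed to be a dict of observation/collocation arrays.
    """
    # Data term: fit the observed field values at the observed coordinates.
    u_batched = jax.vmap(u_fn, in_axes=(None, 0, 0))
    l_data = jnp.mean(
        (u_batched(params, batch["x_obs"], batch["t_obs"]) - batch["f_obs"]) ** 2
    )

    # PDE term: squared residual at interior collocation points.
    res = jax.vmap(lambda x, t: pde_residual(u_fn, params, x, t))(
        batch["x_col"], batch["t_col"]
    )
    l_pde = jnp.mean(res ** 2)

    # Boundary- and initial-condition terms.
    bc = jax.vmap(lambda t: bc_residual_periodic(u_fn, params, t))(batch["t_bc"])
    ic = jax.vmap(lambda x, u0: ic_residual(u_fn, params, x, u0))(
        batch["x_ic"], batch["u0_ic"]
    )
    l_bc = jnp.mean(bc ** 2)
    l_ic = jnp.mean(ic ** 2)

    return lam_data * l_data + lam_pde * l_pde + lam_bc * l_bc + lam_ic * l_ic
```

How the four terms are weighted matters a great deal in practice, which is one of the main topics of the training guidance below.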

Architectures

  • MLP
  • Random Fourier Features
  • SIREN
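Of these, random Fourier features are perhaps the simplest to sketch: the raw coordinates are projected with a random Gaussian matrix and passed through sines and cosines before entering a standard MLP, which mitigates the spectral bias of plain coordinate networks. The function names and the `scale` hyper-parameter below are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def init_fourier_features(key, in_dim=2, n_features=128, scale=10.0):
    """Random projection matrix B with entries ~ N(0, scale^2)."""
    return scale * jax.random.normal(key, (in_dim, n_features))

def fourier_embed(B, coords):
    """Map raw coordinates (..., in_dim) to [sin(2*pi*coords@B), cos(2*pi*coords@B)]."""
    proj = 2.0 * jnp.pi * coords @ B
    return jnp.concatenate([jnp.sin(proj), jnp.cos(proj)], axis=-1)
```

SIREN achieves a similar effect by using sinusoidal activations (with a carefully scaled initialisation) throughout the network itself, rather than only embedding the inputs.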

Training

This section was taken from [Wang et al., 2023]
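Pending the details from that guide, here is a minimal training step for the composite loss sketched above; plain gradient descent with a fixed learning rate is a simplifying assumption (Wang et al. [2023] discuss, among other things, optimiser choice, learning-rate schedules, and adaptive weighting of the individual loss terms).

```python
import jax

@jax.jit
def train_step(params, batch, lr=1e-3):
    """One plain gradient-descent step on the composite physics-informed loss."""
    loss, grads = jax.value_and_grad(lambda p: pinn_loss(p, batch, u_fn))(params)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss
```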


Example: Shallow Water Equations

This example was taken from [Bihlo & Popovych, 2022]


Spatially Discretized

  • CNN, Transformers, Neural Operators
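For spatially discretized models that output the field on a regular grid, the PDE residual is typically computed with finite-difference (or spectral) derivatives on the grid rather than with autodiff through the coordinates. A minimal sketch, assuming a diffusion-type operator and periodic boundaries (the operator and grid layout are illustrative assumptions):

```python
import jax.numpy as jnp

def laplacian_2d(field, dx):
    """Five-point finite-difference Laplacian of a 2D field on a periodic grid."""
    return (jnp.roll(field, 1, axis=0) + jnp.roll(field, -1, axis=0) +
            jnp.roll(field, 1, axis=1) + jnp.roll(field, -1, axis=1) -
            4.0 * field) / dx**2

def grid_pde_residual(u_prev, u_next, dt, dx, nu=0.01):
    """Discrete residual of du/dt = nu * Laplacian(u) between two predicted snapshots."""
    u_mid = 0.5 * (u_prev + u_next)
    return (u_next - u_prev) / dt - nu * laplacian_2d(u_mid, dx)
```

The mean squared value of this gridded residual plays the same role as the pointwise PDE term in the coordinate-based loss above.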

Example: Neural Operators

This example was taken from [Li et al., 2021]


Example: Denoising

This example was taken from [Kelshaw & Magri, 2022]


Example: Super Resolution

This example was taken from [Kelshaw & Magri, 2023]


Temporally Discretized

  • LSTMs

Spatiotemporally Discretized

  • CNNs + LSTMs, Neural Operators

References
  1. Wang, S., Sankaran, S., Wang, H., & Perdikaris, P. (2023). An Expert’s Guide to Training Physics-informed Neural Networks. arXiv. 10.48550/ARXIV.2308.08468
  2. Bihlo, A., & Popovych, R. O. (2022). Physics-informed neural networks for the shallow-water equations on the sphere. Journal of Computational Physics, 456, 111024. 10.1016/j.jcp.2022.111024
  3. Li, Z., Zheng, H., Kovachki, N., Jin, D., Chen, H., Liu, B., Azizzadenesheli, K., & Anandkumar, A. (2021). Physics-Informed Neural Operator for Learning Partial Differential Equations. arXiv. 10.48550/ARXIV.2111.03794
  4. Kelshaw, D., & Magri, L. (2022). Physics-Informed Convolutional Neural Networks for Corruption Removal on Dynamical Systems. arXiv. 10.48550/ARXIV.2210.16215
  5. Kelshaw, D., & Magri, L. (2023). Super-resolving sparse observations in partial differential equations: A physics-constrained convolutional neural network approach. arXiv. 10.48550/ARXIV.2306.10990