
Inverse Problems


Problem Formulation

$$
\mathbf{y} = \boldsymbol{h}(\mathbf{x};\boldsymbol{\theta}) + \boldsymbol{\eta}
$$

where:

  • $\mathbf{y} \in \mathbb{R}^{D_\mathbf{y}}$ is a noisy measurement
  • $\mathbf{x} \in \mathbb{R}^{D_\mathbf{x}}$ is the original signal
  • $\boldsymbol{h}(\,\cdot\,;\boldsymbol{\theta}): \mathbb{R}^{D_\mathbf{x}} \rightarrow \mathbb{R}^{D_\mathbf{y}}$ is a measurement function, parameterized by $\boldsymbol{\theta}$
  • $\boldsymbol{\eta}$ is the measurement noise

Data

$\mathbf{x} \in \mathbb{R}^{D_{\mathbf{x}}}$, for example:
  • 1D Signal: $\mathbf{x} \in \mathbb{R}^{D}$, $\mathbf{x} \in \mathbb{R}^{T}$
  • 2D Image: $\mathbf{x} \in \mathbb{R}^{H \times W}$, $\mathbf{x} \in \mathbb{R}^{u \times v}$
  • 3D Volume: $\mathbf{x} \in \mathbb{R}^{H \times W \times C}$, $\mathbf{x} \in \mathbb{R}^{u \times v \times \text{depth}}$, $\mathbf{x} \in \mathbb{R}^{u \times v \times \text{time}}$
  • 3D Volume Sequence: $\mathbf{x} \in \mathbb{R}^{H \times W \times C \times \text{time}}$, $\mathbf{x} \in \mathbb{R}^{u \times v \times \text{depth} \times \text{time}}$

Forward Direction

The first direction is the forward direction.

$$
\mathbf{y} = \boldsymbol{h}(\mathbf{x};\boldsymbol{\theta})
$$

We can try to find a solution to the forward problem, which is to use the state, $\mathbf{x}$, to help predict the observations, $\mathbf{y}$. This is typically the easier of the two directions. Many times we simply need to approximate $\mathbf{y}$ with some function $\boldsymbol{f}_{\boldsymbol{\theta}}(\mathbf{x})$. We can use point estimates, i.e. $\hat{\mathbf{y}} \approx \mathbf{y}^* = \operatorname*{argmax}_{\mathbf{y}} p(\mathbf{y}|\mathbf{x})$, or we can try to obtain the posterior distribution, i.e. $\hat{p}_{\boldsymbol{\theta}}(\mathbf{y}|\mathbf{x}) \approx p(\mathbf{y}|\mathbf{x})$. We often call this discriminative or transductive machine learning. However, discriminative models are hard to interpret, explain and validate.

Note: This is quite different from the traditional sciences because we don’t “model the real world”; we simply fit a map from the state to the observations.
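
To make the forward, discriminative view concrete, here is a minimal sketch (not from the original notes): it assumes a toy linear forward process and fits a linear surrogate $\boldsymbol{f}_{\boldsymbol{\theta}}$ by least squares; all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "true" forward process h(x) plus noise (illustrative stand-in only)
D_x, D_y, N = 5, 3, 200
H_true = rng.normal(size=(D_y, D_x))
X = rng.normal(size=(N, D_x))                        # states x_i
Y = X @ H_true.T + 0.1 * rng.normal(size=(N, D_y))   # noisy observations y_i

# Forward surrogate: linear f_theta(x), fitted by least squares (point estimate of y given x)
Theta, *_ = np.linalg.lstsq(X, Y, rcond=None)        # Theta has shape (D_x, D_y)

x_new = rng.normal(size=D_x)
y_pred = x_new @ Theta                               # predicted observation for a new state
```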


Inverse Direction

The other direction is the inverse direction.

$$
\mathbf{x} = \boldsymbol{h}^{-1}(\mathbf{y};\boldsymbol{\theta})
$$

We can also try to find a solution to the inverse problem, whereby we have some observations, $\mathbf{y}$, and we want them to help us predict some state, $\mathbf{x}$. Here we are more interested in the posterior distribution, $p(\mathbf{x}|\mathbf{y})$, which inverts the data-generating process. It is the more difficult problem because we need to make assumptions about our system in order to formulate it. Fortunately, once the problem has been formulated, we can use Bayes' theorem and the standard tricks that come with it to solve the problem. This is often called generative modeling.

Example Problem:

Let's take some hidden system parameters, $\mathbf{x}$, and some observations of the system behaviour, $\mathbf{y}$. Our objective is to determine the posterior $p(\mathbf{x}|\mathbf{y}=\hat{\mathbf{y}})$ in order to estimate the parameters, $\mathbf{x}$, from some measured $\hat{\mathbf{y}}$. We could learn $p(\mathbf{x}|\mathbf{y})$ using synthetic data from a simulation $\mathbf{y} = \boldsymbol{g}(\mathbf{x};\boldsymbol{\epsilon})$ of the forward process.
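
A minimal sketch of this simulate-then-learn idea, assuming a toy linear simulator $\boldsymbol{g}$ and a linear, amortized point estimator of $\mathbf{x}$ from $\mathbf{y}$ (a stand-in for learning the full posterior $p(\mathbf{x}|\mathbf{y})$; everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear forward simulator y = g(x; eps) (illustrative only)
D_x, D_y, N = 4, 6, 500
G = rng.normal(size=(D_y, D_x))

def simulate(x):
    return G @ x + 0.05 * rng.normal(size=D_y)

# 1) Sample hidden parameters from a prior and simulate observations
X_sim = rng.normal(size=(N, D_x))
Y_sim = np.stack([simulate(x) for x in X_sim])

# 2) Fit a linear inverse regressor on the synthetic pairs (x approx. y @ W)
W, *_ = np.linalg.lstsq(Y_sim, X_sim, rcond=None)    # W has shape (D_y, D_x)

# 3) Apply it to a new measured observation y_hat
x_true = rng.normal(size=D_x)
y_hat = simulate(x_true)
x_est = y_hat @ W                                    # point estimate of the hidden parameters
```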


Noise


Ill-Posed

Our problem is ill-posed. This means that the space of possible solutions is very large, and many of them are nonsensical, so we need to constrain our model so that the search space becomes manageable. If we think about our unknowns, first we have an unknown input, $\mathbf{x}$, which we constrain with a regularization term $\mathcal{R}(\mathbf{x})$:

$$
\mathcal{L}(\boldsymbol{\theta},\mathbf{x}) = ||\mathbf{y} - \boldsymbol{f}(\mathbf{x};\boldsymbol{\theta})||_2^2 + \lambda \mathcal{R}(\mathbf{x})
$$

In the case of $\boldsymbol{f}$, we also have unknown parameters, $\boldsymbol{\theta}$:

$$
\mathcal{L}(\boldsymbol{\theta},\mathbf{x}) = ||\mathbf{y} - \boldsymbol{f}(\mathbf{x};\boldsymbol{\theta})||_2^2 + \lambda_1 \mathcal{R}(\mathbf{x}) + \lambda_2 \mathcal{R}(\boldsymbol{\theta})
$$
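
A minimal sketch of jointly minimizing this regularized objective, assuming a linear model $\boldsymbol{f}(\mathbf{x};\boldsymbol{\theta}) = \boldsymbol{\Theta}\mathbf{x}$, quadratic regularizers $\mathcal{R}(\cdot) = ||\cdot||_2^2$, and plain gradient descent with hand-derived gradients (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear measurement model f(x; Theta) = Theta @ x with quadratic regularizers (illustrative)
D_x, D_y = 5, 3
Theta = 0.1 * rng.normal(size=(D_y, D_x))   # unknown parameters
x = np.zeros(D_x)                           # unknown input
y = rng.normal(size=D_y)                    # noisy measurement

lam1, lam2, lr = 0.1, 0.01, 0.01

for _ in range(1000):
    r = Theta @ x - y                                    # residual f(x; Theta) - y
    grad_x = 2 * Theta.T @ r + 2 * lam1 * x              # d/dx of ||r||^2 + lam1 ||x||^2
    grad_Theta = 2 * np.outer(r, x) + 2 * lam2 * Theta   # d/dTheta of ||r||^2 + lam2 ||Theta||_F^2
    x -= lr * grad_x
    Theta -= lr * grad_Theta
```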

Solutions

Empirical Minimization

This is the case where we want to learn a solver for deterministic inverse problems. Given a dataset of pairs, $\mathcal{D} = \{\mathbf{y}_i, \mathbf{x}_i\}_{i=1}^N$, we choose the functional form of our model, $\boldsymbol{f}$, e.g. a neural network, parameterized by $\boldsymbol{\theta}$.

$$
\mathcal{L}_{\text{MSE}}(\boldsymbol{\theta}) = \frac{1}{2}||\mathbf{y} - \boldsymbol{f}(\mathbf{x};\boldsymbol{\theta})||_2^2
$$

In order to restrict the space of solutions, we penalize the parameters, $\boldsymbol{\theta}$, of our function, $\boldsymbol{f}$.

$$
\mathcal{L}_{\text{MSE}}(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^N ||\mathbf{y}_i - \boldsymbol{f}(\mathbf{x}_i;\boldsymbol{\theta})||_2^2 + \lambda \mathcal{R}(\boldsymbol{\theta})
$$

We can run an offline training regime to find the best parameters, $\boldsymbol{\theta}^*$, given our training data, $\mathcal{D}$. Then, at inference time, we simply apply the fitted model to new inputs.
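
As a minimal sketch under assumptions not stated above (a linear $\boldsymbol{f}$ and $\mathcal{R}(\boldsymbol{\theta}) = ||\boldsymbol{\theta}||_2^2$), the regularized MSE objective has the familiar ridge-regression closed form:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training pairs (x_i, y_i); a linear f is assumed for illustration
D_x, D_y, N = 6, 2, 300
W_true = rng.normal(size=(D_x, D_y))
X = rng.normal(size=(N, D_x))
Y = X @ W_true + 0.1 * rng.normal(size=(N, D_y))

lam = 1e-2

# Minimize (1/N) sum_i ||y_i - Theta^T x_i||^2 + lam ||Theta||^2
# Closed form: Theta = (X^T X / N + lam I)^{-1} (X^T Y / N)
Theta = np.linalg.solve(X.T @ X / N + lam * np.eye(D_x), X.T @ Y / N)

# Offline training done; apply to a new input at inference time
x_new = rng.normal(size=D_x)
y_pred = x_new @ Theta
```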


Bayesian Inversion

We take the posterior density

$$
p(\boldsymbol{\theta}|\mathbf{x}, \mathbf{y}) \propto p(\mathbf{y}|\mathbf{x},\boldsymbol{\theta})\,p(\boldsymbol{\theta})
$$

Maximum Likelihood Estimation (MLE)

$$
\mathbf{x}^* = \argmax_{\mathbf{x}} p(\mathbf{y}|\mathbf{x})
$$

MAP

$$
\mathbf{x}^* = \argmax_{\mathbf{x}} p(\mathbf{x}|\mathbf{y}) = \argmax_{\mathbf{x}} p(\mathbf{y}|\mathbf{x})\,p(\mathbf{x};\boldsymbol{\theta})
$$

We assume a Gaussian measurement model

$$
p(\mathbf{y}|\mathbf{x}) = \mathcal{N}(\mathbf{y};\boldsymbol{h}(\mathbf{x};\boldsymbol{\theta}), \sigma^2\mathbf{I})
$$
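
To see where the data-fidelity term below comes from, take the negative logarithm of this Gaussian likelihood (a standard derivation, filled in here for completeness):

$$
\begin{aligned}
-\log p(\mathbf{y}|\mathbf{x}) &= \frac{1}{2\sigma^2}||\mathbf{y} - \boldsymbol{h}(\mathbf{x};\boldsymbol{\theta})||_2^2 + \frac{D_\mathbf{y}}{2}\log(2\pi\sigma^2) \\
-\log p(\mathbf{x}|\mathbf{y}) &= \frac{1}{2\sigma^2}||\mathbf{y} - \boldsymbol{h}(\mathbf{x};\boldsymbol{\theta})||_2^2 - \log p(\mathbf{x};\boldsymbol{\theta}) + \text{const}
\end{aligned}
$$

Minimizing the negative log-posterior therefore gives the explicit loss below.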

Loss Function (Explicit)

$$
\mathcal{L}(\boldsymbol{\theta},\mathbf{x}) = \frac{1}{2\sigma^2}||\mathbf{y} - \boldsymbol{h}(\mathbf{x};\boldsymbol{\theta})||_2^2 - \log p(\mathbf{x};\boldsymbol{\theta})
$$

Energy-Based Representation

That is, the posterior can be viewed as an energy-based (Gibbs) distribution whose energy is the loss above:

$$
p(\mathbf{x}|\mathbf{y}) \propto \exp\left(-U(\mathbf{x};\boldsymbol{\theta})\right), \qquad U(\mathbf{x};\boldsymbol{\theta}) = \mathcal{L}(\boldsymbol{\theta},\mathbf{x})
$$

Generic Formulation

$$
\hat{\mathbf{x}} = \argmin_{\mathbf{x}} \lambda_1||\mathbf{y} - \boldsymbol{h}(\mathbf{x})||_2^2 + \lambda_2 \mathcal{R}(\mathbf{x})
$$

Example: Denoising

$$
\mathbf{y} = \mathbf{x} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})
$$

Probabilistic Priors

In this case, we assume some probabilistic prior

$$
\mathbf{x} \sim P_X
$$

Now, we can formulate this as an optimization problem:

$$
\hat{\mathbf{x}} = \argmin_{\mathbf{x}} \lambda ||\mathbf{y} - \mathbf{x}||_2^2 - \log p_X(\mathbf{x})
$$
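
As a worked special case (added for illustration, not in the original), take $P_X = \mathcal{N}(\mathbf{0}, \mathbf{I})$ so that $-\log p_X(\mathbf{x}) = \frac{1}{2}||\mathbf{x}||_2^2 + \text{const}$; the objective becomes quadratic and the estimate is a simple shrinkage of the measurement:

$$
\hat{\mathbf{x}} = \argmin_{\mathbf{x}} \lambda ||\mathbf{y} - \mathbf{x}||_2^2 + \frac{1}{2}||\mathbf{x}||_2^2 = \frac{2\lambda}{2\lambda+1}\,\mathbf{y}
$$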

Examples: Variational AutoEncoders, AutoRegressive Models, Normalizing Flows

  • Deep Unfolding with Normalizing Flow Priors for Inverse Problems - Wei et al (2021)

Normalizing Flow Prior

Normalizing Direction

$$
\begin{aligned}
\mathbf{z} &= \boldsymbol{f}(\mathbf{x};\boldsymbol{\theta}) \\
&= \boldsymbol{f}_L \circ \boldsymbol{f}_{L-1} \circ \cdots \circ \boldsymbol{f}_1 (\mathbf{x})
\end{aligned}
$$

Generative Direction

$$
\begin{aligned}
\mathbf{x} &= \boldsymbol{g}(\mathbf{z};\boldsymbol{\theta}) \\
&= \boldsymbol{f}^{-1}(\mathbf{z};\boldsymbol{\theta}) \\
&= \boldsymbol{f}_1^{-1} \circ \boldsymbol{f}_{2}^{-1} \circ \cdots \circ \boldsymbol{f}_L^{-1} (\mathbf{z})
\end{aligned}
$$

Density Evaluation

$$
\log p(\mathbf{x};\boldsymbol{\theta}) = \log p(\mathbf{z}) + \log \left| \det \boldsymbol{\nabla}_\mathbf{x} \boldsymbol{f}(\mathbf{x};\boldsymbol{\theta}) \right|
$$
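
A minimal sketch of this density evaluation, using a single elementwise affine layer as a stand-in for the flow $\boldsymbol{f}$ (a real normalizing flow stacks many such invertible layers; everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
D = 3

# One invertible elementwise affine layer: z = f(x) = (x - b) / s  (illustrative "flow")
s = np.exp(rng.normal(size=D))   # positive scales -> invertible
b = rng.normal(size=D)

def log_standard_normal(z):
    return -0.5 * np.sum(z**2) - 0.5 * z.size * np.log(2 * np.pi)

def log_prob(x):
    z = (x - b) / s
    # log |det df/dx| = sum over dimensions of log(1 / s_d)
    log_det = -np.sum(np.log(s))
    return log_standard_normal(z) + log_det

x = rng.normal(size=D)
print(log_prob(x))
```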

Cost Function (Explicit)

$$
\mathcal{L}(\mathbf{z},\boldsymbol{\theta}) = \frac{1}{2\sigma^2}||\mathbf{y} - \mathbf{A}\boldsymbol{g}(\mathbf{z};\boldsymbol{\theta})||_2^2 - \log p(\mathbf{z})
$$

For a standard Gaussian base density $p(\mathbf{z})$, this is equivalent, up to constants, to

$$
\mathcal{L}(\mathbf{z},\boldsymbol{\theta}) = ||\mathbf{y} - \mathbf{A}\boldsymbol{g}(\mathbf{z};\boldsymbol{\theta})||_2^2 + \lambda ||\mathbf{z}||_2^2
$$

where $\lambda$ is a regularization parameter that trades off the prior against the measurement fit.

Cost Function (Implicit)

$$
\begin{aligned}
\hat{\mathbf{z}} &= \argmin_{\mathbf{z}} ||\mathbf{y} - \mathbf{A}\boldsymbol{g}(\mathbf{z};\boldsymbol{\theta})||_2^2 - \log p(\mathbf{z}) \\
&= \argmin_{\mathbf{z}} ||\mathbf{y} - \mathbf{A}\boldsymbol{g}(\mathbf{z};\boldsymbol{\theta})||_2^2 + \lambda ||\mathbf{z}||_2^2
\end{aligned}
$$
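
A minimal sketch of this latent-space optimization, assuming a known linear forward operator $\mathbf{A}$ and, purely for illustration, an affine decoder $\boldsymbol{g}(\mathbf{z}) = \mathbf{W}\mathbf{z} + \mathbf{b}$ standing in for a pretrained flow; $\hat{\mathbf{z}}$ is found by gradient descent with a step size set from the Lipschitz constant:

```python
import numpy as np

rng = np.random.default_rng(5)

D_x, D_z, D_y = 8, 3, 5

# Stand-ins for a pretrained generator g(z) = W z + b and a linear forward operator A
W = rng.normal(size=(D_x, D_z))
b = rng.normal(size=D_x)
A = rng.normal(size=(D_y, D_x))
y = rng.normal(size=D_y)                        # the measurement

lam = 1.0
M = A @ W                                       # composed linear map acting on z
lr = 1.0 / (2 * np.linalg.norm(M, 2) ** 2 + 2 * lam)   # safe step size

z = np.zeros(D_z)
for _ in range(2000):
    r = M @ z + A @ b - y                       # residual A g(z) - y
    grad = 2 * M.T @ r + 2 * lam * z            # gradient of ||r||^2 + lam ||z||^2
    z -= lr * grad

x_hat = W @ z + b                               # reconstruction pushed through the generator
```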

Dictionary Prior

Examples: PCA (aka EOF, POD, SVD)

$$
\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}
$$

$$
\hat{\boldsymbol{\alpha}} = \argmin_{\boldsymbol{\alpha}} ||\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}||_2^2, \qquad \hat{\mathbf{x}} = \mathbf{D}\hat{\boldsymbol{\alpha}}
$$
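
A minimal sketch, assuming the dictionary $\mathbf{D}$ is built from the leading principal components of some historical signals and the coefficients are found by ordinary least squares (all data synthetic and illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Build a PCA dictionary D from synthetic "historical" signals (illustrative)
D_x, N, k = 20, 200, 5
latent = rng.normal(size=(N, k)) @ rng.normal(size=(k, D_x))
signals = latent + 0.01 * rng.normal(size=(N, D_x))
_, _, Vt = np.linalg.svd(signals - signals.mean(axis=0), full_matrices=False)
D = Vt[:k].T                                   # dictionary: first k principal directions (D_x, k)

# Noisy measurement of a new signal and its least-squares coefficients
x_true = D @ rng.normal(size=k)
y = x_true + 0.05 * rng.normal(size=D_x)
alpha, *_ = np.linalg.lstsq(D, y, rcond=None)  # argmin_alpha ||y - D alpha||^2
x_hat = D @ alpha                              # reconstruction in the dictionary subspace
```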

Norm-Based Prior

$$
\hat{\mathbf{x}} = \argmin_{\mathbf{x}} \lambda_1 ||\mathbf{y} - \mathbf{x}||_2^2 + \lambda_2 ||\nabla \mathbf{x}||_2^2
$$
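
A minimal sketch of this smoothness-regularized denoiser for a 1D signal, using a finite-difference matrix as a stand-in for $\nabla$ and the closed-form solution of the quadratic problem (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Noisy 1D signal: y = x + eps
n = 100
t = np.linspace(0, 1, n)
x_true = np.sin(2 * np.pi * t)
y = x_true + 0.2 * rng.normal(size=n)

# Finite-difference operator standing in for the gradient, shape (n-1, n)
G = np.diff(np.eye(n), axis=0)

lam1, lam2 = 1.0, 5.0

# Minimize lam1 ||y - x||^2 + lam2 ||G x||^2
# Setting the gradient to zero: (lam1 I + lam2 G^T G) x = lam1 y
x_hat = np.linalg.solve(lam1 * np.eye(n) + lam2 * G.T @ G, lam1 * y)
```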