Bayesian Crash Course

Bayesian Interpretation of all Problems

CNRS
MEOM

How do we formulate the prediction problem for our quantity of interest?

$$\mathcal{F}: \text{Observations} \times \text{Parameters} \rightarrow \text{State}$$

We have a decision to make about how we want to formulate this problem. There are two classes of methods: regression-based learning and objective-based learning.

$$
\begin{aligned}
\text{Regression-Based}: && && \boldsymbol{\theta}^* &= \underset{\boldsymbol{\theta}}{\text{argmin}}\hspace{2mm} \mathcal{L}(\boldsymbol{\theta}) \\
\text{Objective-Based}: && && \boldsymbol{u}^* &= \underset{\boldsymbol{u}}{\text{argmin}}\hspace{2mm} \mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})
\end{aligned}
$$
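
As a rough sketch of the distinction, the snippet below contrasts the two losses for a generic parametric model. Everything here is a placeholder (the linear `model`, the quadratic `objective`, the variable names); it is only meant to show which quantity is optimized in each formulation.

```python
import jax
import jax.numpy as jnp

# Placeholder predictor u = F(y_obs; theta) and placeholder objective J(u, theta).
def model(theta, y_obs):
    return theta @ y_obs

def objective(u, theta):
    return 0.5 * jnp.sum((u - theta.mean()) ** 2)

# Regression-based: fit the parameters theta against precomputed solutions u_sim.
def regression_loss(theta, y_obs, u_sim):
    u_pred = model(theta, y_obs)
    return jnp.mean((u_pred - u_sim) ** 2)

# Objective-based: optimize the state u directly against the objective J(u, theta).
def objective_loss(u, theta):
    return objective(u, theta)

# The gradient is taken with respect to the quantity being learned in each case,
# e.g. dL_dtheta(jnp.ones((3, 5)), jnp.ones(5), jnp.ones(3)) or dJ_du(jnp.zeros(3), jnp.ones((3, 5))).
dL_dtheta = jax.grad(regression_loss, argnums=0)
dJ_du = jax.grad(objective_loss, argnums=0)
```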

Example: Sea Surface Height Interpolation

Recall the mapping problem we wish to solve:

$$\mathcal{F}: \eta_{obs} \times \boldsymbol{\Theta} \rightarrow \eta_{state}$$

Pros & Cons

Regression-Based Losses:

  • Pro: If the objective, $\mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})$, is computationally expensive, we do not need to compute it.
  • Pro: Uses global information from $\boldsymbol{u}_{obs}$.
  • Pro: Does not need to compute $\boldsymbol{\nabla}_{\boldsymbol{u}} \mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})$.
  • Con: Does not have access to $\mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})$.
  • Con: It may be expensive to compute $\boldsymbol{u}_{sim}$.
  • Con: May be hard when $\boldsymbol{u}^*(\boldsymbol{\theta})$ is not unique.

Objective-Based Losses:

  • Pro: Uses the objective information from $\mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})$.
  • Pro: Faster, does not require $\boldsymbol{u}_{sim}$.
  • Pro: Easily learns non-unique $\boldsymbol{u}^*(\boldsymbol{\theta})$.
  • Con: Can get stuck in local optima of $\mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})$.
  • Con: Often requires computing $\boldsymbol{\nabla}_{\boldsymbol{u}}\mathcal{J}(\boldsymbol{u},\boldsymbol{\theta})$.

Examples

Denoising

In this example, we are interested in denoising a set of observations, $\boldsymbol{y}_\text{obs}$. We wish to recover the original signal, which we believe to be our state, $\boldsymbol{u}$. We assume that the observations are related to the state via a linear operator, $\mathbf{H}$. For simplicity, we assume iid Gaussian noise.

$$\boldsymbol{y}_\text{obs} = \mathbf{H}\boldsymbol{u} + \varepsilon, \hspace{5mm} \varepsilon\sim\mathcal{N}(0,\sigma^2)$$
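
To make the setup concrete, the snippet below simulates such observations. The sparse signal, the identity choice for $\mathbf{H}$, and the noise level are arbitrary assumptions for illustration; they are not taken from the text.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)

n = 100                                   # state dimension (arbitrary)
u_true = jnp.zeros(n).at[::10].set(1.0)   # a sparse "true" signal (arbitrary)
H = jnp.eye(n)                            # identity observation operator: pure denoising
sigma = 0.1                               # noise standard deviation (arbitrary)

# y_obs = H u + eps,  eps ~ N(0, sigma^2)
eps = sigma * jax.random.normal(key, (n,))
y_obs = H @ u_true + eps
```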

We can write out the posterior using the Bayesian formulation.

$$p(\boldsymbol{u}|\boldsymbol{y}_\text{obs}) \propto p(\boldsymbol{y}_\text{obs}|\boldsymbol{u})\,p(\boldsymbol{u})$$

Since we are using a linear operator and a Gaussian likelihood, we could choose a conjugate prior, which would allow for simpler inference. We can write the posterior as

$$p(\boldsymbol{u}|\boldsymbol{y}_\text{obs}) \propto \exp\left( -\mathcal{J}\left(\boldsymbol{u},\boldsymbol{y}_\text{obs}\right) \right)$$

which is connected to the Gibbs distribution. Taking the negative log of the Gaussian likelihood gives a quadratic data-fidelity term, and choosing a sparsity-promoting Laplace prior on $\boldsymbol{u}$ gives an $\ell_1$ penalty. We are left with the objective function

$$\mathcal{J}(\boldsymbol{u},\boldsymbol{y}_\text{obs}) = \frac{1}{2}||\boldsymbol{y}_\text{obs} - \mathbf{H}\boldsymbol{u}||^2_2 + \lambda||\boldsymbol{u}||_1$$
  • $\mathcal{J}$ - regularized reconstruction energy
  • $\lambda$ - regularization coefficient
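
One way to minimize this objective is proximal gradient descent (ISTA), which takes gradient steps on the smooth quadratic term and handles the non-smooth $\ell_1$ penalty with a soft-thresholding step. The sketch below is a minimal implementation under that choice; the step size rule, iteration count, and $\lambda$ are illustrative defaults, not values from the text.

```python
import jax
import jax.numpy as jnp

def soft_threshold(x, thresh):
    """Proximal operator of the l1 norm (soft-thresholding)."""
    return jnp.sign(x) * jnp.maximum(jnp.abs(x) - thresh, 0.0)

def data_fidelity(u, y_obs, H):
    """Smooth part of J: 0.5 * ||y_obs - H u||_2^2."""
    return 0.5 * jnp.sum((y_obs - H @ u) ** 2)

def denoise(y_obs, H, lam=0.1, n_iters=200):
    """Minimize J(u) = 0.5 ||y_obs - H u||_2^2 + lam ||u||_1 with ISTA."""
    # Step size: inverse Lipschitz constant of the gradient of the smooth term.
    step = 1.0 / (jnp.linalg.norm(H, ord=2) ** 2)
    grad_fn = jax.grad(data_fidelity, argnums=0)
    u = jnp.zeros(H.shape[1])
    for _ in range(n_iters):
        u = soft_threshold(u - step * grad_fn(u, y_obs, H), step * lam)
    return u

# Example usage with the simulated y_obs and H from the denoising setup above:
# u_hat = denoise(y_obs, H, lam=0.1)
```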