Models#
- Model
- Measurement Model
- Likelihood Loss Function
- Loss Function
Error Minimization#
We choose an error measure, or loss function, \(\mathcal{L}\), to minimize with respect to the parameters, \(\theta\).
We typically add some form of regularization to constrain the solution.
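A minimal sketch of this idea, assuming a linear model with a squared-error loss and a ridge (\(L_2\)) penalty; the regularization strength `lam` is an assumed value:

```python
import numpy as np

# Ridge-regularized least squares: minimize ||y - Xw||^2 + lam * ||w||^2.
# The regularized normal equations give the closed-form minimizer.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

lam = 1e-2  # regularization strength (assumed value)
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

The penalty shrinks the weights toward zero, trading a small amount of bias for a better-conditioned, lower-variance solution.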
Probabilistic Approach#
We explicitly account for noise in our model.
where \(\epsilon\) is the noise. The simplest noise assumption, common to many approaches, is iid Gaussian noise.
So given our standard Bayesian formulation for the posterior
we assume a Gaussian observation model
and in turn a likelihood model
Objective: maximize the likelihood of the data, \(\mathcal{D}\), with respect to the parameters, \(\theta\).
Note: for a Gaussian noise model (as we have assumed above), this approach yields the same predictions as minimizing the MSE loss function (seen above).
We can simplify the notation a bit to make it more compact. This essentially stacks all of the observations together so that we can use vectorized representations, i.e. \(\mathcal{D} = \{ x_i, y_i\}_{i=1}^N\)
where \(||\cdot ||_2^2\) is the squared \(L_2\) norm, i.e. the Mahalanobis distance for the special case of covariance \(\sigma^2\mathbf{I}\).
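The MLE/MSE equivalence noted above can be checked numerically: for iid Gaussian noise with fixed variance, the negative log-likelihood is an affine function of the MSE, so both objectives share the same minimizer in \(\theta\). A sketch with assumed toy values:

```python
import numpy as np

# For fixed sigma, NLL = const + N * MSE / (2 sigma^2),
# so minimizing the NLL in theta is equivalent to minimizing the MSE.
rng = np.random.default_rng(1)
y = rng.normal(size=10)
y_pred = y + 0.05 * rng.normal(size=10)  # stand-in model predictions
sigma = 0.3  # assumed fixed noise scale

mse = np.mean((y - y_pred) ** 2)
nll = 0.5 * len(y) * np.log(2 * np.pi * sigma**2) \
    + np.sum((y - y_pred) ** 2) / (2 * sigma**2)

const = 0.5 * len(y) * np.log(2 * np.pi * sigma**2)
# The two objectives differ only by a constant and a positive scale factor.
assert np.isclose(nll, const + len(y) * mse / (2 * sigma**2))
```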
Note: we often see this notation in many papers and books.
Priors#
Different Parameterizations#
| Model | Equation |
|---|---|
| Identity | \( \mathbf{x}\) |
| Linear | \(\mathbf{wx}+\mathbf{b}\) |
| Basis | \(\mathbf{w}\boldsymbol{\phi}(\mathbf{x}) + \mathbf{b}\) |
| Non-Linear | \(\sigma\left( \mathbf{wx} + \mathbf{b}\right)\) |
| Neural Network | \(\boldsymbol{f}_{L}\circ \boldsymbol{f}_{L-1}\circ\ldots\circ\boldsymbol{f}_1\) |
| Functional | \(\boldsymbol{f} \sim \mathcal{GP}\left(\boldsymbol{\mu}_{\boldsymbol \alpha}(\mathbf{x}),\boldsymbol{\sigma}^2_{\boldsymbol \alpha}(\mathbf{x})\right)\) |
Identity#
Linear#
A function that is linear in both the weights \(\mathbf{w}\) and the inputs \(\mathbf{x}\).
Basis Function#
A function that is linear in the weights \(\mathbf{w}\) but non-linear in \(\mathbf{x}\) through the basis function \(\boldsymbol{\phi}(\mathbf{x})\).
Examples
\(\phi(x) = (1, x, x^2, \ldots)\)
\(\phi(x) = \tanh(x + \gamma)^\alpha\)
\(\phi(x) = \exp(- \gamma||x-y||_2^2)\)
\(\phi(x) = \left[\sin(2\pi\boldsymbol{\omega}\mathbf{x}),\cos(2\pi\boldsymbol{\omega}\mathbf{x}) \right]^\top\)
Prob Formulation
Likelihood Loss
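Two of the example basis functions above, sketched in numpy; `gamma` and the RBF centers are assumed values chosen for illustration:

```python
import numpy as np

x = np.linspace(-1, 1, 5)

# Polynomial basis: phi(x) = (1, x, x^2)
poly = np.stack([np.ones_like(x), x, x**2], axis=1)

# Gaussian (RBF) basis: phi(x) = exp(-gamma * ||x - c||^2),
# evaluated against a set of centers c.
gamma = 2.0
centers = np.array([-0.5, 0.0, 0.5])
rbf = np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)
```

Either feature map turns the linear model \(\mathbf{w}\boldsymbol{\phi}(\mathbf{x})+\mathbf{b}\) into a non-linear function of \(\mathbf{x}\) while keeping it linear in \(\mathbf{w}\), so the error-minimization machinery above still applies unchanged.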
Non-Linear Function#
A non-linear function in \(\mathbf{x}\) and \(\mathbf{w}\).
Examples
Random Forests
Neural Networks
Gradient Boosting
Prob Formulation
Likelihood Loss
Generic#
A non-linear function in \(\mathbf{x}\) and \(\mathbf{w}\).
Examples
Random Forests
Neural Networks
Gradient Boosting
Prob Formulation
Likelihood Loss
Generic (Heteroscedastic)#
A non-linear function in \(\mathbf{x}\) and \(\mathbf{w}\), where the noise variance also depends on the input \(\mathbf{x}\) rather than being constant.
Examples
Random Forests
Neural Networks
Gradient Boosting
Prob Formulation
Likelihood Loss
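A hypothetical sketch of the heteroscedastic likelihood loss: the model predicts both a mean \(\mu(\mathbf{x})\) and an input-dependent variance \(\sigma^2(\mathbf{x})\), and the per-point negative log-likelihood weights each residual by its own variance (the numeric values below are illustrative):

```python
import numpy as np

def hetero_nll(y, mu, sigma2):
    """Mean Gaussian NLL with a per-point predicted variance sigma2."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma2)
                   + (y - mu) ** 2 / (2 * sigma2))

y = np.array([0.0, 1.0, 2.0])
mu = np.array([0.1, 0.9, 2.2])        # predicted means (assumed values)
sigma2 = np.array([0.05, 0.1, 0.5])   # larger where the model is less certain
loss = hetero_nll(y, mu, sigma2)
```

Compared with the homoscedastic case, points the model flags as noisy contribute less to the squared-error term, at the cost of the \(\log\sigma^2\) penalty that stops the variance from growing without bound.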