An overview of the methods used to find the best parameters.
Inference

In general, there are three ways to obtain the parameters: point estimates, approximate inference, and sampling.
Point Estimates

We assume that the posterior distribution is proportional to the product of the likelihood and the prior, ignoring the normalization constant:

$$
p(\boldsymbol{\theta}|\mathbf{y}) \propto p(\mathbf{y}|\boldsymbol{\theta})p(\boldsymbol{\theta})
$$

Thus, we obtain an approximate estimate of the parameters given the measurements. To find the parameters, we minimize the negative log-posterior accumulated over the $N$ measurements:

$$
\boldsymbol{L}(\boldsymbol{\theta}) = -\sum_{n=1}^N \log p(\boldsymbol{\theta}|\mathbf{y}_n)
$$
Whichever methodology we use, we still need to find the parameters that minimize this loss:
$$
\boldsymbol{\theta}^* = \underset{\boldsymbol{\theta}}{\text{argmin}} \hspace{2mm} \boldsymbol{L}(\boldsymbol{\theta})
$$

This requires one to iterate until convergence:
$$
\begin{aligned}
\text{Initial Parameters}: && &&
\boldsymbol{\theta}_0 &= \ldots \\
\text{Initial Optimization State}: && &&
\mathbf{h}_0 &= \ldots \\
\text{Optimization Step}: && &&
\boldsymbol{\theta}^{(k)}, \mathbf{h}^{(k)} &= \boldsymbol{g}(\boldsymbol{\theta}^{(k-1)}, \mathbf{h}^{(k-1)}, \boldsymbol{\alpha})
\end{aligned}
$$

For these methods, we use an off-the-shelf optimizer for solving unconstrained problems.
In particular, we use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (Fletcher, 2000). It is a quasi-Newton scheme that maintains an approximation to the Hessian matrix of the loss function, in this case the negative log-likelihood loss. It is chosen because it converges quickly on smooth nonlinear problems where gradient information is available.
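As a concrete illustration of this loop, the sketch below wraps a generic negative log-likelihood and hands it to SciPy's BFGS implementation. The Gaussian likelihood, the toy data, and the variable names are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy measurements, purely for illustration.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)

def loss(theta, y):
    """Negative log-likelihood L(theta). The scale is optimized on the
    log scale so the problem stays unconstrained for BFGS."""
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

theta0 = np.array([0.0, 0.0])                               # initial parameters theta_0
result = minimize(loss, theta0, args=(y,), method="BFGS")   # iterates g(.) until convergence

mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat, result.nit)                        # estimates and iteration count
```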
MLE Estimation

$$
\boldsymbol{L}_\text{MLE}(\boldsymbol{\theta}) = -\sum_{n=1}^N \log p(y_n|\boldsymbol{\theta})
$$

We place some constraints on the parameters: the mean and shape parameters are completely free, while the scale parameter is constrained to be positive.

$$
\begin{aligned}
\text{Mean}: && &&
\mu &\in \mathbb{R} \\
\text{Scale}: && &&
\sigma &\in \mathbb{R}^+ \\
\text{Shape}: && &&
\kappa &\in \mathbb{R}
\end{aligned}
$$
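A minimal MLE sketch under these constraints, assuming (purely for illustration) a generalized extreme value likelihood with location $\mu$, scale $\sigma$, and shape $\kappa$; the positivity constraint on $\sigma$ is handled by optimizing $\log\sigma$. The data, the choice of likelihood, and the function names are assumptions rather than the original model.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def nll(theta, y):
    """Negative log-likelihood L_MLE. The scale is parameterized as
    log(sigma) so it is guaranteed positive; scipy's genextreme shape
    is taken here as -kappa (GEV sign conventions vary)."""
    mu, log_sigma, kappa = theta
    return -np.sum(genextreme.logpdf(y, -kappa, loc=mu, scale=np.exp(log_sigma)))

# Placeholder data drawn from a GEV, for illustration only.
y = genextreme.rvs(-0.1, loc=10.0, scale=2.0, size=200, random_state=1)

theta0 = np.array([np.mean(y), np.log(np.std(y)), 0.1])
mle = minimize(nll, theta0, args=(y,), method="BFGS")
mu_mle, sigma_mle, kappa_mle = mle.x[0], np.exp(mle.x[1]), mle.x[2]
```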
MAP Estimation

The MAP estimation is very similar to the MLE estimation except that we put priors on the parameters.
$$
\boldsymbol{L}_\text{MAP}(\boldsymbol{\theta}) = -\sum_{n=1}^N \log p(y_n|\boldsymbol{\theta}) - \log p(\boldsymbol{\theta})
$$

We place prior distributions on the parameters. The mean and shape parameters remain unconstrained, while the positivity of the scale parameter is encoded through a log-normal prior.

$$
\begin{aligned}
\text{Mean}: && &&
\mu &\sim \text{Normal}(\hat{\mu},\hat{\sigma}) \\
\text{Scale}: && &&
\sigma &\sim \text{LogNormal}(0.5\hat{\sigma}, 0.25) \\
\text{Shape}: && &&
\kappa &\sim \text{Normal}(\hat{\kappa}, 0.1)
\end{aligned}
$$
The hyperparameters for the $\mu$ prior, $\hat{\mu}$ and $\hat{\sigma}$, are estimated directly from the data by calculating the empirical mean and standard deviation. The same estimate $\hat{\sigma}$ is reused in the prior on the scale parameter.
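Sketching the MAP objective under the same illustrative GEV likelihood: the negative log-likelihood from the MLE example is augmented with the log-priors above. Reading $\text{LogNormal}(0.5\hat{\sigma}, 0.25)$ as a log-normal with log-mean $0.5\hat{\sigma}$ and log-standard-deviation $0.25$ is an assumption, as is the placeholder value for $\hat{\kappa}$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme, norm, lognorm

def neg_log_posterior(theta, y, mu_hat, sigma_hat, kappa_hat):
    """L_MAP: negative log-likelihood minus the log-priors on (mu, sigma, kappa).
    Optimizing over log(sigma) only reparameterizes the search; the objective
    itself is still evaluated at sigma."""
    mu, log_sigma, kappa = theta
    sigma = np.exp(log_sigma)
    nll = -np.sum(genextreme.logpdf(y, -kappa, loc=mu, scale=sigma))
    log_prior = (
        norm.logpdf(mu, loc=mu_hat, scale=sigma_hat)
        + lognorm.logpdf(sigma, 0.25, scale=np.exp(0.5 * sigma_hat))  # LogNormal(0.5*sigma_hat, 0.25)
        + norm.logpdf(kappa, loc=kappa_hat, scale=0.1)
    )
    return nll - log_prior

# Placeholder data; mu_hat and sigma_hat come from the empirical mean and
# standard deviation, kappa_hat is an assumed placeholder.
y = genextreme.rvs(-0.1, loc=10.0, scale=2.0, size=200, random_state=2)
mu_hat, sigma_hat, kappa_hat = np.mean(y), np.std(y), 0.0

theta0 = np.array([mu_hat, np.log(sigma_hat), kappa_hat])
map_fit = minimize(neg_log_posterior, theta0,
                   args=(y, mu_hat, sigma_hat, kappa_hat), method="BFGS")
mu_map, sigma_map, kappa_map = map_fit.x[0], np.exp(map_fit.x[1]), map_fit.x[2]
```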
Approximate Inference

Laplace Approximation (TODO)

SVI (TODO)

Sampling

MCMC (TODO)
Fletcher, R. (2000). Practical Methods of Optimization. Wiley. doi:10.1002/9781118723203