Gaussian Distribution
\[f(x)=
\frac{1}{\sqrt{(2\pi)^D|\Sigma|}}
\exp\left( -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu)\right)\]
Likelihood
\[- \ln L = \frac{1}{2}\ln|\Sigma| + \frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x - \mu) + \frac{D}{2}\ln 2\pi \]
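As a numerical sanity check (a sketch with numpy; the values of \(\mu\), \(\Sigma\), and \(x\) are arbitrary), we can evaluate the density and confirm that its negative log matches the term-by-term negative log-likelihood above:

```python
import numpy as np

# Arbitrary example values; any symmetric positive-definite Sigma works.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])
D = len(mu)

diff = x - mu
maha = diff @ np.linalg.inv(Sigma) @ diff   # (x-mu)^T Sigma^{-1} (x-mu)
_, logdet = np.linalg.slogdet(Sigma)        # numerically stable ln|Sigma|

# Density f(x)
f = np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** D * np.exp(logdet))

# Negative log-likelihood, term by term as in the formula above
nll = 0.5 * logdet + 0.5 * maha + 0.5 * D * np.log(2 * np.pi)

assert np.isclose(-np.log(f), nll)
```

Using `slogdet` instead of `np.log(np.linalg.det(Sigma))` avoids overflow/underflow of the determinant in higher dimensions.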
Alternative Representation
\[X \sim \mathcal{N}(\mu, \Sigma)\]
where \(\mu\) is the mean vector and \(\Sigma\) is the covariance matrix. Let's decompose \(\Sigma\) with an eigendecomposition like so:
\[\Sigma = U\Lambda U^\top = U \Lambda^{1/2}(U\Lambda^{1/2})^\top\]
Now we can represent our Normal distribution as:
\[X = \mu + U\Lambda^{1/2}Z\]
where:
- \(U\) is a rotation matrix
- \(\Lambda^{1/2}\) is a scale matrix
- \(\mu\) is a translation vector
- \(Z \sim \mathcal{N}(0,I)\)
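This representation is exactly how one samples from \(\mathcal{N}(\mu, \Sigma)\): draw standard normals, rotate and scale by \(U\Lambda^{1/2}\), then translate by \(\mu\). A quick empirical check (a sketch with numpy; the example \(\Sigma\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Eigendecomposition Sigma = U Lambda U^T (eigh assumes symmetric input)
lam, U = np.linalg.eigh(Sigma)

# Transform standard normals: x = mu + U Lambda^{1/2} z
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ (U * np.sqrt(lam)).T   # columns of U scaled by sqrt(eigenvalues)

# The empirical mean and covariance should recover mu and Sigma
assert np.allclose(X.mean(axis=0), mu, atol=0.05)
assert np.allclose(np.cov(X.T), Sigma, atol=0.05)
```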
or also
\[X = \mu + UZ\]
where:
- \(U\) is a rotation matrix
- \(\mu\) is a translation vector
- \(Z \sim \mathcal{N}(0,\Lambda)\), i.e. the scaling by \(\Lambda\) is absorbed into \(Z\)
Reparameterization
In deep learning we often learn this distribution via a reparameterization like so:
\[X = \mu + AZ \]
where:
- \(\mu \in \mathbb{R}^{d}\)
- \(A \in \mathbb{R}^{d\times l}\)
- \(Z \sim \mathcal{N}(0, I)\)
- \(\Sigma = AA^\top\); when \(A\) is square and lower triangular this is the Cholesky decomposition
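The point of the reparameterization is that \(x = \mu + Az\) is a deterministic function of the parameters, so gradients flow through \(\mu\) and \(A\) while the randomness lives only in \(z\). A minimal sketch with numpy (the example values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Lower-triangular Cholesky factor: Sigma = A A^T
A = np.linalg.cholesky(Sigma)
assert np.allclose(A @ A.T, Sigma)

# Reparameterized sample: x = mu + A z with z ~ N(0, I).
# In a framework with autodiff, gradients w.r.t. mu and A pass
# through this map, which is what makes the trick usable in training.
z = rng.standard_normal(2)
x = mu + A @ z
```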
Entropy
1 dimensional
\[H(X) = \frac{1}{2} \log(2\pi e \sigma^2)\]
D dimensional
\[H(X) = \frac{D}{2} + \frac{D}{2} \ln(2\pi) + \frac{1}{2}\ln|\Sigma|\]
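A quick consistency check (a numpy sketch; \(\Sigma\) and \(\sigma^2\) are arbitrary): the D-dimensional formula with \(D=1\), \(\Sigma=[\sigma^2]\) should reduce to the 1-dimensional one:

```python
import numpy as np

# D-dimensional entropy for an arbitrary SPD Sigma
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
D = Sigma.shape[0]
_, logdet = np.linalg.slogdet(Sigma)
H = 0.5 * D + 0.5 * D * np.log(2 * np.pi) + 0.5 * logdet

# Same formula with D = 1 and Sigma = [[s2]] ...
s2 = 1.7
H1 = 0.5 + 0.5 * np.log(2 * np.pi) + 0.5 * np.log(s2)

# ... matches the 1-dimensional closed form H = 0.5 log(2 pi e s2)
assert np.isclose(H1, 0.5 * np.log(2 * np.pi * np.e * s2))
```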
KL-Divergence (Relative Entropy)
\[
D_\text{KL}(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2}
\left[
\text{tr}(\Sigma_1^{-1}\Sigma_0) +
(\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) -
D + \ln \frac{|\Sigma_1|}{|\Sigma_0|}
\right]
\]
if \(\mu_1=\mu_0\) then:
\[
D_\text{KL}(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2} \left[
\text{tr}(\Sigma_1^{-1} \Sigma_0) - D + \ln \frac{|\Sigma_1|}{|\Sigma_0|} \right]
\]
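The closed form is straightforward to evaluate (a numpy sketch; the example parameters are arbitrary). Two properties worth checking: the divergence of a Gaussian with itself is zero, and it is always non-negative:

```python
import numpy as np

mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, -1.0])
Sigma0 = np.array([[1.0, 0.2], [0.2, 1.0]])
Sigma1 = np.array([[2.0, 0.0], [0.0, 0.5]])
D = len(mu0)

def gaussian_kl(mu0, Sigma0, mu1, Sigma1):
    """KL(N0 || N1), term by term as in the formula above."""
    Sigma1_inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    _, ld0 = np.linalg.slogdet(Sigma0)
    _, ld1 = np.linalg.slogdet(Sigma1)
    return 0.5 * (np.trace(Sigma1_inv @ Sigma0)
                  + diff @ Sigma1_inv @ diff
                  - len(mu0) + ld1 - ld0)

kl = gaussian_kl(mu0, Sigma0, mu1, Sigma1)
assert kl >= 0.0                                            # non-negativity
assert np.isclose(gaussian_kl(mu0, Sigma0, mu0, Sigma0), 0.0)  # KL(p||p) = 0
```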
Mutual Information
\[I(X)= - \frac{1}{2} \ln | \rho_0 |\]
where \(\rho_0\) is the correlation matrix derived from \(\Sigma_0\).
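Computing this amounts to normalizing \(\Sigma_0\) into a correlation matrix and taking its log-determinant (a numpy sketch; the example \(\Sigma_0\) is arbitrary). For \(D = 2\) with correlation \(r\), the expression reduces to the familiar \(-\frac{1}{2}\ln(1 - r^2)\):

```python
import numpy as np

Sigma0 = np.array([[2.0, 0.6],
                   [0.6, 1.0]])

# Correlation matrix: rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)
d = np.sqrt(np.diag(Sigma0))
rho0 = Sigma0 / np.outer(d, d)

_, logdet_rho = np.linalg.slogdet(rho0)
I = -0.5 * logdet_rho

# For D = 2: |rho0| = 1 - r^2, so I = -0.5 ln(1 - r^2)
r = Sigma0[0, 1] / (d[0] * d[1])
assert np.isclose(I, -0.5 * np.log(1 - r**2))
assert I >= 0.0  # independent components give rho0 = I and I(X) = 0
```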