Loss Functions
Recall the change of variables formulation to calculate the probability:
p_\theta(x) = p_z(z) \; |\nabla_x \mathcal{G}_\theta(x)|
Taking the logarithm of both sides gives the log probability:
\log p_\theta(x) = \log p_z(z) + \log |\nabla_x \mathcal{G}_\theta(x)|
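where z=\mathcal{G}_\theta(x) and |\nabla_x \mathcal{G}_\theta(x)| denotes the absolute value of the determinant of the Jacobian of \mathcal{G}_\theta with respect to x.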
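As a concrete illustration of the change of variables formula, here is a minimal sketch for a one-dimensional affine flow. The flow \mathcal{G}_\theta(x) = (x - \mu)/\sigma, the standard normal base density p_z, and the function names are all assumptions chosen for the example; they are not prescribed by the text above.

```python
import math

def standard_normal_logpdf(z):
    # log p_z(z) for an assumed standard normal base density
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow(x, mu, sigma):
    # Hypothetical 1D flow G_theta(x) = (x - mu) / sigma, with theta = (mu, sigma)
    z = (x - mu) / sigma
    # dG/dx = 1 / sigma, so log |dG/dx| = -log |sigma|
    log_abs_det = -math.log(abs(sigma))
    return z, log_abs_det

def flow_log_prob(x, mu, sigma):
    # log p_theta(x) = log p_z(z) + log |grad_x G_theta(x)|
    z, log_abs_det = affine_flow(x, mu, sigma)
    return standard_normal_logpdf(z) + log_abs_det

log_p = flow_log_prob(1.3, mu=0.5, sigma=2.0)
```

For this particular flow, the result coincides with the log density of a Gaussian with mean \mu and standard deviation \sigma, which gives a simple sanity check.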
Negative Log-Likelihood
-\mathbb{E}_x\left[ \log p_\theta(x)\right] =
- \mathbb{E}_x \left[ \log p_z(\mathcal{G}_\theta(x)) + \log |\nabla_x \mathcal{G}_\theta (x)| \right]
Empirically, given N samples x_i, this can be approximated by the sample average:
-\mathbb{E}_x\left[ \log p_\theta(x)\right] \approx
-\frac{1}{N} \sum_{i=1}^N \log p_z(\mathcal{G}_\theta(x_i)) -
\frac{1}{N} \sum_{i=1}^N \log |\nabla_x \mathcal{G}_\theta (x_i)|
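The sample-average form of the negative log-likelihood can be sketched as follows. This assumes the same hypothetical one-dimensional affine flow \mathcal{G}_\theta(x) = (x - \mu)/\sigma and a standard normal base density; the data values are made up for illustration.

```python
import math

def standard_normal_logpdf(z):
    # log p_z(z) for an assumed standard normal base density
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def empirical_nll(xs, mu, sigma):
    # Hypothetical flow G_theta(x) = (x - mu) / sigma, so log |dG/dx| = -log |sigma|
    n = len(xs)
    base_term = -sum(standard_normal_logpdf((x - mu) / sigma) for x in xs) / n
    logdet_term = -sum(-math.log(abs(sigma)) for _ in xs) / n
    return base_term + logdet_term

# Illustrative data (not from the original text)
xs = [0.2, 1.1, 1.9, 2.4, 3.4]
mu_hat = sum(xs) / len(xs)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / len(xs))

nll_fit = empirical_nll(xs, mu_hat, sigma_hat)   # flow that standardizes the data
nll_identity = empirical_nll(xs, 0.0, 1.0)       # identity flow
```

For this affine flow the loss is minimized at the sample mean and (population) standard deviation, so `nll_fit` should be lower than the loss at any other parameter setting, including the identity flow.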
Non-Gaussianity
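Another perspective is the "non-Gaussianity" of the transformed data y=\mathcal{G}_\theta(x), measured as the KL divergence between its density p_y and a standard Gaussian \mathcal{N}: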
J(p_y) = \mathbb{E}_x \left[ \log p_x(x) - \log \left| \nabla_x \mathcal{G}_\theta(x) \right| - \log \mathcal{N}\left(\mathcal{G}_\theta(x)\right)\right]
The term \mathbb{E}_x[\log p_x(x)] does not depend on \theta, so we can treat it as a constant c. Minimizing J(p_y) then only requires minimizing the 2nd and 3rd terms, which gives, up to this constant:
\begin{aligned}
J(p_y) &=
- \mathbb{E}_x \left[ \log \left| \nabla_x \mathcal{G}_\theta(x) \right| \right] -
\mathbb{E}_x \left[ \log \mathcal{N}\left(\mathcal{G}_\theta(x)\right) \right] \\
\end{aligned}
which we can find empirically:
J(p_y) \approx
- \frac{1}{N} \sum_{i=1}^N \log \left| \nabla_x \mathcal{G}_\theta(x_i) \right| -
\frac{1}{N} \sum_{i=1}^N \log \mathcal{N}\left(\mathcal{G}_\theta(x_i)\right)
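A minimal sketch of this empirical objective, written as a sample average with both terms entering negatively to match the expectation form. It assumes a hypothetical diagonal affine flow in D dimensions, so that the log absolute Jacobian determinant is simply -\sum_d \log|\sigma_d|, and it omits the constant \mathbb{E}_x[\log p_x(x)]. The data and parameter values are invented for the example.

```python
import math

def diag_affine_flow(x, mu, sigma):
    # Hypothetical flow: z_d = (x_d - mu_d) / sigma_d for each dimension d
    z = [(xd - md) / sd for xd, md, sd in zip(x, mu, sigma)]
    # The Jacobian is diagonal with entries 1 / sigma_d
    log_abs_det = -sum(math.log(abs(sd)) for sd in sigma)
    return z, log_abs_det

def standard_normal_logpdf(z):
    # log N(z) for a standard Gaussian in len(z) dimensions
    return -0.5 * sum(zd * zd for zd in z) - 0.5 * len(z) * math.log(2.0 * math.pi)

def non_gaussianity_loss(xs, mu, sigma):
    # Sample-average estimate of -E[log |grad G|] - E[log N(G(x))]
    n = len(xs)
    logdet_sum = 0.0
    gauss_sum = 0.0
    for x in xs:
        z, log_abs_det = diag_affine_flow(x, mu, sigma)
        logdet_sum += log_abs_det
        gauss_sum += standard_normal_logpdf(z)
    return -logdet_sum / n - gauss_sum / n

# Illustrative 2D data (not from the original text)
xs = [[0.0, 10.0], [1.0, 14.0], [2.0, 12.0], [3.0, 16.0]]
loss_identity = non_gaussianity_loss(xs, [0.0, 0.0], [1.0, 1.0])
loss_shifted = non_gaussianity_loss(xs, [1.5, 13.0], [1.0, 2.0])
```

Centering and rescaling the data moves it closer to a standard Gaussian, so `loss_shifted` should come out lower than `loss_identity`.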
! Question: What's the difference between the two equations? Perhaps part 1, you fit a Gaussian...
Change in Total Correlation
Change in Non-Gaussianity
\Delta J(p_y) = J(p_y) - J(p_x)
\Delta J(p_y) = \mathbb{E}_x \left[ \frac{1}{2} ||y||_2^2 - \log |\nabla_x \mathcal{G}_\theta (x)| - \frac{1}{2} ||x||_2^2 \right]
Empirically, with y_i=\mathcal{G}_\theta(x_i), we can estimate this by the sample average:
\Delta J(p_y) \approx \frac{1}{N}\sum_{i=1}^N \left[ \frac{1}{2} ||y_i||_2^2 - \frac{1}{2} ||x_i||_2^2 -
\log |\nabla_x \mathcal{G}_\theta (x_i)| \right]
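The change in non-Gaussianity can be estimated per sample and then averaged. The sketch below again assumes a hypothetical one-dimensional affine flow y = (x - \mu)/\sigma and uses made-up data; it follows the sign convention \Delta J(p_y) = J(p_y) - J(p_x), so a negative value means the transformed data is closer to a standard Gaussian than the input.

```python
import math

def delta_non_gaussianity(xs, mu, sigma):
    # Hypothetical flow y = G_theta(x) = (x - mu) / sigma, log |dG/dx| = -log |sigma|
    n = len(xs)
    total = 0.0
    for x in xs:
        y = (x - mu) / sigma
        log_abs_det = -math.log(abs(sigma))
        # Per-sample term: 0.5 * ||y||^2 - 0.5 * ||x||^2 - log |grad_x G(x)|
        total += 0.5 * y * y - 0.5 * x * x - log_abs_det
    return total / n

# Illustrative data (not from the original text)
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat = sum(xs) / len(xs)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / len(xs))

delta = delta_non_gaussianity(xs, mu_hat, sigma_hat)
```

Because the Gaussian normalizing constants cancel in the difference, only the squared norms and the log-determinant are needed; no density estimate of p_x is required.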