# Normalizing Flows Literature

## Training
- Can use Reverse-KL
- Can use Forward-KL (aka Maximum Likelihood Estimation); both objectives are written out just after this list
- Generally possible to sample from the model
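For reference, the two objectives are (standard identities, with $p^*$ the data/target density and $p_\theta$ the flow):

$$
D_{\mathrm{KL}}(p^* \,\|\, p_\theta) = -\mathbb{E}_{x \sim p^*}\left[\log p_\theta(x)\right] + \text{const},
\qquad
D_{\mathrm{KL}}(p_\theta \,\|\, p^*) = \mathbb{E}_{x \sim p_\theta}\left[\log p_\theta(x) - \log p^*(x)\right].
$$

Minimizing the forward KL only requires samples from the data (hence it is MLE), while the reverse KL only requires samples from the model plus an unnormalized target density, which is why being able to sample from the flow matters.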
**Maximum Likelihood Training**

Stochastic gradients of the negative log-likelihood have some nice properties (a toy training sketch follows this list):

- Stochastic: minibatches give unbiased gradient estimates
- Scales to large datasets
- Converges to the true minimum (under the usual SGD conditions)
- Large body of supporting software
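Here is a minimal sketch of what that looks like in practice. It is my own toy example (a single affine/diagonal flow in JAX), not taken from any of the papers; it just shows the exact change-of-variables log-likelihood being minimized with minibatch SGD.

```python
import jax
import jax.numpy as jnp

# Toy flow: x = f(z) = z * exp(log_scale) + shift, with z ~ N(0, I).
# Both the inverse and the log|det Jacobian| are trivial here.
def log_prob(params, x):
    log_scale, shift = params
    z = (x - shift) * jnp.exp(-log_scale)          # inverse transform
    log_det = -jnp.sum(log_scale)                  # log|det d f^{-1}/dx|
    log_base = -0.5 * jnp.sum(z**2 + jnp.log(2 * jnp.pi))
    return log_base + log_det                      # change of variables

def nll(params, batch):
    # Forward-KL / MLE objective: average negative log-likelihood on a minibatch.
    return -jnp.mean(jax.vmap(lambda x: log_prob(params, x))(batch))

@jax.jit
def sgd_step(params, batch, lr=1e-2):
    grads = jax.grad(nll)(params, batch)           # unbiased stochastic gradient
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Usage: fit the flow to shifted and scaled Gaussian data.
key = jax.random.PRNGKey(0)
data = 2.0 * jax.random.normal(key, (1024, 2)) + 3.0
params = (jnp.zeros(2), jnp.zeros(2))              # (log_scale, shift)
for i in range(200):
    idx = jax.random.choice(jax.random.fold_in(key, i), 1024, (64,))
    params = sgd_step(params, data[idx])
print(nll(params, data))
```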
## Summarizing
Almost all papers are, in one form or another, trying to construct a clever Jacobian that is relatively cheap to compute and work with; the change-of-variables formula below shows why.
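The bottleneck is the log-determinant term in the change of variables. For a flow $x = f(z)$ with base density $p_Z$:

$$
\log p_X(x) = \log p_Z\left(f^{-1}(x)\right) + \log \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|,
$$

and a general $d \times d$ determinant costs $\mathcal{O}(d^3)$. Roughly speaking, each family in the table below either restricts the Jacobian's structure (low-rank update, triangular, diagonal) or estimates the log-determinant stochastically.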
I like the slides in this presentation, which attempt to summarize the different methods and how they relate to one another.
| Jacobian Type | Methods |
|---|---|
| Determinant Identities | Planar NF, Sylvester NF |
| Coupling Blocks | NICE, Real NVP, GLOW |
| Autoregressive | Inverse AF, Neural AF, Masked AF |
| Unbiased Estimation | FFJORD, Residual Flows |
| Diagonal | Gaussianization Flows, GDN |
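As a concrete instance of the Coupling Blocks row, here is a minimal sketch of an affine coupling layer (my own toy version, loosely in the style of NICE / Real NVP): half of the dimensions pass through unchanged and the other half are scaled and shifted by functions of the first half, so the Jacobian is triangular and its log-determinant is just a sum of log-scales.

```python
import jax
import jax.numpy as jnp

def affine_coupling(params, x):
    """Affine coupling layer: x1 passes through unchanged, x2 is scaled/shifted
    by functions of x1 only, so the Jacobian is block-triangular and its
    log-determinant is simply the sum of the log-scales."""
    W_s, b_s, W_t, b_t = params
    d = x.shape[-1] // 2
    x1, x2 = x[:d], x[d:]
    log_s = jnp.tanh(x1 @ W_s + b_s)     # conditioner (here: one linear layer + tanh)
    t = x1 @ W_t + b_t
    y = jnp.concatenate([x1, x2 * jnp.exp(log_s) + t])
    log_det = jnp.sum(log_s)             # log|det J| in O(d), no determinant needed
    return y, log_det

# Quick check against the brute-force log-determinant of the full Jacobian.
key = jax.random.PRNGKey(0)
d = 3
params = tuple(jax.random.normal(k, shape) * 0.1
               for k, shape in zip(jax.random.split(key, 4),
                                   [(d, d), (d,), (d, d), (d,)]))
x = jax.random.normal(key, (2 * d,))

_, log_det_cheap = affine_coupling(params, x)
J = jax.jacfwd(lambda x: affine_coupling(params, x)[0])(x)
print(log_det_cheap, jnp.linalg.slogdet(J)[1])   # these should agree
```

The brute-force check at the end builds the full Jacobian with `jax.jacfwd` and `slogdet` only to confirm the cheap formula; in a real model you would never materialize it.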
## Automatic Differentiation
According to a talk by Ricky Chen:
For a full Jacobian you need d separate passes (one JVP or VJP per dimension), and in general extracting just the Jacobian diagonal has the same cost as computing the full Jacobian.

I am not sure I fully understand this, but apparently one can use HollowNets to efficiently compute dimension-wise derivatives of order k.
Source: Ricky Chen page
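A small illustration of what I take this claim to mean (my own sketch, not from the talk): with standard autodiff, the diagonal is obtained by running one JVP per basis vector and keeping a single entry from each, which is the same d passes you would spend on the full Jacobian.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Some vector-valued function R^d -> R^d.
    return jnp.sin(x) * jnp.cumsum(x)

d = 5
x = jnp.arange(1.0, d + 1.0)

# Full Jacobian: forward-mode autodiff runs one JVP per input dimension, i.e. d passes.
full_jacobian = jax.jacfwd(f)(x)

# Jacobian diagonal: we still need one JVP per basis vector, keeping only entry i
# of each result, so the naive cost is the same d passes as for the full Jacobian.
def jac_diag(f, x):
    basis = jnp.eye(x.shape[0])
    jvp_i = lambda e: jax.jvp(f, (x,), (e,))[1]
    return jnp.diag(jax.vmap(jvp_i)(basis))   # vmap batches the d passes

print(jnp.allclose(jac_diag(f, x), jnp.diag(full_jacobian)))   # True
```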
## Interesting

### Continuous Normalizing Flows
- FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models - Grathwohl & Chen et al. (2018) - arxiv
- Stochastic Normalizing Flows - Hodgkinson et al. (2020) - arxiv
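For context, the continuous formulation these papers build on (the standard instantaneous change of variables, as used in FFJORD): the sample follows an ODE, and the log-density evolves with the trace of the Jacobian rather than its determinant,

$$
\frac{dz(t)}{dt} = f_\theta(z(t), t), \qquad
\frac{\partial \log p(z(t))}{\partial t} = -\operatorname{tr}\left(\frac{\partial f_\theta}{\partial z(t)}\right),
$$

and FFJORD makes this scalable by estimating the trace with Hutchinson's stochastic estimator, which is why it sits in the Unbiased Estimation row above.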