Normalizing Flows Literature
Training
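- Can be trained with the reverse KL divergence, which needs samples from the flow and a target density that can be evaluated (up to a normalizing constant)
- Can be trained with the forward KL divergence (a.k.a. maximum likelihood estimation), which needs only samples from the data
- It is generally possible to sample from the model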
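The two objectives can be made concrete with a minimal pure-Python sketch. It assumes a one-dimensional affine flow x = mu + exp(log_s) * z with a standard normal base; the flow and all function names are hypothetical illustrations, not any library's API.

```python
import math
import random

def std_normal_logpdf(z):
    # log N(z; 0, 1)
    return -0.5 * (z * z + math.log(2 * math.pi))

# Hypothetical 1D affine flow: x = mu + exp(log_s) * z, with z ~ N(0, 1).
def flow_log_prob(x, mu, log_s):
    # Change of variables: log q(x) = log p_z(f^{-1}(x)) + log |d f^{-1}/dx|
    z = (x - mu) * math.exp(-log_s)
    return std_normal_logpdf(z) - log_s

def flow_sample(mu, log_s, rng):
    # Sampling: draw from the base distribution and push it through the flow.
    return mu + math.exp(log_s) * rng.gauss(0.0, 1.0)

def forward_kl_loss(data, mu, log_s):
    # Forward KL(p_data || q) = -E_data[log q(x)] + const (maximum likelihood).
    # Only needs data samples and the ability to evaluate q.
    return -sum(flow_log_prob(x, mu, log_s) for x in data) / len(data)

def reverse_kl_loss(target_log_prob, mu, log_s, n, rng):
    # Reverse KL(q || p*) = E_q[log q(x) - log p*(x)].
    # Needs samples from the flow and an evaluable (possibly unnormalized) target.
    total = 0.0
    for _ in range(n):
        x = flow_sample(mu, log_s, rng)
        total += flow_log_prob(x, mu, log_s) - target_log_prob(x)
    return total / n
```

The forward loss only touches data points, while the reverse loss only touches flow samples and the target density, which is why the former suits density estimation and the latter suits variational inference against an unnormalized target.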
Maximum Likelihood Training
Stochastic Gradients
- Stochastic
- Scales to large datasets
- Converges to the true minimum, since minibatch gradients are unbiased estimates of the full-data gradient
- Large body of supporting software
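A minimal sketch of maximum likelihood training with minibatch stochastic gradients, again assuming a hypothetical 1D affine flow x = mu + exp(log_s) * z. The gradients are derived by hand to keep the example dependency-free; in practice an autodiff framework would supply them. The learning rate, batch size and epoch count are arbitrary assumptions.

```python
import math
import random

def nll_and_grads(batch, mu, log_s):
    # Per-sample negative log-likelihood of the affine flow:
    #   -log q(x) = 0.5 * z**2 + 0.5 * log(2*pi) + log_s,  z = (x - mu) * exp(-log_s)
    inv_s = math.exp(-log_s)
    nll = g_mu = g_log_s = 0.0
    for x in batch:
        z = (x - mu) * inv_s
        nll += 0.5 * z * z + 0.5 * math.log(2 * math.pi) + log_s
        g_mu += -z * inv_s        # d(-log q)/d mu
        g_log_s += 1.0 - z * z    # d(-log q)/d log_s
    n = len(batch)
    return nll / n, g_mu / n, g_log_s / n

def fit_mle_sgd(data, lr=0.05, batch_size=32, epochs=50, seed=0):
    # Plain minibatch SGD on the average NLL (= forward KL up to a constant).
    rng = random.Random(seed)
    mu, log_s = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            _, g_mu, g_log_s = nll_and_grads(data[i:i + batch_size], mu, log_s)
            mu -= lr * g_mu
            log_s -= lr * g_log_s
    return mu, math.exp(log_s)
```

On data drawn from N(3, 0.5^2), `fit_mle_sgd` should end up with mu near 3 and a scale near 0.5; each step only looks at one minibatch, which is what lets the procedure scale to large datasets.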
Summarizing
Almost all papers try to construct a clever Jacobian, one whose determinant is relatively cheap to compute and work with.
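To make "a clever Jacobian" concrete, here is a minimal pure-Python sketch of the planar flow from Rezende & Mohamed (2015), the Determinant Identities idea: the matrix determinant lemma turns a d x d determinant into a dot product. The function names are hypothetical, and the sketch does not enforce the extra constraint on u and w that guarantees invertibility.

```python
import math

def planar_forward(z, u, w, b):
    # Planar flow f(z) = z + u * tanh(w^T z + b).
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    x = [zi + ui * math.tanh(a) for zi, ui in zip(z, u)]
    # The Jacobian is I + u psi^T with psi = tanh'(a) * w. The matrix determinant
    # lemma gives det = 1 + psi^T u: O(d) work instead of O(d^3).
    dtanh = 1.0 - math.tanh(a) ** 2
    det = 1.0 + dtanh * sum(ui * wi for ui, wi in zip(u, w))
    return x, math.log(abs(det))

def planar_jacobian(z, u, w, b):
    # Dense Jacobian I + u psi^T, built only to check the identity.
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    dtanh = 1.0 - math.tanh(a) ** 2
    d = len(z)
    return [[(1.0 if i == j else 0.0) + u[i] * dtanh * w[j] for j in range(d)]
            for i in range(d)]

def det_dense(m):
    # Generic O(d^3) Gaussian elimination with partial pivoting, for comparison.
    m = [row[:] for row in m]
    n = len(m)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det
```

On a small example the two log-determinants agree to floating-point precision; only the O(d) version would be used inside an actual flow.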
I like the slides in this presentation, which attempt to summarize the different methods and how they relate to each other.
| Jacobian Type | Methods |
|---|---|
| Determinant Identities | Planar NF, Sylvester NF |
| Coupling Blocks | NICE, Real NVP, Glow |
| Autoregressive | Inverse AF, Neural AF, Masked AF |
| Unbiased Estimation | FFJORD, Residual Flows |
| Diagonal | Gaussianization Flows, GDN |
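For the Coupling Blocks row, a minimal pure-Python sketch of a Real NVP-style affine coupling layer. `conditioner` is a hypothetical stand-in for the neural network that outputs the scale and shift; NICE corresponds to fixing the scale to zero (a purely additive coupling).

```python
import math

def conditioner(x1):
    # Hypothetical stand-in for a neural network. Any function of x1 works,
    # because the Jacobian stays triangular regardless of its form.
    s = [math.tanh(v) for v in x1]  # log-scales
    t = [0.5 * v for v in x1]       # shifts
    return s, t

def coupling_forward(x):
    assert len(x) % 2 == 0, "this sketch splits the input into two equal halves"
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = conditioner(x1)
    y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    # The Jacobian is block lower-triangular with diag(exp(s)) in the x2 block,
    # so log|det J| = sum(s): no determinant has to be computed.
    return x1 + y2, sum(s)

def coupling_inverse(y):
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = conditioner(y1)  # y1 == x1, so s and t are recomputed exactly
    x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
    return y1 + x2
```

The inverse never has to invert the conditioner itself, which is why the conditioner can be an arbitrary network.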
Automatic Differentiation
According to a talk by Ricky Chen:
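For the full Jacobian of a d-dimensional function, you need d separate backward passes, because reverse-mode autodiff produces one vector-Jacobian product (one row of the Jacobian) per pass. In general, the Jacobian diagonal costs as much as the full Jacobian, since each diagonal entry sits in a different row.
Apparently, HollowNets can be used to efficiently compute dimension-wise derivatives of order k, though I'm not sure I fully understand how yet.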
Source: Ricky Chen's page
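One way around the d-pass cost, used by FFJORD from the Unbiased Estimation row of the table, is Hutchinson's trace estimator: tr(J) = E[eps^T J eps] for noise with E[eps eps^T] = I, and each sample needs only a single vector-Jacobian product. This is a different trick from HollowNets: it estimates the trace instead of computing the diagonal exactly. A minimal pure-Python sketch, assuming a hypothetical map f(x) = A tanh(x) whose VJP is written by hand rather than coming from autodiff.

```python
import math
import random

# Hypothetical map f(x) = A @ tanh(x); its Jacobian is J = A @ diag(1 - tanh(x)**2).
def vjp(A, x, v):
    # v^T J computed as (v^T A) * (1 - tanh(x)**2): one "backward pass", J never formed.
    d = len(x)
    vA = [sum(v[i] * A[i][j] for i in range(d)) for j in range(d)]
    return [vA[j] * (1.0 - math.tanh(x[j]) ** 2) for j in range(d)]

def exact_trace(A, x):
    # Needs every diagonal entry; with reverse-mode autodiff that would mean d passes.
    return sum(A[i][i] * (1.0 - math.tanh(x[i]) ** 2) for i in range(len(x)))

def hutchinson_trace(A, x, n_samples, rng):
    # Unbiased estimate of tr(J) using Rademacher noise (entries +1 or -1).
    total = 0.0
    for _ in range(n_samples):
        eps = [rng.choice((-1.0, 1.0)) for _ in x]
        total += sum(a * b for a, b in zip(vjp(A, x, eps), eps))
    return total / n_samples
```

A single sample is already an unbiased estimate, which is what a stochastic-gradient training loop needs; averaging more samples only reduces the variance.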
Interesting
Continuous Normalizing Flows
FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models - Grathwohl & Chen et al. (2018) - arXiv
Stochastic Normalizing Flows - Hodgkinson et al. (2020) - arXiv