Part 2 — Rotations & Orthogonal Mixers
The between-coordinate half of Gaussianization. Part 1’s marginal transforms
fix each coordinate’s distribution but can never remove dependence between
coordinates, because a product of 1D maps acts on each coordinate separately.
An orthogonal rotation mixes information across dimensions so the next
marginal pass has something to do; iterating the two is RBIG (Part 3). This
part builds every mixer that matters: fitting one, learning one, freezing one,
and the cheap structured linear layers that make deep flows trainable, all
grounded in rbig and gauss_flows.
Each notebook keeps the Part 0/1 pattern: derive the idea, then confirm it against the packages.
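
The stall-then-unstick behaviour can be sketched without either package. The snippet below is a minimal NumPy illustration, not the rbig implementation: `marginal_gaussianize` and `excess_kurtosis` are hypothetical helpers, the rank-based transform stands in for Part 1's marginal maps, and the 45-degree rotation is simply one convenient choice of mixer.

```python
from statistics import NormalDist

import numpy as np


def marginal_gaussianize(x):
    """Rank-based Gaussianization applied to each column independently."""
    n = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0)   # rank 0..n-1 within each column
    u = (ranks + 0.5) / n                       # empirical CDF, strictly inside (0, 1)
    return np.vectorize(NormalDist().inv_cdf)(u)


def excess_kurtosis(x):
    """Per-column excess kurtosis; roughly 0 for a Gaussian marginal."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z**4).mean(axis=0) - 3.0


rng = np.random.default_rng(0)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

# Independent uniforms rotated by 45 degrees: uncorrelated but dependent.
x = rng.uniform(-1.0, 1.0, size=(4000, 2)) @ R.T

g1 = marginal_gaussianize(x)
g2 = marginal_gaussianize(g1)          # second marginal-only pass
stall = np.abs(g2 - g1).max()          # ranks are unchanged, so nothing moves

rotated = g1 @ R.T                     # rotate between passes
print("second pass moved points by:", stall)
print("excess kurtosis after pass 1:", excess_kurtosis(g1))
print("excess kurtosis after rotation:", excess_kurtosis(rotated))
```

A second marginal pass is a no-op because the first pass preserves ranks, while the rotated coordinates are visibly non-Gaussian again, so the next marginal pass has work to do.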
Notebooks
| # | notebook | master list | what you take away |
|---|---|---|---|
| 00 | Rotation zoo & why it matters | 2.1–2.2 | PCA/ICA/random/Picard; the demo that marginal-only stalls and a rotation unsticks it |
| 01 | Householder & trainable orthogonals | 2.3–2.4 | reflections → products spanning $O(d)$; Cayley/matrix-exp for $SO(d)$; log-det 0 under training |
| 02 | Fixed orthogonal & PCA warm starts | 2.5–2.6 | FixedRotation.from_data; why freezing matters; Householder decomposition to warm-start a trainable stack |
| 03 | Structured linear layers: 1×1 conv & ActNorm | 2.7–2.8 | LU-parameterised conv ($\log |
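
As a rough companion to the notebook 01 row, here is a NumPy sketch of the two orthogonal parameterisations it names. The function names are hypothetical and this is not the gauss_flows API; it only checks the underlying linear algebra: a product of k Householder reflections is orthogonal with determinant (-1)^k, and the Cayley map of a skew-symmetric matrix is orthogonal with determinant +1.

```python
import numpy as np


def householder(v):
    """Reflection H = I - 2 v v^T / (v^T v); orthogonal with det(H) = -1."""
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)


def householder_product(vs):
    """Product of k reflections; orthogonal with determinant (-1)^k."""
    Q = np.eye(vs.shape[1])
    for v in vs:
        Q = Q @ householder(v)
    return Q


def cayley(params, d):
    """Cayley map of a skew-symmetric A: Q = (I + A)^{-1} (I - A), det(Q) = +1."""
    A = np.zeros((d, d))
    A[np.triu_indices(d, k=1)] = params     # d(d-1)/2 free parameters
    A = A - A.T                             # make A skew-symmetric
    I = np.eye(d)
    return np.linalg.solve(I + A, I - A)


rng = np.random.default_rng(0)
d = 5
Q_h = householder_product(rng.normal(size=(3, d)))       # three reflections
Q_c = cayley(rng.normal(size=d * (d - 1) // 2), d)

for name, Q in [("householder", Q_h), ("cayley", Q_c)]:
    sign, logabsdet = np.linalg.slogdet(Q)
    print(name, "orthogonal:", np.allclose(Q.T @ Q, np.eye(d)),
          "sign:", sign, "log|det|:", logabsdet)
```

Both maps send unconstrained parameters to an orthogonal matrix, which is why the log-determinant stays at 0 however the parameters move during training.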
The through-line: log-determinants of linear layers
Part 2 is organised by what a linear layer costs the log-likelihood:
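- Rotation ($y = Qx$, $Q$ orthogonal): volume-preserving, $\log|\det Q| = 0$. Free.
- $1\times1$ conv ($y = Wx$, general linear): rescales, $\log|\det W| = \sum_i \log|U_{ii}|$, read off the LU factor in $O(d)$.
- ActNorm ($y = s \odot x + b$, diagonal affine): $\log|\det J| = \sum_i \log|s_i|$, with a data-dependent initialisation.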
Each is kept a valid bijector by construction — orthogonal parameterisation, LU factorisation, positive scale — so none can drift singular under training, the lesson of notebook 02 §2.
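
The three costs can be checked numerically. The following is a sketch in plain NumPy under stated assumptions: the variable names are illustrative, the LU diagonal is kept positive through an exponential (one common choice, not necessarily what Invertible1x1Conv does), and `np.linalg.slogdet` serves only as the reference value.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Rotation: an orthogonal Q taken from a QR factorisation; log|det Q| = 0.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
logdet_rot = np.linalg.slogdet(Q)[1]

# 1x1 conv: W = P L U with unit-lower-triangular L and a fixed permutation P,
# so log|det W| is the sum of log|U_ii|.
P = np.eye(d)[rng.permutation(d)]
L = np.tril(rng.normal(size=(d, d)), k=-1) + np.eye(d)
log_abs_u = rng.normal(size=d)              # unconstrained; exp keeps U_ii > 0
U = np.triu(rng.normal(size=(d, d)), k=1) + np.diag(np.exp(log_abs_u))
W = P @ L @ U
logdet_conv = log_abs_u.sum()               # O(d), no determinant is computed

# ActNorm: y = s * x + b with s = exp(log_s) > 0, so log|det J| = sum(log_s).
log_s = rng.normal(size=d)
logdet_actnorm = log_s.sum()

print("rotation:", logdet_rot)
print("1x1 conv:", logdet_conv, "reference:", np.linalg.slogdet(W)[1])
print("actnorm: ", logdet_actnorm)
```

Parameterising the diagonal of U and the ActNorm scale through an exponential is what keeps both layers invertible for any parameter value.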
Threads from earlier parts
- The “rotations are free” fact (Part 0, 01) is realised here: every orthogonal mixer contributes log-det 0, and we watch it stay exactly 0 throughout training (notebook 01).
- Marginal transforms (Part 1) are the other half: notebook 00 shows that marginal-only Gaussianization plateaus at non-zero total correlation, and a rotation between passes drives it to 0.
- The warm-start recipe (notebook 02) — decompose → initialise → fine-tune — is the bridge to Part 3’s RBIG-initialised parametric flows.
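
The "decompose" step of the warm-start recipe can be sketched as follows. This is a NumPy illustration with hypothetical helper names, not the gauss_flows routine: it factors an orthogonal matrix (here a PCA basis of synthetic data) into at most d Householder reflections by reflecting each column onto the matching basis vector, and the resulting vectors are what one would load into a trainable stack before fine-tuning.

```python
import numpy as np


def householder_decompose(Q, tol=1e-10):
    """Unit vectors v_1..v_m (m <= d) such that H(v_1) @ ... @ H(v_m) equals Q."""
    d = Q.shape[0]
    M = Q.copy()
    vs = []
    for k in range(d):
        v = M[:, k] - np.eye(d)[:, k]       # reflect column k onto e_k
        norm = np.linalg.norm(v)
        if norm < tol:                      # column already equals e_k
            continue
        v = v / norm
        M = M - 2.0 * np.outer(v, v @ M)    # M <- H(v) M
        vs.append(v)
    return vs


def householder_product(vs, d):
    """Rebuild the orthogonal matrix from its reflection vectors."""
    Q = np.eye(d)
    for v in vs:
        Q = Q @ (np.eye(d) - 2.0 * np.outer(v, v))
    return Q


rng = np.random.default_rng(0)
d = 6
x = rng.normal(size=(500, d)) @ rng.normal(size=(d, d))   # correlated synthetic data
_, V = np.linalg.eigh(np.cov(x, rowvar=False))            # PCA basis, orthogonal

vs = householder_decompose(V)
print("reflections used:", len(vs))
print("reconstruction error:", np.abs(householder_product(vs, d) - V).max())
```

The number of reflections returned has the same parity as the determinant of the input, so a stack with a fixed number of reflections may need one more (or one fewer) to match a given PCA basis exactly.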
A note on the packages
gauss_flows covers the whole toolkit: HouseholderRotation,
OrthogonalRotation (Cayley), FixedRotation.from_data, Invertible1x1Conv
(LU), ActNorm/ActNorm1D; rbig supplies the rotation zoo (PCARotation,
ICARotation, RandomRotation, PicardRotation). One small asymmetry surfaced
while writing notebook 03 — FixedRotation has a from_data factory but
ActNorm does not, despite data-dependent init being ActNorm’s defining feature
(gauss_flows#112); the
notebook shows the three-line manual init in the meantime.
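
For orientation, this is what a data-dependent ActNorm initialisation computes, written in plain NumPy. It is a sketch of the idea only: the names `log_scale` and `bias` are hypothetical, and the notebook's manual init against the actual ActNorm parameters may look different.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(loc=[3.0, -1.0, 0.5], scale=[2.0, 0.1, 5.0], size=(512, 3))

# Choose scale and bias so that the first batch leaves the layer standardised.
eps = 1e-6                                   # guards against zero-variance features
log_scale = -np.log(batch.std(axis=0) + eps)
bias = -batch.mean(axis=0) * np.exp(log_scale)

y = batch * np.exp(log_scale) + bias         # forward pass: y = s * x + b
logdet = log_scale.sum()                     # per-sample log|det J|

print("output mean:", y.mean(axis=0))
print("output std: ", y.std(axis=0))
print("log-det:    ", logdet)
```

Storing the scale as a logarithm keeps it positive during training, so the layer stays invertible and its log-determinant is just the sum of the stored values.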
Running
Same uv environment as Part 0 / Part 1 (rbig + gauss_flows + a Jupyter stack):
cd projects/gaussianization
.venv-tutorials/bin/jupyter nbconvert --to notebook --execute --inplace \
  notebooks/02_rotations/0*.ipynb --ExecutePreprocessor.timeout=600

Notebooks are paired (jupytext, py:percent) and set jax_enable_x64.