Fair learning with frozen Gaussianization flows

This sub-project replaces the CKA fairness penalty in keras-fairkl with a family of penalties built from a frozen Gaussianization flow. The flow is trained once, frozen, and reused as a differentiable Gaussian-space probe inside any downstream predictor’s optimisation loop.

The one-paragraph pitch

A Gaussianization flow $T : \mathbb{R}^d \to \mathbb{R}^d$ is an invertible map that turns arbitrary marginals into approximately standard normals. Because it is invertible, it preserves statistical dependence: $T(Z) \perp T(Q) \iff Z \perp Q$. Once trained and frozen, $T$ acts as a fixed, scale-normalising, differentiable preprocessor: it absorbs the kernel/bandwidth choices of CKA and HSIC into its mixture-CDF parameters, and turns “measure non-linear dependence between $z$ and $q$” into “measure linear dependence between near-Gaussian variables.” Three concrete penalties exploit this: G-XCOV, G-MI, and G-TC.
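The effect described above can be illustrated without a trained flow. The sketch below is not the project's code: it swaps in a rank-based marginal Gaussianization (empirical CDF followed by the standard normal inverse CDF) as a stand-in for a frozen flow, and the helper name `rank_gaussianize` is hypothetical. It compares the Pearson correlation of a strongly non-linear pair before and after Gaussianising each marginal.

```python
import numpy as np
from statistics import NormalDist


def rank_gaussianize(x):
    """Map a 1-D sample to approximately N(0, 1).

    Assumption: this empirical-CDF + probit transform is only a stand-in
    for a trained, frozen marginal flow; it is not differentiable.
    """
    n = len(x)
    ranks = np.argsort(np.argsort(x))      # rank of each point, 0 .. n-1
    u = (ranks + 0.5) / n                  # empirical CDF, strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf         # standard normal quantile function
    return np.array([inv_cdf(p) for p in u])


rng = np.random.default_rng(0)
z = rng.normal(size=2000)
# q depends on z monotonically but very non-linearly, plus a little noise.
q = np.exp(2.0 * z) + 0.01 * rng.normal(size=2000)

raw_corr = np.corrcoef(z, q)[0, 1]
gauss_corr = np.corrcoef(rank_gaussianize(z), rank_gaussianize(q))[0, 1]

print(f"Pearson correlation, raw space:          {raw_corr:.3f}")
print(f"Pearson correlation, Gaussianised space: {gauss_corr:.3f}")
```

In this toy setting the raw correlation understates the dependence, while the Gaussianised correlation is close to 1, which is the sense in which a linear measure becomes adequate after Gaussianisation. A real flow additionally provides gradients, which the rank transform does not.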

Reading order

| # | Page | What it is for |
|---|------|----------------|
| 1 | Fair learning with frozen Gaussianization flows — design doc | Design doc. Mental model, math, hypotheses, experiment plan, risks, milestones. Read first. |
| 2 | Pretrain & freeze a Gaussianization flow | Notebook 05. Pretrain + freeze a flow on a 2-D dataset; four diagnostics that prove the flow Gaussianises, freezes, and inverts. |
| 3 | Fair MLP regression with a frozen Gaussianization flow | Notebook 06. Fair MLP regression on synthetic data; Pareto curve of $(\text{RMSE}, \lvert\mathrm{corr}(\hat y, q)\rvert)$ across G-XCOV, G-MI, G-TC, and CKA. |
| 4 | UCI Adult Census — real-data fair classification | Notebook 07. Same setup on UCI Adult Census; Pareto curves on AUC vs. DP-diff and EO-diff. |
| 5 | Fair Gaussianization — input-side follow-up experiments | Follow-up doc. Seven input-side alternatives that move the flow from the predictor’s output to its input / representation / data pipeline. |

The three penalties at a glance

Table 2: Output-side fairness penalties built on a frozen Gaussianization flow.

| Loss | Captures | Closed form? | Joint flow needed? |
|------|----------|--------------|--------------------|
| G-XCOV | 2nd-moment dependence in Gaussianised space (linear CKA) | yes | no (two marginal flows) |
| G-MI | MI assuming joint-Gaussian after Gaussianisation | yes | no (two marginal flows) |
| G-TC | Full MI / total correlation, no joint-Gaussian assumption | no (via flow NLL) | yes (one joint flow over $(z, q)$) |

All three are differentiable in the predictor’s parameters and plug into `FairModelWrapper` via its `fairness_loss=...` argument. See §4 of the design doc for the math and §4.4 (the comparison table) for a property-by-property comparison.
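As a rough illustration of the two closed-form penalties, the NumPy sketch below computes a linear-CKA statistic and the mutual information of a joint Gaussian fitted to already-Gaussianised samples. These are the standard textbook formulas, assumed here to match what G-XCOV and G-MI mean; the exact definitions are those in §4 of the design doc. The function names `linear_cka` and `gaussian_mi` and the `eps` regulariser are hypothetical, and a training-time version would use the framework's tensor ops rather than NumPy so that gradients flow.

```python
import numpy as np


def linear_cka(u, v):
    """Linear CKA between two (n, d) arrays of Gaussianised samples."""
    u = u - u.mean(axis=0)
    v = v - v.mean(axis=0)
    cross = np.linalg.norm(u.T @ v, "fro") ** 2
    return cross / (np.linalg.norm(u.T @ u, "fro") * np.linalg.norm(v.T @ v, "fro"))


def gaussian_mi(u, v, eps=1e-8):
    """MI (in nats) of a joint Gaussian fitted to (u, v).

    I(U; V) = 0.5 * (log det S_uu + log det S_vv - log det S),
    where S is the covariance of the stacked vector [u, v].
    """
    d = u.shape[1]
    cov = np.cov(np.hstack([u, v]), rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])          # small ridge for stability
    _, logdet_u = np.linalg.slogdet(cov[:d, :d])
    _, logdet_v = np.linalg.slogdet(cov[d:, d:])
    _, logdet_joint = np.linalg.slogdet(cov)
    return 0.5 * (logdet_u + logdet_v - logdet_joint)


# Toy data that is already Gaussian, standing in for flow outputs.
rng = np.random.default_rng(0)
n, rho = 5000, 0.8
u = rng.normal(size=(n, 1))
v = rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=(n, 1))  # corr(u, v) = rho
w = rng.normal(size=(n, 1))                                    # independent of u

xcov_dep, xcov_ind = linear_cka(u, v), linear_cka(u, w)
mi_dep, mi_ind = gaussian_mi(u, v), gaussian_mi(u, w)
print(f"linear CKA:  dependent {xcov_dep:.3f}, independent {xcov_ind:.4f}")
print(f"Gaussian MI: dependent {mi_dep:.3f}, independent {mi_ind:.4f}")
```

For scalar inputs the linear CKA reduces to the squared correlation and the Gaussian MI to $-\tfrac{1}{2}\log(1 - \rho^2)$, so both statistics should be near zero for the independent pair and clearly positive for the dependent one. G-TC has no such closed form and is not sketched here.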

Status

| Status | Milestone | Acceptance |
|--------|-----------|------------|
| ✅ | Skeleton: `fair/{losses,freeze,pretrain,metrics}.py` + tests | `pytest tests/test_fair.py` green |
| ✅ | Notebook 05: pretrain + freeze + 4 diagnostics | Executed and committed |
| ✅ | Notebook 06: synthetic Pareto with G-XCOV vs CKA | Pareto curve from RMSE 0.11 → 1.35 |
| ✅ | Notebook 07: Adult Pareto with G-XCOV vs CKA | Pareto traced |
| 🟡 | G-MI + G-TC losses + tests | In flight |
| 🟡 | Notebooks 06/07 re-executed with G-MI and G-TC curves | Pending |
| ⏳ | H3 quadratic-dependence experiment (`08_quadratic_dependence.ipynb`) | Pending |
| ⏳ | Input-side follow-ups (see Fair Gaussianization — input-side follow-up experiments) | Pending |