Dimensionality-Reducing Rotations¶
Sometimes you want fewer output dimensions than input dimensions. These rotations project data from $D$ dimensions to $K < D$, trading bijectivity for computational savings and feature extraction. This notebook compares four approaches and their tradeoffs.
Colab / fresh environment? Run the cell below to install
rbigfrom GitHub. Skip if already installed.
!pip install "rbig[all] @ git+https://github.com/jejjohnson/rbig.git" -q
Theory¶
Bijectivity and the Jacobian¶
A normalizing flow requires bijective (invertible) transforms so that the change-of-variables formula holds:
$$\log p(x) = \log p_z(f(x)) + \log|\det J_f(x)|$$
When K = D, the rotation is square and orthogonal, so $|\det J| = 1$ and the transform is exactly invertible.
When K < D, the transform is a projection — it discards D − K dimensions of information. This means:
- The transform is not bijective (you cannot recover the original data)
- The Jacobian determinant is undefined (the Jacobian is not square)
- Density estimation via change-of-variables is invalid
When Is Dim-Reduction Useful?¶
Even though we lose bijectivity, dimensionality reduction is valuable for:
- Exploratory analysis — visualize or summarize high-dimensional data
- Feature extraction — reduce dimensionality before a downstream task
- Computational savings — RBIG on K << D features is much faster
- Approximate density estimation — if most variance is captured in K components, the density estimate may still be useful (with caveats)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from rbig import (
PCARotation,
RandomOrthogonalProjection,
GaussianRandomProjection,
OrthogonalDimensionalityReduction,
)
/anaconda/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm
Available Methods¶
| Class | Method | Orthogonal? | Inverse (K<D)? | log|det J| (K<D)? |
|---|---|---|---|---|
PCARotation |
PCA eigenvectors | Yes | Lossy reconstruction | Undefined |
OrthogonalDimensionalityReduction |
Haar rotation + truncation | Yes | Not supported | Not supported |
RandomOrthogonalProjection |
QR of random matrix | Yes (columns) | Not supported | Not supported |
GaussianRandomProjection |
Gaussian random matrix | No | Pseudoinverse (approx) | Returns 0 (approx) |
Data¶
We create a 10-D dataset with most variance concentrated in 3 directions, mimicking a common real-world scenario where data lives on a low-dimensional manifold.
rng = np.random.default_rng(42)
N = 1_000
D = 10
K = 3 # target dimensionality
# 3 strong signal directions + 7 noise directions
signal = rng.standard_normal((N, 3)) * np.array([5.0, 3.0, 1.5])
noise = 0.1 * rng.standard_normal((N, 7))
data_full = np.hstack([signal, noise])
# Apply a random rotation so the structure isn't axis-aligned
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
X = data_full @ Q.T
print(f"Data shape: {X.shape}")
print(f"Singular values: {np.linalg.svd(X, compute_uv=False)[:6].round(1)}")
Data shape: (1000, 10) Singular values: [160.3 94.8 47.8 3.4 3.3 3.2]
The first 3 singular values are much larger than the rest — the data is effectively 3-dimensional embedded in 10-D space.
PCA with Truncation¶
PCARotation(n_components=K) retains only the top-K principal components.
The inverse maps back to D dimensions but loses the discarded components.
pca = PCARotation(n_components=K, whiten=True)
pca.fit(X)
Z_pca = pca.transform(X)
print(f"Input shape: {X.shape}")
print(f"Output shape: {Z_pca.shape}")
# Lossy reconstruction
X_rec_pca = pca.inverse_transform(Z_pca)
rec_err = np.mean((X - X_rec_pca) ** 2)
print(f"Reconstruction MSE: {rec_err:.4f}")
Input shape: (1000, 10) Output shape: (1000, 3) Reconstruction MSE: 0.0070
# Explained variance
var_explained = pca.pca_.explained_variance_ratio_
print("Explained variance per component:")
for i, v in enumerate(var_explained):
bar = "█" * int(v * 50)
print(f" PC {i}: {v:.3f} {bar}")
print(f" Total: {var_explained.sum():.3f}")
Explained variance per component: PC 0: 0.694 ██████████████████████████████████ PC 1: 0.243 ████████████ PC 2: 0.062 ███ Total: 0.998
OrthogonalDimensionalityReduction¶
Applies a full D×D Haar-random orthogonal rotation, then truncates to the first K dimensions. Unlike PCA, the rotation is random (not data-dependent), so it doesn't preferentially capture high-variance directions.
orth_dr = OrthogonalDimensionalityReduction(n_components=K, random_state=42)
orth_dr.fit(X)
Z_orth = orth_dr.transform(X)
print(f"Input shape: {X.shape}")
print(f"Output shape: {Z_orth.shape}")
# Inverse is not supported for K < D
try:
orth_dr.inverse_transform(Z_orth)
except NotImplementedError as e:
print(f"Inverse: {e}")
Input shape: (1000, 10) Output shape: (1000, 3) Inverse: OrthogonalDimensionalityReduction with n_components < input dimension is not bijective; inverse_transform is undefined.
RandomOrthogonalProjection¶
Generates a semi-orthogonal D×K matrix (orthonormal columns) via QR decomposition. Projects data onto a random K-dimensional subspace.
rand_orth = RandomOrthogonalProjection(n_components=K, random_state=42)
rand_orth.fit(X)
Z_rand_orth = rand_orth.transform(X)
print(f"Input shape: {X.shape}")
print(f"Output shape: {Z_rand_orth.shape}")
print(f"Projection matrix shape: {rand_orth.projection_matrix_.shape}")
# Verify orthonormality of columns
P = rand_orth.projection_matrix_
print(f"P^T P ≈ I_K: {np.allclose(P.T @ P, np.eye(K), atol=1e-10)}")
Input shape: (1000, 10) Output shape: (1000, 3) Projection matrix shape: (10, 3) P^T P ≈ I_K: True
GaussianRandomProjection¶
Uses a random Gaussian matrix (entries ~ N(0, 1/K)) for Johnson-Lindenstrauss style projection. Columns are not orthogonal, but pairwise distances are approximately preserved.
This is the only dim-reducing rotation that provides an approximate inverse (via pseudoinverse) and an approximate log-det-Jacobian (returns 0).
gauss_proj = GaussianRandomProjection(n_components=K, random_state=42)
gauss_proj.fit(X)
Z_gauss = gauss_proj.transform(X)
print(f"Input shape: {X.shape}")
print(f"Output shape: {Z_gauss.shape}")
# Approximate inverse via pseudoinverse
X_rec_gauss = gauss_proj.inverse_transform(Z_gauss)
rec_err_gauss = np.mean((X - X_rec_gauss) ** 2)
print(f"Reconstruction MSE (pseudoinverse): {rec_err_gauss:.4f}")
Input shape: (1000, 10) Output shape: (1000, 3) Reconstruction MSE (pseudoinverse): 2.7488
total_var = np.var(X, axis=0).sum()
projections = {
"PCA (K=3)": Z_pca,
"OrthogonalDimRed": Z_orth,
"RandomOrthProj": Z_rand_orth,
"GaussianRandProj": Z_gauss,
}
print(f"{'Method':<22s} {'Var retained':>14s} {'% of total':>12s}")
print("-" * 50)
for name, Z in projections.items():
var_retained = np.var(Z, axis=0).sum()
pct = 100 * var_retained / total_var
print(f"{name:<22s} {var_retained:>14.2f} {pct:>11.1f}%")
Method Var retained % of total -------------------------------------------------- PCA (K=3) 3.00 8.1% OrthogonalDimRed 15.89 42.9% RandomOrthProj 9.53 25.7% GaussianRandProj 13.51 36.5%
PCA retains the most variance by design — it picks the directions of maximum variance. Random projections retain less because they don't know which directions matter.
Pairwise Distance Preservation¶
The Johnson-Lindenstrauss lemma guarantees that random projections approximately preserve pairwise distances. Let's verify.
from scipy.spatial.distance import pdist
# Use a subset for speed
idx = rng.choice(N, size=200, replace=False)
d_orig = pdist(X[idx])
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (name, Z) in zip(axes, projections.items()):
d_proj = pdist(Z[idx])
ax.scatter(d_orig, d_proj, s=2, alpha=0.3)
ax.plot([0, d_orig.max()], [0, d_orig.max()], "r--", alpha=0.5)
corr = np.corrcoef(d_orig, d_proj)[0, 1]
ax.set_title(f"{name}\nr = {corr:.3f}", fontsize=10)
ax.set_xlabel("Original distances")
ax.set_ylabel("Projected distances")
plt.suptitle("Pairwise distance preservation (K=3, D=10)", y=1.02)
plt.tight_layout()
plt.show()
Reconstruction Error¶
print(f"{'Method':<22s} {'MSE':>10s} {'Max |err|':>10s}")
print("-" * 44)
# PCA: built-in inverse (lossy)
X_rec_pca = pca.inverse_transform(Z_pca)
mse_pca = np.mean((X - X_rec_pca) ** 2)
print(f"{'PCA (K=3)':<22s} {mse_pca:>10.4f} {np.abs(X - X_rec_pca).max():>10.4f}")
# GaussianRandProj: manual pseudoinverse
mse_gauss = np.mean((X - X_rec_gauss) ** 2)
print(f"{'GaussianRandProj':<22s} {mse_gauss:>10.4f} {np.abs(X - X_rec_gauss).max():>10.4f}")
print("\n(OrthogonalDimRed and RandomOrthProj do not support inverse_transform for K < D)")
Method MSE Max |err| -------------------------------------------- PCA (K=3) 0.0070 0.3453 GaussianRandProj 2.7488 8.4987 (OrthogonalDimRed and RandomOrthProj do not support inverse_transform for K < D)
Summary¶
When to Use Each Method¶
| Method | Use when... |
|---|---|
PCARotation(n_components=K) |
You want maximum variance in K dims; reconstruction needed |
OrthogonalDimensionalityReduction |
You want a random but orthogonal projection |
RandomOrthogonalProjection |
You want an orthonormal projection matrix (columns are orthonormal) |
GaussianRandomProjection |
You need approximate inverse or JL distance preservation |
Bijectivity Trade-offs¶
| Property | K = D | K < D |
|---|---|---|
| Bijective | Yes | No |
| Exact inverse | Yes | No (lossy or unsupported) |
| log|det J| | 0 | Undefined (or approximate) |
| Valid for density estimation | Yes | No (approximate at best) |
Rule of thumb: If you need density estimation or generative sampling (the core RBIG use case), use square rotations (K = D). Use dimensionality-reducing rotations for exploration, visualization, or preprocessing before a non-density-estimation task.
See Also¶
- Rotation Choices — comparison of PCA, ICA, and random rotations
- Image Rotations — specialized rotations for image data