Measuring Dependence: Multivariate (2D) Variables

When both $X$ and $Y$ are vectors rather than scalars, measuring dependence becomes harder. Kernel methods such as CKA are popular, but with an RBF kernel the result is sensitive to the bandwidth. This notebook shows that RBIG-based MI provides a non-parametric alternative that captures multivariate nonlinear dependence.
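As a reference point for any MI estimator, the jointly Gaussian case has a closed form, $I(X;Y) = \tfrac{1}{2}\log\frac{\det\Sigma_X \det\Sigma_Y}{\det\Sigma_{XY}}$. The sketch below is not part of the original notebook: the helper name `gaussian_mi` and the toy covariance are illustrative assumptions, meant only as a sanity check that can be run before moving on to nonlinear data.

```python
import numpy as np


def gaussian_mi(cov, dx):
    """Closed-form MI (nats) between the first dx and the remaining dims of a Gaussian."""
    cov = np.asarray(cov, dtype=float)
    # slogdet returns (sign, log|det|); only the log-determinant is needed here
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    _, logdet_xy = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


# Toy model (assumption): X ~ N(0, I_2), Y = X + sigma * noise, noise ~ N(0, I_2)
sigma = 0.5
eye = np.eye(2)
cov_toy = np.block([[eye, eye], [eye, (1 + sigma**2) * eye]])

mi_toy = gaussian_mi(cov_toy, dx=2)
print(f"Gaussian MI: {mi_toy:.4f} nats")
```

In this toy model each of the two independent coordinate pairs contributes $\tfrac{1}{2}\log(1 + 1/\sigma^2)$, so the total is $\log 5 \approx 1.609$ nats.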

Colab / fresh environment? Run the cell below to install rbig from GitHub. Skip if already installed.

In [1]:

!pip install "rbig[all] @ git+https://github.com/jejjohnson/rbig.git" -q

In [2]:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

from rbig import AnnealedRBIG, mutual_information_rbig

Dataset 1: Asymmetric nonlinearity

$$ \begin{aligned} y_0 &= (2 x_0)^2 + \varepsilon \\ y_1 &= 0.2 x_0 + \sqrt{2|x_1|} + 0.1 \varepsilon \end{aligned} $$

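The two outputs depend on the inputs through different nonlinear forms: $y_0$ depends only on $x_0$ (quadratic), while $y_1$ depends on both $x_0$ (linear) and $x_1$ (square root of the absolute value). The inputs are standard normal, and both outputs share the same standard normal noise draw $\varepsilon$.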

In [3]:

rng = np.random.RandomState(42)
N = 1000

x1 = rng.randn(N, 2)
noise1 = rng.randn(N, 1)
y1 = np.column_stack([
    (2 * x1[:, 0]) ** 2 + noise1.ravel(),
    0.2 * x1[:, 0] + np.sqrt(2 * np.abs(x1[:, 1])) + 0.1 * noise1.ravel(),
])

Scatter plots: all pairwise relationships

In [4]:

fig, axes = plt.subplots(2, 2, figsize=(8, 7))
labels = [("$x_0$", "$y_0$"), ("$x_1$", "$y_0$"), ("$x_0$", "$y_1$"), ("$x_1$", "$y_1$")]
pairs = [(0, 0), (1, 0), (0, 1), (1, 1)]
for ax, (xi, yi), (xl, yl) in zip(axes.ravel(), pairs, labels):
    ax.scatter(x1[:, xi], y1[:, yi], alpha=0.5, s=10)
    ax.set(xlabel=xl, ylabel=yl)
fig.suptitle("Dataset 1: Asymmetric nonlinearity", fontsize=13)
fig.tight_layout()
plt.show()

[Figure: 2×2 grid of scatter plots for Dataset 1: $x_0$ vs. $y_0$, $x_1$ vs. $y_0$, $x_0$ vs. $y_1$, $x_1$ vs. $y_1$.]

Classical measures

In [5]:

from sklearn.metrics.pairwise import rbf_kernel


def hsic(K, L):
    """Biased HSIC estimator; K and L are uncentered Gram matrices, centered here."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ K @ H
    Lc = H @ L @ H
    return np.trace(Kc @ Lc) / (n - 1) ** 2


def normalized_hsic(K, L):
    """CKA: HSIC(K,L) / sqrt(HSIC(K,K) * HSIC(L,L)), bounded [0, 1]."""
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))


# --- Linear kernels ---
K_lin1 = x1 @ x1.T
L_lin1 = y1 @ y1.T
cka_lin1 = normalized_hsic(K_lin1, L_lin1)

# --- RBF kernels (median heuristic for length scale) ---
from scipy.spatial.distance import pdist

sigma_x1 = np.median(pdist(x1, "euclidean"))
sigma_y1 = np.median(pdist(y1, "euclidean"))
K_rbf1 = rbf_kernel(x1, gamma=1.0 / (2 * sigma_x1**2))
L_rbf1 = rbf_kernel(y1, gamma=1.0 / (2 * sigma_y1**2))
cka_rbf1 = normalized_hsic(K_rbf1, L_rbf1)

# Spearman on stacked [X, Y]
stacked = np.hstack([x1, y1])
spearman_matrix = stats.spearmanr(stacked).statistic
# Extract cross-correlation block (X cols vs Y cols)
spearman_xy = spearman_matrix[:2, 2:]
spearman_fro = np.linalg.norm(spearman_xy, "fro")

print("Dataset 1 — classical measures:")
print(f"  CKA linear:  {cka_lin1:.4f}")
print(f"  CKA RBF:     {cka_rbf1:.4f}")
print(f"  Spearman cross-block Frobenius: {spearman_fro:.4f}")

Dataset 1 — classical measures:
  CKA linear:  0.0018
  CKA RBF:     0.1488
  Spearman cross-block Frobenius: 0.3328
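To see why the linear-kernel CKA is so close to zero here, it helps to recall that with linear kernels the Gram-matrix form reduces to a ratio of Frobenius norms of covariance-type matrices, $\|X_c^\top Y_c\|_F^2 / (\|X_c^\top X_c\|_F \, \|Y_c^\top Y_c\|_F)$, so it only responds to linear cross-covariance. The sketch below checks that equivalence numerically; the function names `cka_gram` and `cka_features` and the toy data are assumptions made for illustration, not part of the notebook.

```python
import numpy as np


def cka_gram(K, L):
    """CKA from uncentered Gram matrices (same quantity as normalized HSIC)."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n  # centering matrix I - 11^T / n
    Kc, Lc = H @ K @ H, H @ L @ H
    # For symmetric matrices, sum(A * B) equals trace(A @ B)
    return np.sum(Kc * Lc) / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))


def cka_features(X, Y):
    """Linear CKA computed directly from column-centered features."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Xc.T @ Yc, "fro") ** 2
    return cross / (np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro"))


# Toy data (assumption): symmetric inputs pushed through even nonlinearities
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.standard_normal((200, 2))
Y_demo = np.column_stack([
    (2 * X_demo[:, 0]) ** 2,
    np.sqrt(2 * np.abs(X_demo[:, 1])),
]) + 0.1 * rng_demo.standard_normal((200, 2))

cka_from_gram = cka_gram(X_demo @ X_demo.T, Y_demo @ Y_demo.T)
cka_from_features = cka_features(X_demo, Y_demo)
print(f"Gram form: {cka_from_gram:.6f}   feature form: {cka_from_features:.6f}")
```

Because even functions of a symmetric input have almost no linear cross-covariance with that input, the numerator stays small even though the variables are strongly dependent.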

Mutual Information via RBIG

In [6]:

model_x1 = AnnealedRBIG(n_layers=50, rotation="pca", random_state=42)
model_y1 = AnnealedRBIG(n_layers=50, rotation="pca", random_state=42)
model_xy1 = AnnealedRBIG(n_layers=50, rotation="pca", random_state=42)

model_x1.fit(x1)
model_y1.fit(y1)
model_xy1.fit(np.hstack([x1, y1]))

mi1 = mutual_information_rbig(model_x1, model_y1, model_xy1)
icc1 = np.sqrt(np.maximum(0, 1 - np.exp(-2 * mi1)))

print(f"  MI (RBIG): {mi1:.4f} nats")
print(f"  ICC:       {icc1:.4f}")

  MI (RBIG): 3.0600 nats
  ICC:       0.9989
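The ICC line in the cell above maps MI (in nats) to the interval $[0, 1)$ via $\sqrt{1 - e^{-2\,\mathrm{MI}}}$. One way to read this scale: for a bivariate Gaussian with correlation $\rho$, $\mathrm{MI} = -\tfrac{1}{2}\log(1 - \rho^2)$, so the transform returns exactly $|\rho|$. The following sketch verifies that; the helper names are illustrative and not taken from the `rbig` package.

```python
import numpy as np


def icc_from_mi(mi_nats):
    """Map MI in nats to [0, 1) via sqrt(1 - exp(-2 * MI))."""
    return np.sqrt(np.maximum(0.0, 1.0 - np.exp(-2.0 * mi_nats)))


def gaussian_mi_from_rho(rho):
    """MI (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log1p(-(rho**2))


rhos = np.array([0.0, 0.3, 0.6, 0.9, 0.99])
recovered = icc_from_mi(gaussian_mi_from_rho(rhos))
for rho, icc in zip(rhos, recovered):
    print(f"rho = {rho:.2f}  ->  ICC = {icc:.4f}")
```

The transform saturates quickly: any MI above roughly 1.5 nats already maps to an ICC above 0.97, so large MI values are easier to compare in nats than on the ICC scale.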

Dataset 2: Symmetric quadratic (higher noise)

$$ \begin{aligned} y_0 &= (2 x_0)^2 + 3\varepsilon_0 \\ y_1 &= (2 x_1)^2 + 3\varepsilon_1 \end{aligned} $$

Each output is a quadratic function of its own input dimension only, with independent noise terms scaled by 3, so the signal is weaker than in Dataset 1.

In [7]:

x2 = rng.randn(N, 2)
y2 = np.column_stack([
    (2 * x2[:, 0]) ** 2 + 3 * rng.randn(N),
    (2 * x2[:, 1]) ** 2 + 3 * rng.randn(N),
])

fig, axes = plt.subplots(2, 2, figsize=(8, 7))
for ax, (xi, yi), (xl, yl) in zip(axes.ravel(), pairs, labels):
    ax.scatter(x2[:, xi], y2[:, yi], alpha=0.5, s=10)
    ax.set(xlabel=xl, ylabel=yl)
fig.suptitle("Dataset 2: Symmetric quadratic (high noise)", fontsize=13)
fig.tight_layout()
plt.show()

Classical measures + MI

In [8]:

# Linear CKA
K_lin2 = x2 @ x2.T
L_lin2 = y2 @ y2.T
cka_lin2 = normalized_hsic(K_lin2, L_lin2)

# RBF CKA (median heuristic)
sigma_x2 = np.median(pdist(x2, "euclidean"))
sigma_y2 = np.median(pdist(y2, "euclidean"))
K_rbf2 = rbf_kernel(x2, gamma=1.0 / (2 * sigma_x2**2))
L_rbf2 = rbf_kernel(y2, gamma=1.0 / (2 * sigma_y2**2))
cka_rbf2 = normalized_hsic(K_rbf2, L_rbf2)

spearman2 = stats.spearmanr(np.hstack([x2, y2])).statistic
spearman_xy2 = spearman2[:2, 2:]
spearman_fro2 = np.linalg.norm(spearman_xy2, "fro")

model_x2 = AnnealedRBIG(n_layers=50, rotation="pca", random_state=42)
model_y2 = AnnealedRBIG(n_layers=50, rotation="pca", random_state=42)
model_xy2 = AnnealedRBIG(n_layers=50, rotation="pca", random_state=42)
model_x2.fit(x2)
model_y2.fit(y2)
model_xy2.fit(np.hstack([x2, y2]))

mi2 = mutual_information_rbig(model_x2, model_y2, model_xy2)
icc2 = np.sqrt(np.maximum(0, 1 - np.exp(-2 * mi2)))

print("Dataset 2 — classical measures:")
print(f"  CKA linear:  {cka_lin2:.4f}")
print(f"  CKA RBF:     {cka_rbf2:.4f}")
print(f"  Spearman cross-block Frobenius: {spearman_fro2:.4f}")
print(f"  MI (RBIG): {mi2:.4f} nats")
print(f"  ICC:       {icc2:.4f}")

Dataset 2 — classical measures:
  CKA linear:  0.0121
  CKA RBF:     0.1946
  Spearman cross-block Frobenius: 0.0551
  MI (RBIG): 1.5795 nats
  ICC:       0.9785
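The near-zero Spearman value for this dataset follows from the shape of the relationship: a quadratic in a symmetric input is non-monotone, and rank correlation only measures monotone association. A small sketch of this effect on one dimension of a Dataset 2 style model; the `spearman` helper (double `argsort` ranks, valid only without ties) and the use of $|x|$ as a monotone reparametrization are assumptions for illustration, not part of the notebook.

```python
import numpy as np


def spearman(a, b):
    """Spearman rank correlation for tie-free data (Pearson correlation of ranks)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]


rng_demo = np.random.default_rng(0)
n_demo = 5000
x_demo = rng_demo.standard_normal(n_demo)
y_demo = (2 * x_demo) ** 2 + 3 * rng_demo.standard_normal(n_demo)

rho_raw = spearman(x_demo, y_demo)          # non-monotone: positive and negative x cancel
rho_abs = spearman(np.abs(x_demo), y_demo)  # monotone in |x|: dependence becomes visible
print(f"Spearman(x, y)   = {rho_raw:.3f}")
print(f"Spearman(|x|, y) = {rho_abs:.3f}")
```

The dependence is the same in both cases; only the second parametrization makes it monotone, which is why a measure such as MI that is invariant to invertible per-variable transformations and does not require monotonicity is better suited here.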

Comparison summary

Metric	Dataset 1 (asymmetric)	Dataset 2 (symmetric, noisy)
CKA linear	0.0018	0.0121
CKA RBF	0.1488	0.1946
Spearman cross-block Frobenius	0.3328	0.0551
MI (RBIG), nats	3.0600	1.5795
ICC	0.9989	0.9785

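The RBF kernel picks up nonlinear dependence that the linear kernel misses almost entirely (CKA of roughly 0.15 to 0.19 versus 0.002 to 0.012), much as MI outperforms Pearson and Spearman correlation. However, CKA RBF depends on the bandwidth choice, while MI (RBIG) is non-parametric and needs no kernel bandwidth. CKA RBF also ranks Dataset 2 slightly above Dataset 1, the opposite of the MI ordering.

MI detects nonlinear multivariate dependence that the classical matrix-based measures largely miss. Dataset 2 has the lower MI (1.58 vs. 3.06 nats), consistent with its higher noise level and weaker signal. ICC is close to 1 for both datasets (0.9989 and 0.9785), so the MI values in nats separate the two cases more clearly than ICC does.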
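The bandwidth dependence of CKA RBF can be probed by rescaling the median-heuristic length scale and recomputing the score. The sketch below does this on a smaller sample generated like Dataset 1; it uses NumPy only, and the helper names, the sample size of 300, and the list of scale factors are assumptions chosen for illustration rather than settings from the notebook.

```python
import numpy as np


def sq_dists(Z):
    """Pairwise squared Euclidean distances, clipped at zero for numerical safety."""
    sq = np.sum(Z**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)


def cka(K, L):
    """CKA (normalized HSIC) from uncentered Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n  # centering matrix I - 11^T / n
    Kc, Lc = H @ K @ H, H @ L @ H
    return np.sum(Kc * Lc) / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))


# Same generative model as Dataset 1, with a smaller sample for speed
rng_demo = np.random.RandomState(42)
n_demo = 300
x_demo = rng_demo.randn(n_demo, 2)
eps = rng_demo.randn(n_demo)
y_demo = np.column_stack([
    (2 * x_demo[:, 0]) ** 2 + eps,
    0.2 * x_demo[:, 0] + np.sqrt(2 * np.abs(x_demo[:, 1])) + 0.1 * eps,
])

d2_x, d2_y = sq_dists(x_demo), sq_dists(y_demo)
upper = np.triu_indices(n_demo, k=1)
med_x = np.median(np.sqrt(d2_x[upper]))  # median heuristic length scales
med_y = np.median(np.sqrt(d2_y[upper]))

cka_by_scale = {}
for scale in (0.1, 0.5, 1.0, 2.0, 10.0):
    K = np.exp(-d2_x / (2.0 * (scale * med_x) ** 2))
    L = np.exp(-d2_y / (2.0 * (scale * med_y) ** 2))
    cka_by_scale[scale] = cka(K, L)
    print(f"scale = {scale:>4}: CKA RBF = {cka_by_scale[scale]:.4f}")
```

For very wide kernels the RBF Gram matrix behaves like a linear one after centering, so the score drifts toward the linear CKA value; for very narrow kernels both Gram matrices approach the identity and the score rises regardless of the actual dependence.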