MULTISCALE_GRAPH
Overview
The MULTISCALE_GRAPH function computes Multiscale Graph Correlation (MGC), a statistical measure for detecting dependence between two datasets, including nonlinear and high-dimensional relationships that simpler correlation metrics can miss. It reports both a normalized test statistic and a permutation-based p-value, so it supports both effect-size interpretation and hypothesis testing for independence.
At a high level, MGC builds distance matrices for each dataset, evaluates local correlations across many neighborhood scales, and selects a smoothed optimal signal. A common formulation is:
$$c_{k\ell} = \frac{\sum_{i,j} A_{ij} G_k(i,j)\, B_{ij} H_\ell(i,j)}{\sqrt{\sum_{i,j} A_{ij}^2 G_k(i,j)}\sqrt{\sum_{i,j} B_{ij}^2 H_\ell(i,j)}}$$
for local scale pair $(k,\ell)$, where $A$ and $B$ are centered distance matrices and $G_k$, $H_\ell$ are $k$- and $\ell$-nearest-neighbor graph masks. The overall MGC statistic is the maximum smoothed local correlation,
$$\mathrm{MGC}_n(x,y)=\max_{k,\ell} \mathcal{R}\!\left(c_{k\ell}(x,y)\right),$$
where $\mathcal{R}$ denotes the smoothing step applied to the map of local correlations. The p-value is estimated by permutation under the null hypothesis of independence.
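As a rough illustration of the two ideas above (a local correlation at one scale pair, and a permutation p-value), the following NumPy sketch evaluates the local correlation formula literally. It is not SciPy's implementation. The helper names `centered_distances`, `knn_mask`, `local_correlation`, and `permutation_pvalue` are hypothetical, plain double-centering is assumed for the centered distance matrices, and both the search over all scales and the smoothing step are left out. The p-value convention `(1 + count) / (1 + reps)` is also an assumption, chosen because it agrees with the example p-values on this page, which are all multiples of 1/1001.

```python
import numpy as np


def centered_distances(z):
    """Return (raw, double-centered) Euclidean distance matrices for the rows of z."""
    z = np.asarray(z, dtype=float)
    z = z.reshape(z.shape[0], -1)  # flat input becomes a single column
    raw = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Assumption: plain double-centering (subtract row and column means, add grand mean).
    centered = (raw - raw.mean(axis=0, keepdims=True)
                - raw.mean(axis=1, keepdims=True) + raw.mean())
    return raw, centered


def knn_mask(raw, k):
    """mask[i, j] is 1.0 when j is one of the k closest points to i (i included)."""
    ranks = raw.argsort(axis=1).argsort(axis=1)  # per-row rank of each distance
    return (ranks < k).astype(float)


def local_correlation(x, y, k, l):
    """Evaluate the local correlation at the single scale pair (k, l)."""
    dx, a = centered_distances(x)
    dy, b = centered_distances(y)
    g, h = knn_mask(dx, k), knn_mask(dy, l)
    numerator = np.sum(a * g * b * h)
    denominator = np.sqrt(np.sum(a**2 * g)) * np.sqrt(np.sum(b**2 * h))
    return float(numerator / denominator) if denominator > 0 else 0.0


def permutation_pvalue(x, y, k, l, reps=200, seed=0):
    """Shuffle the rows of y to build a null distribution for the statistic."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = local_correlation(x, y, k, l)
    null = np.array([local_correlation(x, rng.permutation(y), k, l)
                     for _ in range(reps)])
    p_value = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, float(p_value)


x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
stat, p = permutation_pvalue(x, y, k=5, l=5)
print(round(stat, 6), p)  # the statistic is 1.0 up to rounding; p depends on the seed
```

With `k` and `l` equal to the sample size, every neighbor is included, so the sketch reduces to a single global correlation of the centered distance matrices. The multiscale part of MGC comes from repeating this over all scale pairs.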
This implementation wraps SciPy’s scipy.stats.multiscale_graphcorr. The key controls are reps (default 1000) for the number of permutations, workers (default 1) for parallel execution, is_twosamp (default False) to switch from the independence test to a two-sample test, and random_state (default None) to seed the permutations for reproducibility.
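For orientation, here is a minimal sketch of calling the underlying SciPy function directly with those four controls. The first call reuses the data from the linear example on this page. The second call uses made-up random samples (an assumption for illustration only) to show that two-sample mode accepts inputs with different row counts as long as the column counts match.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

# Independence test on paired data (same number of rows in x and y).
x = np.arange(1, 6, dtype=float).reshape(-1, 1)
y = x.copy()
res = multiscale_graphcorr(
    x, y, reps=1000, workers=1, is_twosamp=False, random_state=42
)
print(res.statistic, res.pvalue)

# Two-sample test on unpaired data: 6 rows versus 8 rows, both with 2 columns.
rng = np.random.default_rng(0)
sample_a = rng.normal(0.0, 1.0, size=(6, 2))  # illustrative data only
sample_b = rng.normal(0.5, 1.0, size=(8, 2))  # illustrative data only
two = multiscale_graphcorr(
    sample_a, sample_b, reps=1000, is_twosamp=True, random_state=42
)
print(two.statistic, two.pvalue)
```

Exact p-values can vary with the SciPy version and the seed, so treat any printed numbers as indicative.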
MGC is widely used in exploratory and confirmatory analysis where relationships may be complex, such as neuroscience, genomics, and multimodal machine learning. It is especially useful when linear assumptions are too restrictive, because it can capture structure across multiple scales while still returning a familiar significance test.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=MULTISCALE_GRAPH(x, y, reps, workers, is_twosamp, random_state)
- x (list[list], required): 2D array of data.
- y (list[list], required): 2D array of data.
- reps (int, optional, default: 1000): Number of permutations for the null distribution.
- workers (int, optional, default: 1): Number of parallel workers to use.
- is_twosamp (bool, optional, default: false): If True, performs a two-sample test.
- random_state (int, optional, default: null): Seed for random number generator.
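Returns (dict): A Double entity whose basic value is the test statistic. The statistic and the p-value are also available as the properties Statistic and P-Value. If the inputs are invalid, an error string is returned instead.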
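To make the return shape concrete, the sketch below reads the statistic and p-value out of a returned entity and applies a significance threshold. The helper name `summarize_result` and the `alpha` parameter are hypothetical. The sample entity is copied from the expected output of Example 1.

```python
def summarize_result(result, alpha=0.05):
    """Return (statistic, p_value, reject_null) from a MULTISCALE_GRAPH result."""
    # The wrapper signals failures with a plain "Error: ..." string.
    if isinstance(result, str):
        raise ValueError(result)
    props = result["properties"]
    statistic = props["Statistic"]["basicValue"]
    p_value = props["P-Value"]["basicValue"]
    return statistic, p_value, p_value < alpha


example_1 = {
    "type": "Double",
    "basicValue": 1,
    "properties": {
        "Statistic": {"type": "Double", "basicValue": 1},
        "P-Value": {"type": "Double", "basicValue": 0.041958},
    },
}
print(summarize_result(example_1))  # (1, 0.041958, True)
```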
Example 1: Linear relationship
Inputs:
| x | y | random_state |
|---|---|---|
| 1 | 1 | 42 |
| 2 | 2 | |
| 3 | 3 | |
| 4 | 4 | |
| 5 | 5 | |
Excel formula:
=MULTISCALE_GRAPH({1;2;3;4;5}, {1;2;3;4;5}, 1000, 1, FALSE, 42)
Expected output:
{"type":"Double","basicValue":1,"properties":{"Statistic":{"type":"Double","basicValue":1},"P-Value":{"type":"Double","basicValue":0.041958}}}
Example 2: Independent data
Inputs:
| x | y | random_state |
|---|---|---|
| 1 | 5 | 42 |
| 2 | 4 | |
| 3 | 1 | |
| 4 | 2 | |
| 5 | 3 | |
Excel formula:
=MULTISCALE_GRAPH({1;2;3;4;5}, {5;4;1;2;3}, 1000, 1, FALSE, 42)
Expected output:
{"type":"Double","basicValue":0.42268,"properties":{"Statistic":{"type":"Double","basicValue":0.42268},"P-Value":{"type":"Double","basicValue":0.246753}}}
Example 3: Nonlinear monotonic relationship
Inputs:
| x | y | random_state |
|---|---|---|
| 1 | 1 | 7 |
| 2 | 4 | |
| 3 | 9 | |
| 4 | 16 | |
| 5 | 25 | |
Excel formula:
=MULTISCALE_GRAPH({1;2;3;4;5}, {1;4;9;16;25}, 1000, 1, FALSE, 7)
Expected output:
{"type":"Double","basicValue":0.971092,"properties":{"Statistic":{"type":"Double","basicValue":0.971092},"P-Value":{"type":"Double","basicValue":0.04995}}}
Example 4: Linear relationship with offset
Inputs:
| x | y | random_state |
|---|---|---|
| 2 | 5 | 21 |
| 4 | 7 | |
| 6 | 9 | |
| 8 | 11 | |
| 10 | 13 | |
Excel formula:
=MULTISCALE_GRAPH({2;4;6;8;10}, {5;7;9;11;13}, 1000, 1, FALSE, 21)
Expected output:
{"type":"Double","basicValue":1,"properties":{"Statistic":{"type":"Double","basicValue":1},"P-Value":{"type":"Double","basicValue":0.047952}}}
Python Code
import numpy as np
from scipy.stats import multiscale_graphcorr as scipy_mgc


def multiscale_graph(x, y, reps=1000, workers=1, is_twosamp=False, random_state=None):
    """
    Compute the Multiscale Graph Correlation (MGC) test statistic and p-value.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multiscale_graphcorr.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): 2D array of data.
        y (list[list]): 2D array of data.
        reps (int, optional): Number of permutations for the null distribution. Default is 1000.
        workers (int, optional): Number of parallel workers to use. Default is 1.
        is_twosamp (bool, optional): If True, performs a two-sample test. Default is False.
        random_state (int, optional): Seed for random number generator. Default is None.

    Returns:
        dict: Test statistic (Double). P-value is available as a property.
    """
    try:
        def to2d(val):
            # Wrap a scalar (a single Excel cell) so it is handled like a 2D range.
            return [[val]] if not isinstance(val, list) else val

        x = to2d(x)
        y = to2d(y)

        # SciPy requires at least 5 samples in each input.
        if len(x) < 5 or len(y) < 5:
            return "Error: x and y must have at least 5 rows."

        try:
            x_arr = np.array(x, dtype=float)
            y_arr = np.array(y, dtype=float)
        except Exception:
            return "Error: x and y must be numeric."

        # Treat flat lists as single-column data.
        if x_arr.ndim == 1:
            x_arr = x_arr.reshape(-1, 1)
        if y_arr.ndim == 1:
            y_arr = y_arr.reshape(-1, 1)

        two_sample = bool(is_twosamp)
        if two_sample:
            # Unpaired samples may differ in row count but must share columns.
            if x_arr.shape[1] != y_arr.shape[1]:
                return "Error: x and y must have the same number of columns for a two-sample test."
        elif x_arr.shape[0] != y_arr.shape[0]:
            return "Error: x and y must have the same number of rows."

        res = scipy_mgc(
            x_arr, y_arr,
            reps=int(reps),
            workers=int(workers),
            is_twosamp=two_sample,
            random_state=None if random_state is None else int(random_state)
        )

        statistic = float(res.statistic)
        pvalue = float(res.pvalue)

        if np.isnan(statistic) or np.isnan(pvalue):
            return "Error: Result contains NaN."

        # Excel entity: the statistic is the cell value, with both numbers as properties.
        return {
            "type": "Double",
            "basicValue": statistic,
            "properties": {
                "Statistic": {
                    "type": "Double",
                    "basicValue": statistic
                },
                "P-Value": {
                    "type": "Double",
                    "basicValue": pvalue
                }
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"