MULTISCALE_GRAPH

Overview

The MULTISCALE_GRAPH function computes Multiscale Graph Correlation (MGC), a statistical measure for detecting dependence between two datasets, including nonlinear and high-dimensional relationships that simpler correlation metrics can miss. It reports both a normalized test statistic and a permutation-based p-value, so it supports both effect-size interpretation and hypothesis testing for independence.

At a high level, MGC builds a distance matrix for each dataset, computes local correlations over all pairs of neighborhood scales, smooths the resulting map of local correlations, and takes its maximum as the test statistic. A common formulation is:

c_{k\ell} = \frac{\sum_{i,j} A_{ij} G_k(i,j)\, B_{ij} H_\ell(i,j)}{\sqrt{\sum_{i,j} A_{ij}^2 G_k(i,j)}\sqrt{\sum_{i,j} B_{ij}^2 H_\ell(i,j)}}

for local scale pair (k,\ell), where A and B are centered distance matrices and G_k, H_\ell are k- and \ell-nearest-neighbor graph masks. The overall MGC statistic is the maximum smoothed local correlation,

\mathrm{MGC}_n(x,y)=\max_{k,\ell} \mathcal{R}\!\left(c_{k\ell}(x,y)\right),

where \mathcal{R} denotes the smoothing step. The p-value is estimated by permutation under the null hypothesis of independence.
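The formula for c_{k\ell} and the permutation p-value can be sketched in NumPy. This is a minimal illustration, not SciPy's implementation: it assumes Euclidean distances and double centering, skips the smoothing step, and breaks neighbor ties arbitrarily. The helper names (centered_distance, knn_mask, local_correlation, permutation_pvalue) are hypothetical, and the (1 + count) / (1 + reps) p-value convention is an assumption.

```python
import numpy as np

def centered_distance(z):
    """Return the Euclidean distance matrix of z and its double-centered version."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z.reshape(-1, 1)  # treat a flat sequence as n samples of one feature
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: double centering (subtract row and column means, add grand mean).
    centered = d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()
    return d, centered

def knn_mask(d, k):
    """G_k(i, j) = 1 when j is among the k nearest points to i (i itself has rank 1)."""
    ranks = d.argsort(axis=1).argsort(axis=1) + 1  # ties are broken arbitrarily
    return (ranks <= k).astype(float)

def local_correlation(x, y, k, ell):
    """Evaluate c_{k,ell} as written in the formula above (no smoothing)."""
    dx, a = centered_distance(x)
    dy, b = centered_distance(y)
    g, h = knn_mask(dx, k), knn_mask(dy, ell)
    numerator = (a * g * b * h).sum()
    denominator = np.sqrt((a ** 2 * g).sum()) * np.sqrt((b ** 2 * h).sum())
    return float(numerator / denominator) if denominator > 0 else 0.0

def permutation_pvalue(x, y, k, ell, reps=200, seed=0):
    """Permutation p-value: shuffling y breaks any pairing with x."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = local_correlation(x, y, k, ell)
    null = [local_correlation(x, rng.permutation(y), k, ell) for _ in range(reps)]
    exceed = sum(1 for stat in null if stat >= observed)
    return (1 + exceed) / (1 + reps)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 2 for v in x]
n = len(x)
print(local_correlation(x, y, n, n))  # global scale: every pair is included
print(local_correlation(x, y, 3, 3))  # local scale: 3 nearest neighbors in each dataset
print(permutation_pvalue(x, y, n, n))
```

At the global scale (k and \ell equal to the sample size) both masks are all ones, so the expression reduces to a normalized inner product of the two centered distance matrices.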

This implementation wraps SciPy’s scipy.stats.multiscale_graphcorr. The main parameters, shown with their defaults, are reps=1000 (number of permutations), workers=1 (number of parallel workers), is_twosamp=False (independence test rather than two-sample test), and random_state=None (seed for reproducible permutations).
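For reference, a direct call to the underlying SciPy function might look like the sketch below. The data and the helper name run_mgc are illustrative only, and reps is lowered from the default to keep the example quick (SciPy may warn that fewer than 1000 replications makes the p-value less reliable).

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

def run_mgc(seed):
    # 20 samples of a quadratic relationship; the data are illustrative only.
    x = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
    y = x ** 2
    return multiscale_graphcorr(
        x, y,
        reps=200,           # number of permutations (default is 1000)
        workers=1,          # run permutations in a single process
        is_twosamp=False,   # independence test
        random_state=seed,  # fixes the permutations for reproducibility
    )

first = run_mgc(seed=0)
second = run_mgc(seed=0)
print(first.statistic, first.pvalue)
print(first.pvalue == second.pvalue)  # same seed, same permutations
```

Fixing random_state makes the permutation p-value repeatable. The statistic itself does not depend on the seed.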

MGC suits exploratory and confirmatory analysis where relationships may be complex, in fields such as neuroscience, genomics, and multimodal machine learning. It is especially useful when linear assumptions are too restrictive, because it can capture structure across multiple scales while still returning a familiar significance test.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MULTISCALE_GRAPH(x, y, reps, workers, is_twosamp, random_state)

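x (list[list], required): 2D array of data, one sample per row (at least 5 rows).
y (list[list], required): 2D array of data, one sample per row (at least 5 rows, same number of rows as x).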
reps (int, optional, default: 1000): Number of permutations for the null distribution.
workers (int, optional, default: 1): Number of parallel workers to use.
is_twosamp (bool, optional, default: false): If True, performs a two-sample test.
random_state (int, optional, default: null): Seed for random number generator.
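The is_twosamp option can be illustrated by calling SciPy directly. The sketch below is an assumption-laden example rather than part of this function: the two samples are synthetic, have equal sizes and the same number of columns, and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(0)
# Two synthetic samples with the same number of columns but different means.
sample_a = rng.normal(loc=0.0, scale=1.0, size=(20, 2))
sample_b = rng.normal(loc=3.0, scale=1.0, size=(20, 2))

res = multiscale_graphcorr(
    sample_a, sample_b,
    reps=100,          # lowered from the default of 1000 to keep the sketch fast
    is_twosamp=True,   # test whether the two samples share a distribution
    random_state=0,
)
print(res.statistic, res.pvalue)
```

In this mode the rows of x and y are treated as two separate samples rather than as paired observations.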

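Returns (dict): An Excel Double whose value is the test statistic. The statistic and the p-value are also exposed as the properties Statistic and P-Value. If the inputs are invalid, the function returns a string beginning with "Error:" instead.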
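A short sketch of reading this return value from Python follows. The helper name unpack_result is hypothetical, and the dictionary literal is copied from the Example 1 expected output.

```python
def unpack_result(result):
    """Return (statistic, p_value) from the function's return value."""
    if isinstance(result, str):
        # Problems are reported as strings beginning with "Error:".
        raise ValueError(result)
    props = result["properties"]
    return props["Statistic"]["basicValue"], props["P-Value"]["basicValue"]

example = {
    "type": "Double",
    "basicValue": 1,
    "properties": {
        "Statistic": {"type": "Double", "basicValue": 1},
        "P-Value": {"type": "Double", "basicValue": 0.041958},
    },
}

statistic, p_value = unpack_result(example)
print(statistic, p_value)
```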

Example 1: Linear relationship

Inputs:

x	y	random_state
1	1	42
2	2
3	3
4	4
5	5

Excel formula:

=MULTISCALE_GRAPH({1;2;3;4;5}, {1;2;3;4;5}, 1000, 1, FALSE, 42)

Expected output:

{"type":"Double","basicValue":1,"properties":{"Statistic":{"type":"Double","basicValue":1},"P-Value":{"type":"Double","basicValue":0.041958}}}

Example 2: No significant dependence

Inputs:

x	y	random_state
1	5	42
2	4
3	1
4	2
5	3

Excel formula:

=MULTISCALE_GRAPH({1;2;3;4;5}, {5;4;1;2;3}, 1000, 1, FALSE, 42)

Expected output:

{"type":"Double","basicValue":0.42268,"properties":{"Statistic":{"type":"Double","basicValue":0.42268},"P-Value":{"type":"Double","basicValue":0.246753}}}

Example 3: Nonlinear monotonic relationship

Inputs:

x	y	random_state
1	1	7
2	4
3	9
4	16
5	25

Excel formula:

=MULTISCALE_GRAPH({1;2;3;4;5}, {1;4;9;16;25}, 1000, 1, FALSE, 7)

Expected output:

{"type":"Double","basicValue":0.971092,"properties":{"Statistic":{"type":"Double","basicValue":0.971092},"P-Value":{"type":"Double","basicValue":0.04995}}}

Example 4: Linear relationship with offset

Inputs:

x	y	random_state
2	5	21
4	7
6	9
8	11
10	13

Excel formula:

=MULTISCALE_GRAPH({2;4;6;8;10}, {5;7;9;11;13}, 1000, 1, FALSE, 21)

Expected output:

{"type":"Double","basicValue":1,"properties":{"Statistic":{"type":"Double","basicValue":1},"P-Value":{"type":"Double","basicValue":0.047952}}}

Python Code


import numpy as np
from scipy.stats import multiscale_graphcorr as scipy_mgc

def multiscale_graph(x, y, reps=1000, workers=1, is_twosamp=False, random_state=None):
    """
    Compute the Multiscale Graph Correlation (MGC) test statistic and p-value.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multiscale_graphcorr.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): 2D array of data.
        y (list[list]): 2D array of data.
        reps (int, optional): Number of permutations for the null distribution. Default is 1000.
        workers (int, optional): Number of parallel workers to use. Default is 1.
        is_twosamp (bool, optional): If True, performs a two-sample test. Default is False.
        random_state (int, optional): Seed for random number generator. Default is None.

    Returns:
        dict: Test statistic (Double). P-value is available as a property.
    """
    try:
        def to2d(val):
            # A single value may arrive as a scalar; wrap it as a 1x1 2D list.
            return [[val]] if not isinstance(val, list) else val

        x = to2d(x)
        y = to2d(y)

        # SciPy's MGC needs at least 5 samples, so reject shorter inputs early.
        if len(x) < 5 or len(y) < 5:
            return "Error: x and y must have at least 5 rows."

        try:
            x_arr = np.array(x, dtype=float)
            y_arr = np.array(y, dtype=float)
        except Exception:
            return "Error: x and y must be numeric."

        if x_arr.ndim == 1:
            x_arr = x_arr.reshape(-1, 1)
        if y_arr.ndim == 1:
            y_arr = y_arr.reshape(-1, 1)

        if x_arr.shape[0] != y_arr.shape[0]:
            return "Error: x and y must have the same number of rows."

        res = scipy_mgc(
            x_arr, y_arr,
            reps=int(reps),
            workers=int(workers),
            is_twosamp=bool(is_twosamp),
            random_state=None if random_state is None else int(random_state)
        )

        statistic = float(res.statistic)
        pvalue = float(res.pvalue)

        if np.isnan(statistic) or np.isnan(pvalue):
            return "Error: Result contains NaN."

        return {
            "type": "Double",
            "basicValue": statistic,
            "properties": {
                "Statistic": {
                    "type": "Double",
                    "basicValue": statistic
                },
                "P-Value": {
                    "type": "Double",
                    "basicValue": pvalue
                }
            }
        }

    except Exception as e:
        return f"Error: {str(e)}"
