MANOVA_TEST

Overview

The MANOVA_TEST function performs Multivariate Analysis of Variance (MANOVA), a statistical procedure for comparing multivariate sample means across two or more groups. MANOVA extends univariate analysis of variance (ANOVA) to situations where there are multiple dependent variables, using the covariance between outcome variables when testing the statistical significance of mean differences.

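This implementation uses the statsmodels library’s MANOVA class, which is based on multivariate regression. For more details, see the statsmodels MANOVA documentation. The function tests the null hypothesis that group mean vectors are equal across the specified dependent variables. Note that the integer group codes enter the model formula as a numeric term (~ Group, not ~ C(Group)). With exactly two groups this is equivalent to comparing the two mean vectors; with three or more groups it tests for a linear trend across the group codes rather than arbitrary differences among the group means.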

The function returns four commonly used test statistics, each derived from the eigenvalues \lambda_p of the matrix A = S_{\text{model}} S_{\text{res}}^{-1}:

  • Wilks’ lambda: \Lambda_{\text{Wilks}} = \prod (1 + \lambda_p)^{-1} — measures the proportion of variance not explained by group differences
  • Pillai’s trace: \Lambda_{\text{Pillai}} = \sum \frac{\lambda_p}{1 + \lambda_p} — considered the most robust to violations of assumptions
  • Hotelling-Lawley trace: \Lambda_{\text{LH}} = \sum \lambda_p — powerful when group differences are concentrated in one dimension
  • Roy’s greatest root: \Lambda_{\text{Roy}} = \max(\lambda_p) — most powerful when the alternative hypothesis is true for a single linear combination
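
As a quick check of these formulas, the sketch below computes all four statistics from a list of eigenvalues in plain Python. The helper name manova_statistics is illustrative and not part of the function documented here; the eigenvalue 3.375 is the one reported as the Hotelling-Lawley trace in Example 1 further down.

```python
import math

def manova_statistics(eigenvalues):
    """Compute the four MANOVA statistics from the eigenvalues of A.

    `eigenvalues` is assumed to be a non-empty list of non-negative floats.
    """
    wilks = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues)
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)
    hotelling_lawley = sum(eigenvalues)
    roy = max(eigenvalues)
    return {
        "wilks": wilks,
        "pillai": pillai,
        "hotelling_lawley": hotelling_lawley,
        "roy": roy,
    }

# Single eigenvalue taken from Example 1 on this page.
stats = manova_statistics([3.375])
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

With a single eigenvalue, Wilks' lambda and Pillai's trace sum to 1, and the Hotelling-Lawley trace equals Roy's greatest root, which matches the pattern in the example tables.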

Each test statistic is converted to an approximate F-statistic with associated degrees of freedom and p-value. The function compares the minimum p-value across all test statistics against the specified significance level (alpha) to determine whether to reject the null hypothesis.
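
The decision rule described above can be sketched in a few lines. Both helpers are hypothetical and exist only for illustration. The relation F = eigenvalue × df_denom / df_num is an observation that holds for the examples on this page, where a single eigenvalue drives all four statistics; it is not the general conversion. The p-values are copied from Example 1 rather than computed.

```python
def f_from_single_root(eigenvalue, df_num, df_denom):
    """F value when a single eigenvalue drives the test (as in the examples here)."""
    return eigenvalue * df_denom / df_num

def decide(p_values, alpha=0.05):
    """Return 'reject_null' if the smallest p-value is below alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    return "reject_null" if min(p_values) < alpha else "fail_to_reject_null"

# Values copied from Example 1: eigenvalue 3.375, df (1, 4), p = 0.0213116.
print(f_from_single_root(3.375, 1, 4))   # 13.5
p_values = [0.0213116] * 4
print(decide(p_values, alpha=0.05))      # reject_null
print(decide(p_values, alpha=0.01))      # fail_to_reject_null
```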

MANOVA is particularly useful in experimental designs where multiple related outcomes are measured simultaneously, as it controls the family-wise error rate better than running separate ANOVAs. For background on multivariate analysis of variance, see the Wikipedia article on MANOVA.
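
To make the family-wise error rate concern concrete, the sketch below computes the chance of at least one false positive across several separate tests. It assumes the tests are independent, which is a simplification: correlated outcomes, the usual MANOVA setting, would give a different figure. The function name is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Error rate for 1, 2, 3 and 5 separate ANOVAs, each run at alpha = 0.05.
for k in (1, 2, 3, 5):
    print(k, round(familywise_error_rate(0.05, k), 4))
```

Under this assumption, three separate ANOVAs at alpha = 0.05 already carry roughly a 14% chance of at least one spurious rejection.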

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MANOVA_TEST(data, groups, alpha)
  • data (list[list], required): A matrix of dependent variables where rows are observations and columns are dependent variables.
  • groups (list[list], required): A column vector of group membership indicators (integer coded), one per observation. Must have the same number of rows as data. Non-integer numeric codes are truncated to integers.
  • alpha (float, optional, default: 0.05): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive).

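Returns (list[list]): 2D list containing a header row, one row per test statistic, and a final row whose first cell is reject_null or fail_to_reject_null. On invalid input or a fitting failure, the function returns an error message string beginning with "Error:" instead.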
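
As an illustration of this return layout, the sketch below splits a result into its statistic rows and the final conclusion. The result literal is copied from Example 1 below rather than computed, and split_result is a hypothetical helper, not part of MANOVA_TEST.

```python
def split_result(result):
    """Split a MANOVA_TEST result into (rows_as_dicts, conclusion)."""
    if isinstance(result, str):
        # Failures are returned as a plain message string, not a 2D list.
        raise ValueError(result)
    header, *stat_rows, last_row = result
    rows = [dict(zip(header, row)) for row in stat_rows]
    return rows, last_row[0]

# Literal copy of the Example 1 output (not computed here).
result = [
    ["test_statistic", "statistic_name", "statistic_value",
     "f_value", "df_num", "df_denom", "p_value"],
    ["Wilks", "Wilks' lambda", 0.228571, 13.5, 1.0, 4.0, 0.0213116],
    ["Pillai", "Pillai's trace", 0.771429, 13.5, 1.0, 4.0, 0.0213116],
    ["Hotelling-Lawley", "Hotelling-Lawley trace", 3.375, 13.5, 1.0, 4.0, 0.0213116],
    ["Roy", "Roy's greatest root", 3.375, 13.5, 1.0, 4.0, 0.0213116],
    ["reject_null", "", "", "", "", "", ""],
]

rows, conclusion = split_result(result)
print(conclusion)                 # reject_null
print(rows[0]["statistic_name"])  # Wilks' lambda
print(rows[0]["p_value"])         # 0.0213116
```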

Example 1: Two groups with two dependent variables

Inputs:

data groups
1 2 1
2 3 1
3 4 1
4 5 2
5 6 2
6 7 2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2})

Expected output:

test_statistic   | statistic_name         | statistic_value | f_value | df_num | df_denom | p_value
Wilks            | Wilks' lambda          | 0.228571        | 13.5    | 1      | 4        | 0.0213116
Pillai           | Pillai's trace         | 0.771429        | 13.5    | 1      | 4        | 0.0213116
Hotelling-Lawley | Hotelling-Lawley trace | 3.375           | 13.5    | 1      | 4        | 0.0213116
Roy              | Roy's greatest root    | 3.375           | 13.5    | 1      | 4        | 0.0213116
reject_null      |                        |                 |         |        |          |

Example 2: Three groups with two dependent variables

Inputs:

data groups
1 2 1
2 3 1
3 4 1
5 6 2
6 7 2
7 8 2
9 10 3
10 11 3
11 12 3

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;5,6;6,7;7,8;9,10;10,11;11,12}, {1;1;1;2;2;2;3;3;3})

Expected output:

test_statistic   | statistic_name         | statistic_value | f_value | df_num | df_denom | p_value
Wilks            | Wilks' lambda          | 0.0588235       | 112     | 1      | 7        | 0.0000147079
Pillai           | Pillai's trace         | 0.941176        | 112     | 1      | 7        | 0.0000147079
Hotelling-Lawley | Hotelling-Lawley trace | 16              | 112     | 1      | 7        | 0.0000147079
Roy              | Roy's greatest root    | 16              | 112     | 1      | 7        | 0.0000147079
reject_null      |                        |                 |         |        |          |

Example 3: Custom alpha value with stricter significance level

Inputs:

data groups alpha
1 2 1 0.01
2 3 1
3 4 1
4 5 2
5 6 2
6 7 2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2}, 0.01)

Expected output:

test_statistic      | statistic_name         | statistic_value | f_value | df_num | df_denom | p_value
Wilks               | Wilks' lambda          | 0.228571        | 13.5    | 1      | 4        | 0.0213116
Pillai              | Pillai's trace         | 0.771429        | 13.5    | 1      | 4        | 0.0213116
Hotelling-Lawley    | Hotelling-Lawley trace | 3.375           | 13.5    | 1      | 4        | 0.0213116
Roy                 | Roy's greatest root    | 3.375           | 13.5    | 1      | 4        | 0.0213116
fail_to_reject_null |                        |                 |         |        |          |

Example 4: Two groups with three dependent variables

Inputs:

data groups
1 2 3 1
2 3 4 1
3 4 5 1
5 6 7 2
6 7 8 2
7 8 9 2

Excel formula:

=MANOVA_TEST({1,2,3;2,3,4;3,4,5;5,6,7;6,7,8;7,8,9}, {1;1;1;2;2;2})

Expected output:

test_statistic   | statistic_name         | statistic_value | f_value | df_num | df_denom | p_value
Wilks            | Wilks' lambda          | 0.142857        | 24      | 1      | 4        | 0.00804989
Pillai           | Pillai's trace         | 0.857143        | 24      | 1      | 4        | 0.00804989
Hotelling-Lawley | Hotelling-Lawley trace | 6               | 24      | 1      | 4        | 0.00804989
Roy              | Roy's greatest root    | 6               | 24      | 1      | 4        | 0.00804989
reject_null      |                        |                 |         |        |          |

Python Code

import pandas as pd
from statsmodels.multivariate.manova import MANOVA as statsmodels_manova

def manova_test(data, groups, alpha=0.05):
    """
    Performs Multivariate Analysis of Variance (MANOVA) for multiple dependent variables.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.manova.MANOVA.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): A matrix of dependent variables where rows are observations and columns are dependent variables.
        groups (list[list]): A column vector of group membership indicators (integer coded). Must have the same number of rows as data.
        alpha (float, optional): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive). Default is 0.05.

    Returns:
        list[list]: 2D list with MANOVA results, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_float(val, name):
        if not isinstance(val, (int, float)):
            return f"Error: Invalid input: {name} must be a number."
        val = float(val)
        if val != val or val == float('inf') or val == float('-inf'):
            return f"Error: Invalid input: {name} must be finite."
        return val

    try:
        data = to2d(data)
        groups = to2d(groups)

        alpha_val = validate_float(alpha, "alpha")
        if isinstance(alpha_val, str):
            return alpha_val
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        if not isinstance(data, list) or len(data) == 0:
            return "Error: Invalid input: data must be a non-empty 2D list."

        for i, row in enumerate(data):
            if not isinstance(row, list):
                return f"Error: Invalid input: data row {i} must be a list."
            if len(row) == 0:
                return f"Error: Invalid input: data row {i} must be non-empty."

        n_obs = len(data)
        n_vars = len(data[0])

        for row in data:
            if len(row) != n_vars:
                return "Error: Invalid input: all rows in data must have the same length."

        data_flat = []
        for i, row in enumerate(data):
            row_vals = []
            for j, val in enumerate(row):
                validated = validate_float(val, f"data[{i}][{j}]")
                if isinstance(validated, str):
                    return validated
                row_vals.append(validated)
            data_flat.append(row_vals)

        if len(groups) != n_obs:
            return f"Error: Invalid input: groups must have {n_obs} rows to match data."

        for i, row in enumerate(groups):
            if not isinstance(row, list):
                return f"Error: Invalid input: groups row {i} must be a list."
            if len(row) != 1:
                return "Error: Invalid input: groups must be a column vector (each row has 1 element)."

        group_vals = []
        for i, row in enumerate(groups):
            validated = validate_float(row[0], f"groups[{i}][0]")
            if isinstance(validated, str):
                return validated
            group_vals.append(int(validated))

        unique_groups = list(set(group_vals))
        if len(unique_groups) < 2:
            return "Error: Invalid input: groups must contain at least 2 distinct values."

        if n_vars < 1:
            return "Error: Invalid input: data must have at least 1 dependent variable."

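        # Assemble a DataFrame with one column per dependent variable (DV1, DV2, ...)
        # plus a Group column holding the integer group codes.
        df_data = {}
        for j in range(n_vars):
            df_data[f"DV{j+1}"] = [data_flat[i][j] for i in range(n_obs)]
        df_data["Group"] = group_vals
        df = pd.DataFrame(df_data)

        # Build a formula such as "DV1 + DV2 ~ Group". Group is passed as a
        # numeric term here, not as a categorical factor C(Group).
        dv_names = [f"DV{j+1}" for j in range(n_vars)]
        formula = " + ".join(dv_names) + " ~ Group"

        # Fit the model and run the multivariate tests; report any failure as text.
        try:
            manova = statsmodels_manova.from_formula(formula, data=df)
            results = manova.mv_test()
        except Exception as exc:
            return f"Error: {str(exc)}"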

        try:
            test_results = results.results["Group"]["stat"]
        except Exception as exc:
            return f"Error: Unable to extract MANOVA results: {str(exc)}"

        output = []
        output.append(["test_statistic", "statistic_name", "statistic_value", "f_value", "df_num", "df_denom", "p_value"])

        test_stats = [
            ("Wilks' lambda", "Wilks"),
            ("Pillai's trace", "Pillai"),
            ("Hotelling-Lawley trace", "Hotelling-Lawley"),
            ("Roy's greatest root", "Roy")
        ]

        min_p_value = 1.0
        for stat_name, stat_key in test_stats:
            try:
                if stat_name in test_results.index:
                    row_data = test_results.loc[stat_name]
                    stat_value = float(row_data["Value"])
                    f_value = float(row_data["F Value"])
                    df_num = float(row_data["Num DF"])
                    df_denom = float(row_data["Den DF"])
                    p_value = float(row_data["Pr > F"])
                    if p_value < min_p_value:
                        min_p_value = p_value
                    output.append([stat_key, stat_name, stat_value, f_value, df_num, df_denom, p_value])
            except Exception as exc:
                return f"Error: Unable to extract MANOVA statistic {stat_name}: {str(exc)}"

        conclusion = "reject_null" if min_p_value < alpha_val else "fail_to_reject_null"
        output.append([conclusion, "", "", "", "", "", ""])

        return output
    except Exception as exc:
        return f"Error: {str(exc)}"
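
One point worth noting about the code above: the formula it builds ends in ~ Group, so the integer group codes are treated as a numeric term. The sketch below shows how that formula string is assembled and how a categorical variant could be written with C(Group), the formula syntax statsmodels uses (via patsy) for categorical factors. The helper build_formula and its categorical flag are hypothetical and not part of the original function. If the categorical form were used, the test table would presumably be stored under the term name C(Group) instead of Group; that is an assumption and has not been tested here.

```python
def build_formula(n_vars, categorical=False):
    """Build a MANOVA formula string for n_vars dependent variables.

    categorical=False reproduces the formula used by the code above;
    categorical=True wraps the group column in C(...) so that each
    group code is treated as its own level (hypothetical variant).
    """
    if n_vars < 1:
        raise ValueError("n_vars must be at least 1")
    dv_names = [f"DV{j + 1}" for j in range(n_vars)]
    group_term = "C(Group)" if categorical else "Group"
    return " + ".join(dv_names) + " ~ " + group_term

print(build_formula(2))                    # DV1 + DV2 ~ Group
print(build_formula(2, categorical=True))  # DV1 + DV2 ~ C(Group)
```

With exactly two groups the two formulas describe the same comparison; they differ only when three or more group codes are present.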
