MANOVA_TEST

Overview

The MANOVA_TEST function performs Multivariate Analysis of Variance (MANOVA), a statistical procedure for comparing multivariate sample means across two or more groups. MANOVA extends univariate analysis of variance (ANOVA) to situations where there are multiple dependent variables, using the covariance between outcome variables when testing the statistical significance of mean differences.

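This implementation uses the statsmodels library’s MANOVA class, which is based on multivariate regression. For more details, see the statsmodels MANOVA documentation. The group codes enter the model formula as a numeric predictor (for two dependent variables the formula is DV1 + DV2 ~ Group). With two groups this tests the null hypothesis that the two group mean vectors are equal. With three or more groups, however, the Group term has a single hypothesis degree of freedom and tests whether the mean vectors change linearly with the numeric codes, so the result depends on the values chosen for the codes (note df_num in Example 2). This is not the usual one-way MANOVA test that all group mean vectors are equal.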

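The function returns four commonly used test statistics, each derived from the eigenvalues \lambda_p of the matrix A = S_{\text{model}} S_{\text{res}}^{-1}, where S_{\text{model}} is the hypothesis (between-group) sums-of-squares-and-cross-products matrix and S_{\text{res}} is the residual (within-group) one: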

Wilks’ lambda: \Lambda_{\text{Wilks}} = \prod_p (1 + \lambda_p)^{-1} = \det(S_{\text{res}}) / \det(S_{\text{model}} + S_{\text{res}}), the proportion of generalized variance not explained by group differences (smaller values indicate larger differences)
Pillai’s trace: \Lambda_{\text{Pillai}} = \sum_p \frac{\lambda_p}{1 + \lambda_p}, generally regarded as the most robust of the four to violations of assumptions such as unequal group covariance matrices
Hotelling-Lawley trace: \Lambda_{\text{LH}} = \sum_p \lambda_p, which tends to be powerful when group differences are concentrated in one dimension
Roy’s greatest root: \Lambda_{\text{Roy}} = \max_p \lambda_p, the most powerful of the four when group differences lie along a single linear combination of the dependent variables
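The four formulas can be checked with a few lines of plain Python. This is a minimal sketch that applies the formulas above to a list of eigenvalues taken as given; it is not the statsmodels implementation, and the helper name manova_statistics is hypothetical. Example 1 reports Hotelling-Lawley and Roy values of 3.375, which corresponds to a single non-zero eigenvalue of 3.375.

```python
import math


def manova_statistics(eigenvalues):
    """Hypothetical helper: apply the four MANOVA formulas to the
    eigenvalues lambda_p of A = S_model @ inv(S_res)."""
    wilks = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues)
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)
    hotelling_lawley = sum(eigenvalues)
    roy = max(eigenvalues)
    return {
        "Wilks": wilks,
        "Pillai": pillai,
        "Hotelling-Lawley": hotelling_lawley,
        "Roy": roy,
    }


# Single non-zero eigenvalue implied by the Hotelling-Lawley/Roy values in Example 1.
for name, value in manova_statistics([3.375]).items():
    print(f"{name}: {value:.6g}")
```

With one eigenvalue, Wilks’ lambda and Pillai’s trace sum to 1, which is why the Example 1 values 0.228571 and 0.771429 are complementary.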

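Each test statistic is converted to an approximate F-statistic with associated degrees of freedom and p-value. The function rejects the null hypothesis when the smallest of the four p-values is below the significance level alpha. Because this rule picks the most favorable of four tests, and because the F approximation for Roy’s greatest root is an upper bound (so its p-value is a lower bound), it rejects more often than any single statistic evaluated at alpha. When a test at the nominal level matters, choose one statistic in advance, for example Pillai’s trace, and compare its p-value with alpha.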
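The decision rule described above (reject when the minimum p-value is below alpha) can be sketched in isolation. The helper name manova_conclusion is hypothetical; the p-values are the ones reported in Examples 1 and 3, where all four statistics share the same p-value.

```python
def manova_conclusion(p_values, alpha=0.05):
    """Hypothetical helper mirroring the min-p-value rule used by MANOVA_TEST."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    # Reject when the smallest p-value is strictly below alpha.
    return "reject_null" if min(p_values) < alpha else "fail_to_reject_null"


p_values = [0.0213116] * 4  # Wilks, Pillai, Hotelling-Lawley, Roy
print(manova_conclusion(p_values))        # default alpha, as in Example 1
print(manova_conclusion(p_values, 0.01))  # stricter alpha, as in Example 3
```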

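MANOVA is particularly useful in experimental designs where several related outcomes are measured on the same subjects. A single omnibus test avoids the inflation of the family-wise error rate that comes from running a separate ANOVA for each dependent variable, and it can detect group differences that appear only in a combination of outcomes. For background on multivariate analysis of variance, see the Wikipedia article on MANOVA.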
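To see why a single omnibus test helps, consider k separate ANOVAs, each run at level alpha. Under the simplifying assumption that the tests are independent and every null hypothesis is true, the probability of at least one false rejection is 1 - (1 - alpha)^k; with alpha = 0.05 and three outcomes that is about 0.143. Correlated outcomes change the exact figure, so treat this sketch (with the hypothetical helper familywise_error_rate) as an illustration only.

```python
def familywise_error_rate(alpha, k):
    """Probability of at least one false rejection among k independent
    tests, each at level alpha, when all null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 2, 3, 5):
    print(k, round(familywise_error_rate(0.05, k), 4))
```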

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MANOVA_TEST(data, groups, alpha)

data (list[list], required): A matrix of dependent variables where rows are observations and columns are dependent variables.
groups (list[list], required): A column vector of integer group codes, one per row of data. Non-integer values are truncated toward zero. Must have the same number of rows as data.
alpha (float, optional, default: 0.05): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive).

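Returns (list[list]): A 2D list whose first row is a header, followed by one row per test statistic (Wilks, Pillai, Hotelling-Lawley, Roy) and a final row whose first cell is reject_null or fail_to_reject_null. If the inputs are invalid or the model cannot be fitted, a string beginning with "Error:" is returned instead.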
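A caller working with the Python version of the function may want the returned 2D list as a lookup table. The sketch below assumes the layout just described; the helper name summarize_manova_output is hypothetical, and the rows are copied from Example 1 (rounded as displayed there, with the straight apostrophes the Python code emits).

```python
def summarize_manova_output(output):
    """Hypothetical helper: convert the 2D list returned by manova_test
    into {statistic_key: {column: value}} plus the conclusion string."""
    if isinstance(output, str):
        # Errors are returned as a single message string, not a 2D list.
        raise ValueError(output)
    header, *stat_rows, conclusion_row = output
    stats = {row[0]: dict(zip(header[1:], row[1:])) for row in stat_rows}
    return stats, conclusion_row[0]


# Values copied from Example 1 (rounded as displayed there).
example_output = [
    ["test_statistic", "statistic_name", "statistic_value",
     "f_value", "df_num", "df_denom", "p_value"],
    ["Wilks", "Wilks' lambda", 0.228571, 13.5, 1.0, 4.0, 0.0213116],
    ["Pillai", "Pillai's trace", 0.771429, 13.5, 1.0, 4.0, 0.0213116],
    ["Hotelling-Lawley", "Hotelling-Lawley trace", 3.375, 13.5, 1.0, 4.0, 0.0213116],
    ["Roy", "Roy's greatest root", 3.375, 13.5, 1.0, 4.0, 0.0213116],
    ["reject_null", "", "", "", "", "", ""],
]

stats, conclusion = summarize_manova_output(example_output)
print(stats["Pillai"]["p_value"], conclusion)
```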

Example 1: Two groups with two dependent variables

Inputs:

data		groups
1	2	1
2	3	1
3	4	1
4	5	2
5	6	2
6	7	2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2})

Expected output:

test_statistic	statistic_name	statistic_value	f_value	df_num	df_denom	p_value
Wilks	Wilks’ lambda	0.228571	13.5	1	4	0.0213116
Pillai	Pillai’s trace	0.771429	13.5	1	4	0.0213116
Hotelling-Lawley	Hotelling-Lawley trace	3.375	13.5	1	4	0.0213116
Roy	Roy’s greatest root	3.375	13.5	1	4	0.0213116
reject_null
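In Example 1 the second column always equals the first plus one, so the two dependent variables are perfectly collinear, and the reported values coincide with a univariate one-way ANOVA on the first column. The plain-Python sketch below reproduces them by hand; the helper name one_way_sums_of_squares is hypothetical and this is not the statsmodels code path. Collinear outcomes like these make the residual matrix singular, so they are convenient for checking arithmetic but not representative of real MANOVA data.

```python
def one_way_sums_of_squares(values, labels):
    """Hypothetical helper: return (between-group SS, within-group SS)."""
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for g in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == g]
        mean_g = sum(members) / len(members)
        ss_between += len(members) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in members)
    return ss_between, ss_within


# Example 1: first column only (the second column is the first plus one).
dv1 = [1, 2, 3, 4, 5, 6]
groups = [1, 1, 1, 2, 2, 2]

h, e = one_way_sums_of_squares(dv1, groups)  # 13.5, 4.0
n, k = len(dv1), len(set(groups))
wilks = e / (h + e)
pillai = h / (h + e)
hotelling_lawley = h / e
f_value = (h / (k - 1)) / (e / (n - k))  # df_num = 1, df_denom = 4
print(round(wilks, 6), round(pillai, 6), hotelling_lawley, f_value)
```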

Example 2: Three groups with two dependent variables

Inputs:

data		groups
1	2	1
2	3	1
3	4	1
5	6	2
6	7	2
7	8	2
9	10	3
10	11	3
11	12	3

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;5,6;6,7;7,8;9,10;10,11;11,12}, {1;1;1;2;2;2;3;3;3})

Expected output:

test_statistic	statistic_name	statistic_value	f_value	df_num	df_denom	p_value
Wilks	Wilks’ lambda	0.0588235	112	1	7	0.0000147079
Pillai	Pillai’s trace	0.941176	112	1	7	0.0000147079
Hotelling-Lawley	Hotelling-Lawley trace	16	112	1	7	0.0000147079
Roy	Roy’s greatest root	16	112	1	7	0.0000147079
reject_null
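Example 2 has three groups, yet df_num is 1 and df_denom is 7, that is n - 2. That is what one numeric predictor plus an intercept produces, because the function passes the integer codes to the formula as a number; a categorical coding of three groups would instead leave n - k = 6 residual degrees of freedom. The second column is again the first plus one, so the reported values can be reproduced by regressing the first column on the codes. The sketch below does this in plain Python as an illustration, not as the statsmodels computation.

```python
# Example 2: first column and the integer group codes 1, 2, 3.
dv1 = [1, 2, 3, 5, 6, 7, 9, 10, 11]
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3]

n = len(dv1)
x_mean = sum(codes) / n
y_mean = sum(dv1) / n

# Ordinary least squares with the codes as a continuous predictor.
sxx = sum((x - x_mean) ** 2 for x in codes)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(codes, dv1))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

ss_total = sum((y - y_mean) ** 2 for y in dv1)
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(codes, dv1))
ss_model = ss_total - ss_resid

df_num, df_denom = 1, n - 2  # one numeric predictor plus an intercept
f_value = (ss_model / df_num) / (ss_resid / df_denom)
wilks = ss_resid / ss_total
print(slope, ss_model, ss_resid, round(f_value, 6), round(wilks, 7))
```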

Example 3: Custom alpha value with stricter significance level

Inputs:

data		groups	alpha
1	2	1	0.01
2	3	1
3	4	1
4	5	2
5	6	2
6	7	2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2}, 0.01)

Expected output:

test_statistic	statistic_name	statistic_value	f_value	df_num	df_denom	p_value
Wilks	Wilks’ lambda	0.228571	13.5	1	4	0.0213116
Pillai	Pillai’s trace	0.771429	13.5	1	4	0.0213116
Hotelling-Lawley	Hotelling-Lawley trace	3.375	13.5	1	4	0.0213116
Roy	Roy’s greatest root	3.375	13.5	1	4	0.0213116
fail_to_reject_null

Example 4: Two groups with three dependent variables

Inputs:

data			groups
1	2	3	1
2	3	4	1
3	4	5	1
5	6	7	2
6	7	8	2
7	8	9	2

Excel formula:

=MANOVA_TEST({1,2,3;2,3,4;3,4,5;5,6,7;6,7,8;7,8,9}, {1;1;1;2;2;2})

Expected output:

test_statistic	statistic_name	statistic_value	f_value	df_num	df_denom	p_value
Wilks	Wilks’ lambda	0.142857	24	1	4	0.00804989
Pillai	Pillai’s trace	0.857143	24	1	4	0.00804989
Hotelling-Lawley	Hotelling-Lawley trace	6	24	1	4	0.00804989
Roy	Roy’s greatest root	6	24	1	4	0.00804989
reject_null

Python Code


import pandas as pd
from statsmodels.multivariate.manova import MANOVA as statsmodels_manova

def manova_test(data, groups, alpha=0.05):
    """
    Performs Multivariate Analysis of Variance (MANOVA) for multiple dependent variables.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.manova.MANOVA.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): A matrix of dependent variables where rows are observations and columns are dependent variables.
        groups (list[list]): A column vector of group membership indicators (integer coded). Must have the same number of rows as data.
        alpha (float, optional): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive). Default is 0.05.

    Returns:
        list[list]: 2D list with MANOVA results, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_float(val, name):
        if not isinstance(val, (int, float)):
            return f"Error: Invalid input: {name} must be a number."
        val = float(val)
        if val != val or val == float('inf') or val == float('-inf'):
            return f"Error: Invalid input: {name} must be finite."
        return val

    try:
        data = to2d(data)
        groups = to2d(groups)

        alpha_val = validate_float(alpha, "alpha")
        if isinstance(alpha_val, str):
            return alpha_val
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        if not isinstance(data, list) or len(data) == 0:
            return "Error: Invalid input: data must be a non-empty 2D list."

        for i, row in enumerate(data):
            if not isinstance(row, list):
                return f"Error: Invalid input: data row {i} must be a list."
            if len(row) == 0:
                return f"Error: Invalid input: data row {i} must be non-empty."

        n_obs = len(data)
        n_vars = len(data[0])

        for row in data:
            if len(row) != n_vars:
                return "Error: Invalid input: all rows in data must have the same length."

        data_flat = []
        for i, row in enumerate(data):
            row_vals = []
            for j, val in enumerate(row):
                validated = validate_float(val, f"data[{i}][{j}]")
                if isinstance(validated, str):
                    return validated
                row_vals.append(validated)
            data_flat.append(row_vals)

        if len(groups) != n_obs:
            return f"Error: Invalid input: groups must have {n_obs} rows to match data."

        for i, row in enumerate(groups):
            if not isinstance(row, list):
                return f"Error: Invalid input: groups row {i} must be a list."
            if len(row) != 1:
                return "Error: Invalid input: groups must be a column vector (each row has 1 element)."

        group_vals = []
        for i, row in enumerate(groups):
            validated = validate_float(row[0], f"groups[{i}][0]")
            if isinstance(validated, str):
                return validated
            group_vals.append(int(validated))

        unique_groups = list(set(group_vals))
        if len(unique_groups) < 2:
            return "Error: Invalid input: groups must contain at least 2 distinct values."

        if n_vars < 1:
            return "Error: Invalid input: data must have at least 1 dependent variable."

        df_data = {}
        for j in range(n_vars):
            df_data[f"DV{j+1}"] = [data_flat[i][j] for i in range(n_obs)]
        df_data["Group"] = group_vals
        df = pd.DataFrame(df_data)

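        dv_names = [f"DV{j+1}" for j in range(n_vars)]
        # Group holds integer codes, so the formula treats it as a numeric
        # predictor with one hypothesis degree of freedom. Writing C(Group)
        # would fit a categorical one-way MANOVA instead, but the results key
        # used below would then be "C(Group)" rather than "Group".
        formula = " + ".join(dv_names) + " ~ Group"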

        try:
            manova = statsmodels_manova.from_formula(formula, data=df)
            results = manova.mv_test()
        except Exception as exc:
            return f"Error: {str(exc)}"

        try:
            test_results = results.results["Group"]["stat"]
        except Exception as exc:
            return f"Error: Unable to extract MANOVA results: {str(exc)}"

        output = []
        output.append(["test_statistic", "statistic_name", "statistic_value", "f_value", "df_num", "df_denom", "p_value"])

        test_stats = [
            ("Wilks' lambda", "Wilks"),
            ("Pillai's trace", "Pillai"),
            ("Hotelling-Lawley trace", "Hotelling-Lawley"),
            ("Roy's greatest root", "Roy")
        ]

        min_p_value = 1.0
        for stat_name, stat_key in test_stats:
            try:
                if stat_name in test_results.index:
                    row_data = test_results.loc[stat_name]
                    stat_value = float(row_data["Value"])
                    f_value = float(row_data["F Value"])
                    df_num = float(row_data["Num DF"])
                    df_denom = float(row_data["Den DF"])
                    p_value = float(row_data["Pr > F"])
                    if p_value < min_p_value:
                        min_p_value = p_value
                    output.append([stat_key, stat_name, stat_value, f_value, df_num, df_denom, p_value])
            except Exception as exc:
                return f"Error: Unable to extract MANOVA statistic {stat_name}: {str(exc)}"

        # Reject when the smallest of the four p-values falls below alpha.
        conclusion = "reject_null" if min_p_value < alpha_val else "fail_to_reject_null"
        output.append([conclusion, "", "", "", "", "", ""])

        return output
    except Exception as exc:
        return f"Error: {str(exc)}"
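If a categorical one-way MANOVA is wanted for three or more groups, the integer codes can be wrapped in patsy’s C() so they are dummy-coded. The sketch below uses made-up, non-collinear data and the same statsmodels calls as the function above (MANOVA.from_formula and mv_test); it is a hedged illustration of that variant, not a drop-in replacement. The results key becomes "C(Group)", and with two dependent variables and three groups Wilks’ lambda has 2 x 2 = 4 numerator degrees of freedom.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: three groups, two dependent variables that are not collinear.
df = pd.DataFrame({
    "DV1": [2.0, 3.1, 3.9, 5.2, 6.1, 6.8, 9.3, 9.9, 11.2],
    "DV2": [1.0, 2.5, 1.8, 4.9, 3.7, 5.5, 6.1, 8.0, 7.2],
    "Group": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# C(Group) makes patsy treat the integer labels as a categorical factor.
manova = MANOVA.from_formula("DV1 + DV2 ~ C(Group)", data=df)
results = manova.mv_test()
stat_table = results.results["C(Group)"]["stat"]
print(stat_table)
```

Unlike the numeric coding, relabeling the groups (for example 1, 2, 3 to 3, 1, 2) leaves these statistics unchanged.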
