FACTOR_ANALYSIS

Overview

The FACTOR_ANALYSIS function performs exploratory factor analysis (EFA) on multivariate data to identify latent (unobserved) variables that explain the correlations among observed variables. Factor analysis is a dimensionality reduction technique commonly used in psychometrics, marketing research, social sciences, and data exploration to uncover the underlying structure of complex datasets.

This implementation uses the statsmodels library’s Factor class. The function supports two extraction methods: Principal Axis Factoring (PA), which iteratively estimates communalities to find factors explaining common variance, and Maximum Likelihood (ML), which finds factors that maximize the likelihood of the observed correlation matrix under a multivariate normal assumption.
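To make the idea of "iteratively estimating communalities" concrete, here is a minimal pure-Python sketch of principal axis factoring for a single factor. It is not the statsmodels implementation: the helper names (top_eigenpair, principal_axis_one_factor), the power-iteration eigen solver, and the starting guess for the communalities are all assumptions chosen for brevity. A communality here is the share of a variable's variance attributed to the common factor.

```python
import math

def top_eigenpair(matrix, iterations=100):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = math.sqrt(sum(p * p for p in product))
        vec = [p / value for p in product]
    return value, vec

def principal_axis_one_factor(corr, max_iter=500, tol=1e-10):
    """Iterate communalities for one factor; returns (loadings, communalities)."""
    n = len(corr)
    # Assumed starting guess: largest absolute off-diagonal correlation per variable.
    comm = [max(abs(corr[i][j]) for j in range(n) if j != i) for i in range(n)]
    loadings = [0.0] * n
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the unit diagonal.
        reduced = [[comm[i] if i == j else corr[i][j] for j in range(n)]
                   for i in range(n)]
        value, vec = top_eigenpair(reduced)
        loadings = [math.sqrt(value) * v for v in vec]
        new_comm = [l * l for l in loadings]
        converged = max(abs(a - b) for a, b in zip(new_comm, comm)) < tol
        comm = new_comm
        if converged:
            break
    return loadings, comm

# Correlation matrix implied by one factor with loadings 0.8, 0.7, 0.6.
true_loadings = [0.8, 0.7, 0.6]
corr = [[1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(3)]
        for i in range(3)]
loadings, comm = principal_axis_one_factor(corr)
print([round(l, 4) for l in loadings])
```

On this synthetic matrix the iteration should settle on loadings close to the ones used to build it, which is the sense in which the method "finds factors explaining common variance".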

The underlying factor model assumes that each observed variable x_i is a linear combination of k latent factors plus an error term:

x_i = l_{i,1}F_1 + l_{i,2}F_2 + \cdots + l_{i,k}F_k + \varepsilon_i

where l_{i,j} are the factor loadings relating variable i to factor j, F_j are the latent factors, and \varepsilon_i is the unique (error) term for variable i. The communality of a variable is the proportion of its variance explained by all factors together; the uniqueness is the remaining, unexplained proportion, which for standardized variables equals 1 minus the communality.
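As a small worked check of these definitions, the sketch below recomputes communality and uniqueness from the rotated loadings reported in Example 1. It assumes standardized variables and uncorrelated factors (as with varimax), so that communality is simply the sum of squared loadings in a row; the helper name communality is illustrative only.

```python
# Rotated loadings reported for Example 1
# (rows: var_1..var_3, columns: factor_1, factor_2).
loadings = [
    [0.7458, 0.6603],
    [0.6346, 0.6855],
    [0.7724, 0.5597],
]

def communality(row):
    """Sum of squared loadings across factors for one variable."""
    return sum(value ** 2 for value in row)

for name, row in zip(["var_1", "var_2", "var_3"], loadings):
    h2 = communality(row)
    # Uniqueness is the part of the variance the factors leave unexplained.
    print(name, round(h2, 4), round(1.0 - h2, 4))
```

Because the loadings in the table are rounded, the recomputed values agree with the reported communality and uniqueness columns only to about three decimals.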

After extraction, the function applies factor rotation to improve interpretability. Supported rotations are Varimax (orthogonal, maximizing the variance of squared loadings), Quartimax (orthogonal), Promax (oblique), and Oblimin (oblique); rotation can also be skipped by passing "none". Rotation does not change the overall fit but redistributes variance among factors to achieve a simpler structure where each variable loads strongly on fewer factors. For more on factor rotation methods, see the statsmodels multivariate documentation.
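The claim that rotation leaves the overall fit unchanged can be illustrated for the orthogonal case with a short sketch. It takes the quartimax loadings reported in Example 4, applies an arbitrary 2-D rotation, and compares communalities and total explained variance before and after. The helper names and the angle are assumptions for illustration; oblique rotations are not covered by this check.

```python
import math

def rotate(loadings, angle):
    """Apply a 2-D orthogonal rotation to a two-factor loading matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Row sums of squared loadings."""
    return [sum(v * v for v in row) for row in loadings]

def column_variance(loadings):
    """Column sums of squared loadings (variance attributed to each factor)."""
    return [sum(row[j] ** 2 for row in loadings) for j in range(len(loadings[0]))]

# Quartimax loadings reported for Example 4.
quartimax = [[0.9961, 0.002086], [0.9294, 0.09453], [0.9495, -0.09098]]
rotated = rotate(quartimax, 0.7)  # arbitrary angle in radians

print(communalities(quartimax))
print(communalities(rotated))
print(column_variance(quartimax), column_variance(rotated))
```

The per-variable communalities and the sum of the column variances stay the same, while the split between factor_1 and factor_2 changes, which is what "redistributes variance among factors" means here.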

The output includes factor loadings for each variable, communalities, uniqueness values, and the variance explained by each factor. For foundational concepts and mathematical details, refer to the Factor Analysis Wikipedia article.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=FACTOR_ANALYSIS(data, n_factors, fa_rotation, fa_method)
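  • data (list[list], required): Data matrix with observations as rows and variables as columns. Needs at least as many rows as columns.
  • n_factors (int, required): Number of factors to extract, between 1 and the number of variables.
  • fa_rotation (str, optional, default: "varimax"): Rotation method to apply. One of "varimax", "promax", "quartimax", "oblimin", or "none" (case-insensitive).
  • fa_method (str, optional, default: "pa"): Factor extraction method. Either "pa" (principal axis) or "ml" (maximum likelihood).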

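Returns (list[list]): 2D list with a header row, one row per variable (factor loadings, communality, uniqueness), and a final variance_explained row. On invalid input or a fitting failure, an error message string is returned instead.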
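Since the return value is either a table or an error string, calling code usually needs to tell the two apart and pick the table into pieces. The following is a minimal sketch of one way to do that; split_result is a hypothetical helper, not part of the function, and the sample table is a literal copy of the Example 1 output with empty strings in the unused cells of the last row.

```python
def split_result(result):
    """Split the returned table into loadings, per-variable stats, and factor variances."""
    if isinstance(result, str):
        # Problems are reported as a plain string rather than raised.
        raise ValueError(result)
    header, *variable_rows, variance_row = result
    factor_names = [name for name in header if name.startswith("factor_")]
    k = len(factor_names)
    loadings = {row[0]: dict(zip(factor_names, row[1:1 + k])) for row in variable_rows}
    stats = {row[0]: {"communality": row[1 + k], "uniqueness": row[2 + k]}
             for row in variable_rows}
    variance = dict(zip(factor_names, variance_row[1:1 + k]))
    return loadings, stats, variance

example = [
    ["variable", "factor_1", "factor_2", "communality", "uniqueness"],
    ["var_1", 0.7458, 0.6603, 0.9922, 0.007787],
    ["var_2", 0.6346, 0.6855, 0.8726, 0.1274],
    ["var_3", 0.7724, 0.5597, 0.9098, 0.09018],
    ["variance_explained", 1.555, 1.219, "", ""],
]
loadings, stats, variance = split_result(example)
print(loadings["var_2"]["factor_2"])  # 0.6855
```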

Examples

Example 1: Two factors, varimax rotation, principal axis extraction

Inputs:

n_factors: 2
fa_rotation: varimax
fa_method: pa
data (10 rows, 3 columns):
2.5 2.4 3
0.5 0.7 0.8
2.2 2.9 2.1
1.9 2.2 1.8
3.1 3 3.2
2.3 2.7 2.4
2 1.6 1.8
1 1.1 1.5
1.5 1.6 1.4
1.1 0.9 1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "varimax", "pa")

Expected output:

variable factor_1 factor_2 communality uniqueness
var_1 0.7458 0.6603 0.9922 0.007787
var_2 0.6346 0.6855 0.8726 0.1274
var_3 0.7724 0.5597 0.9098 0.09018
variance_explained 1.555 1.219
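The variance_explained row holds sums of squared loadings. A common way to read it, sketched below, is as a share of total variance: assuming the analysis works on standardized variables, the total variance equals the number of variables (3 here). The helper name is illustrative, and the reading applies to orthogonal rotations such as varimax.

```python
def proportion_of_variance(variance_explained, num_variables):
    """Divide each factor's sum of squared loadings by the number of variables."""
    return [value / num_variables for value in variance_explained]

shares = proportion_of_variance([1.555, 1.219], num_variables=3)
print([round(s, 3) for s in shares], round(sum(shares), 3))
# [0.518, 0.406] 0.925
```

Under these assumptions the two factors together account for roughly 92% of the variance in the three variables.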

Example 2: One factor, no rotation

Inputs:

n_factors: 1
fa_rotation: none
data (10 rows, 3 columns):
2.5 2.4 3
0.5 0.7 0.8
2.2 2.9 2.1
1.9 2.2 1.8
3.1 3 3.2
2.3 2.7 2.4
2 1.6 1.8
1 1.1 1.5
1.5 1.6 1.4
1.1 0.9 1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 1, "none")

Expected output:

variable factor_1 communality uniqueness
var_1 1.001 1.002 -0.001986
var_2 0.925 0.8556 0.1444
var_3 0.9447 0.8924 0.1076
variance_explained 2.75
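In this output var_1 has a communality slightly above 1 and a correspondingly negative uniqueness. Such results are often called Heywood cases and are usually read as a warning that the solution is not well determined, which is plausible for a sample of only 10 observations. A minimal sketch for flagging them in the returned rows follows; flag_heywood and its tolerance parameter are hypothetical, not part of the function.

```python
def flag_heywood(rows, tolerance=0.0):
    """Return names of variables whose communality exceeds 1 or uniqueness is negative."""
    return [name for name, communality, uniqueness in rows
            if communality > 1.0 + tolerance or uniqueness < -tolerance]

# (variable, communality, uniqueness) taken from the Example 2 output.
example_2 = [
    ("var_1", 1.002, -0.001986),
    ("var_2", 0.8556, 0.1444),
    ("var_3", 0.8924, 0.1076),
]
print(flag_heywood(example_2))  # ['var_1']
```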

Example 3: Two factors, promax rotation

Inputs:

n_factors: 2
fa_rotation: promax
data (10 rows, 3 columns):
2.5 2.4 3
0.5 0.7 0.8
2.2 2.9 2.1
1.9 2.2 1.8
3.1 3 3.2
2.3 2.7 2.4
2 1.6 1.8
1 1.1 1.5
1.5 1.6 1.4
1.1 0.9 1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "promax")

Expected output:

variable factor_1 factor_2 communality uniqueness
var_1 0.7458 0.6603 0.9922 0.007787
var_2 0.6346 0.6855 0.8726 0.1274
var_3 0.7724 0.5597 0.9098 0.09018
variance_explained 1.555 1.219

Example 4: Two factors, quartimax rotation, principal axis extraction

Inputs:

n_factors: 2
fa_rotation: quartimax
fa_method: pa
data (10 rows, 3 columns):
2.5 2.4 3
0.5 0.7 0.8
2.2 2.9 2.1
1.9 2.2 1.8
3.1 3 3.2
2.3 2.7 2.4
2 1.6 1.8
1 1.1 1.5
1.5 1.6 1.4
1.1 0.9 1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "quartimax", "pa")

Expected output:

variable factor_1 factor_2 communality uniqueness
var_1 0.9961 0.002086 0.9922 0.007787
var_2 0.9294 0.09453 0.8726 0.1274
var_3 0.9495 -0.09098 0.9098 0.09018
variance_explained 2.757 0.01722

Python Code

import math

import numpy as np
from statsmodels.multivariate.factor import Factor as sm_Factor

def factor_analysis(data, n_factors, fa_rotation='varimax', fa_method='pa'):
    """
    Performs exploratory factor analysis with rotation.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.factor.Factor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Data matrix with observations as rows and variables as columns.
        n_factors (int): Number of factors to extract.
        fa_rotation (str, optional): Rotation method to apply. Valid options (case-insensitive): 'varimax', 'promax', 'quartimax', 'oblimin', 'none'. Default is 'varimax'.
        fa_method (str, optional): Factor extraction method. Valid options (case-insensitive): 'pa' (principal axis), 'ml' (maximum likelihood). Default is 'pa'.

    Returns:
        list[list]: 2D list with factor loadings, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def _validate_float(value, name):
        try:
            converted = float(value)
        except Exception:
            return f"Invalid input: {name} must be a number."
        if math.isnan(converted) or math.isinf(converted):
            return f"Invalid input: {name} must be finite."
        return converted

    # Normalize data input to 2D list
    data_2d = to2d(data)

    # Validate data is 2D list
    if not isinstance(data_2d, list):
        return "Invalid input: data must be a 2D list."
    if len(data_2d) == 0:
        return "Invalid input: data cannot be empty."
    if not all(isinstance(row, list) for row in data_2d):
        return "Invalid input: data must be a 2D list."
    if len(data_2d[0]) == 0:
        return "Invalid input: data rows cannot be empty."

    # Check all rows have same length
    num_cols = len(data_2d[0])
    if not all(len(row) == num_cols for row in data_2d):
        return "Invalid input: all data rows must have the same length."

    # Validate all elements are numeric
    validated_data = []
    for i, row in enumerate(data_2d):
        validated_row = []
        for j, val in enumerate(row):
            converted = _validate_float(val, f"data[{i}][{j}]")
            if isinstance(converted, str):
                return converted
            validated_row.append(converted)
        validated_data.append(validated_row)

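    # Validate n_factors (reject non-integral values such as 2.5 instead of truncating)
    try:
        n_factors_float = float(n_factors)
        n_factors_int = int(n_factors_float)
    except Exception:
        return "Invalid input: n_factors must be an integer."
    if n_factors_int != n_factors_float:
        return "Invalid input: n_factors must be an integer."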

    if n_factors_int < 1:
        return "Invalid input: n_factors must be at least 1."

    num_vars = len(validated_data[0])
    num_obs = len(validated_data)

    if n_factors_int > num_vars:
        return f"Invalid input: n_factors ({n_factors_int}) cannot exceed number of variables ({num_vars})."

    if num_obs < num_vars:
        return f"Invalid input: number of observations ({num_obs}) must be at least as many as variables ({num_vars})."

    # Validate rotation method
    valid_rotations = ['varimax', 'promax', 'quartimax', 'oblimin', 'none']
    if not isinstance(fa_rotation, str):
        return "Invalid input: fa_rotation must be a string."

    rotation_lower = fa_rotation.lower()
    if rotation_lower not in valid_rotations:
        return f"Invalid input: fa_rotation must be one of {valid_rotations}."

    # Validate method
    valid_methods = ['pa', 'ml']
    if not isinstance(fa_method, str):
        return "Invalid input: fa_method must be a string."

    method_lower = fa_method.lower()
    if method_lower not in valid_methods:
        return f"Invalid input: fa_method must be one of {valid_methods}."

    # Convert data to format statsmodels expects
    data_array = np.array(validated_data)

    # Perform factor analysis
    try:
        # Create Factor object with specified method
        fa = sm_Factor(data_array, n_factor=n_factors_int, method=method_lower)

        # Fit the model
        fa_result = fa.fit()

        # Apply rotation if not 'none' (modifies in-place)
        if rotation_lower != 'none':
            fa_result.rotate(rotation_lower)

        # Extract loadings
        loadings = fa_result.loadings

        # Get communality and uniqueness
        communality = fa_result.communality
        uniqueness = fa_result.uniqueness

        # Build output as 2D list
        output = []

        # Header row
        header = ['variable']
        for i in range(n_factors_int):
            header.append(f'factor_{i+1}')
        header.extend(['communality', 'uniqueness'])
        output.append(header)

        # Factor loadings for each variable
        for i in range(num_vars):
            row = [f'var_{i+1}']
            for j in range(n_factors_int):
                row.append(float(loadings[i, j]))
            row.append(float(communality[i]))
            row.append(float(uniqueness[i]))
            output.append(row)

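        # Variance explained by each factor: sum of squared loadings per column.
        # After an oblique rotation (promax, oblimin) the factors may be correlated,
        # so these sums are then only an approximate guide to each factor's share.
        variance_row = ['variance_explained']
        for i in range(n_factors_int):
            variance = float(np.sum(loadings[:, i] ** 2))
            variance_row.append(variance)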
        # Fill remaining columns with empty strings
        variance_row.extend(['', ''])
        output.append(variance_row)

        return output

    except Exception as exc:
        return f"statsmodels.multivariate.factor.Factor error: {exc}"
