FACTOR_ANALYSIS

Overview

The FACTOR_ANALYSIS function performs exploratory factor analysis (EFA) on multivariate data to identify latent (unobserved) variables that explain the correlations among observed variables. Factor analysis is a dimensionality reduction technique commonly used in psychometrics, marketing research, social sciences, and data exploration to uncover the underlying structure of complex datasets.

This implementation uses the statsmodels library’s Factor class. The function supports two extraction methods: Principal Axis Factoring (PA), which iteratively estimates communalities to find factors explaining common variance, and Maximum Likelihood (ML), which finds factors that maximize the likelihood of the observed correlation matrix under a multivariate normal assumption.
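The iterative idea behind Principal Axis Factoring can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the statsmodels implementation: the function name principal_axis_factoring, the fixed iteration count, and the synthetic one-factor correlation matrix are all hypothetical choices made for this sketch.

```python
import numpy as np

def principal_axis_factoring(corr, n_factors, n_iter=200):
    """Illustrative principal axis factoring on a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # Start from squared multiple correlations as communality estimates.
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        # Replace the unit diagonal with the current communality estimates.
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        # eigh returns eigenvalues in ascending order; keep the largest ones.
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        # Updated communalities are the row sums of squared loadings.
        communalities = np.sum(loadings ** 2, axis=1)
    return loadings, communalities

# Synthetic correlation matrix generated by one factor (assumed loadings).
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(corr, 1.0)

loadings, communalities = principal_axis_factoring(corr, n_factors=1)
# The sign of an eigenvector is arbitrary, so compare absolute values.
print(np.abs(loadings[:, 0]))
print(communalities)
```

Because the synthetic matrix follows a one-factor model exactly, the recovered loadings should approach the assumed values up to sign. Real data rarely fits this cleanly, and a production routine would add a convergence check.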

The underlying factor model assumes that each observed variable x_i is a linear combination of k latent factors plus an error term:

x_i = l_{i,1}F_1 + l_{i,2}F_2 + \cdots + l_{i,k}F_k + \varepsilon_i

where l_{i,j} are the factor loadings indicating the relationship between variable i and factor j, F_j are the latent factors, and \varepsilon_i represents the unique variance (error). The communality for each variable measures the proportion of variance explained by all factors, while uniqueness represents the unexplained portion.
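
For standardized variables and uncorrelated factors, the two quantities can be written in terms of the loadings. The symbols h_i^2 (communality) and u_i (uniqueness) are notation introduced here for illustration:

h_i^2 = \sum_{j=1}^{k} l_{i,j}^2, \qquad u_i = 1 - h_i^2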

After extraction, the function applies factor rotation to improve interpretability. Supported rotations include Varimax (orthogonal, maximizing variance of squared loadings), Promax (oblique), Quartimax, and Oblimin. Rotation does not change the overall fit but redistributes variance among factors to achieve a simpler structure where each variable loads strongly on fewer factors. For more on factor rotation methods, see the statsmodels multivariate documentation.
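The claim that an orthogonal rotation redistributes variance without changing the fit can be checked numerically. The sketch below takes the quartimax loadings listed in Example 4 on this page and rotates them by an arbitrary angle (30 degrees, chosen only for illustration); it is a hand-rolled demonstration, not one of the rotation criteria named above.

```python
import numpy as np

# Quartimax loadings copied from Example 4 of this page.
loadings = np.array([
    [0.996097, 0.00208583],
    [0.92936, 0.0945268],
    [0.949499, -0.0909776],
])

theta = np.deg2rad(30.0)  # arbitrary angle, assumed for illustration
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)],
])
rotated = loadings @ rotation

# Row sums of squared loadings (communalities) are unchanged.
communality_before = np.sum(loadings ** 2, axis=1)
communality_after = np.sum(rotated ** 2, axis=1)

# Column sums (variance per factor) shift, but their total does not.
per_factor_before = np.sum(loadings ** 2, axis=0)
per_factor_after = np.sum(rotated ** 2, axis=0)

print(communality_before, communality_after)
print(per_factor_before, per_factor_after)
```

This invariance holds for orthogonal rotations such as Varimax and Quartimax. Oblique rotations such as Promax and Oblimin allow correlated factors, so the same row-sum identity does not apply to their loadings in general.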

The output includes factor loadings for each variable, communalities, uniqueness values, and the variance explained by each factor, computed as the sum of squared loadings in that factor's column. For foundational concepts and mathematical details, refer to the Factor Analysis Wikipedia article.
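As a quick check of how these output columns relate to each other, the sketch below recomputes them from the varimax loadings listed in Example 1 on this page. The proportion line assumes standardized variables, so that total variance equals the number of variables; that quantity is not part of the function's output.

```python
import numpy as np

# Varimax loadings copied from Example 1 of this page.
loadings = np.array([
    [0.74579, 0.660311],
    [0.634597, 0.685515],
    [0.772382, 0.559688],
])

# Communality: sum of squared loadings across factors, per variable.
communality = np.sum(loadings ** 2, axis=1)
# Uniqueness: the remainder of each variable's unit variance.
uniqueness = 1.0 - communality
# Variance explained: sum of squared loadings down each factor column.
variance_explained = np.sum(loadings ** 2, axis=0)
# Share of total variance per factor (assumes standardized variables).
proportion = variance_explained / loadings.shape[0]

print(communality)
print(uniqueness)
print(variance_explained)
print(proportion)
```

The recomputed values agree with the communality, uniqueness, and variance_explained entries in the Example 1 table up to rounding of the printed loadings.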

This example function is provided as-is without any representation of accuracy.

Excel Usage

=FACTOR_ANALYSIS(data, n_factors, fa_rotation, fa_method)
  • data (list[list], required): Data matrix with observations as rows and variables as columns.
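  • n_factors (int, required): Number of factors to extract. Must be at least 1 and no greater than the number of variables.
  • fa_rotation (str, optional, default: "varimax"): Rotation method to apply. One of "varimax", "promax", "quartimax", "oblimin", or "none".
  • fa_method (str, optional, default: "pa"): Factor extraction method. Either "pa" (Principal Axis) or "ml" (Maximum Likelihood).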

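Returns (list[list]): 2D list containing a header row, one row per variable (factor loadings, communality, uniqueness), and a final variance_explained row; or an error message string if an input is invalid or the fit fails.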
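
When calling the Python function directly, the returned 2D list can be unpacked into separate structures. The helper below is a hypothetical sketch (split_result is not part of this function or of statsmodels), and the sample table is the expected output of Example 1 on this page.

```python
def split_result(table):
    """Hypothetical helper: unpack the 2D list returned by factor_analysis."""
    if isinstance(table, str):
        # The function reports problems as a plain error string.
        raise ValueError(table)
    header, *rows = table
    factor_names = [name for name in header if name.startswith("factor_")]
    n = len(factor_names)
    loadings, communality, uniqueness, variance = {}, {}, {}, {}
    for row in rows:
        label, values = row[0], row[1:]
        if label == "variance_explained":
            # The final row carries one value per factor, then empty cells.
            variance = dict(zip(factor_names, values[:n]))
        else:
            loadings[label] = dict(zip(factor_names, values[:n]))
            communality[label] = values[n]
            uniqueness[label] = values[n + 1]
    return loadings, communality, uniqueness, variance

# Expected output of Example 1, written as the 2D list the function returns.
example = [
    ["variable", "factor_1", "factor_2", "communality", "uniqueness"],
    ["var_1", 0.74579, 0.660311, 0.992213, 0.00778677],
    ["var_2", 0.634597, 0.685515, 0.872644, 0.127356],
    ["var_3", 0.772382, 0.559688, 0.909825, 0.0901754],
    ["variance_explained", 1.55549, 1.21919, "", ""],
]

loadings, communality, uniqueness, variance = split_result(example)
print(loadings["var_1"])
print(variance)
```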

Example 1: Two-factor extraction with varimax rotation

Inputs:

data             n_factors  fa_rotation  fa_method
2.5  2.4  3      2          varimax      pa
0.5  0.7  0.8
2.2  2.9  2.1
1.9  2.2  1.8
3.1  3    3.2
2.3  2.7  2.4
2    1.6  1.8
1    1.1  1.5
1.5  1.6  1.4
1.1  0.9  1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "varimax", "pa")

Expected output:

variable            factor_1  factor_2  communality  uniqueness
var_1               0.74579   0.660311  0.992213     0.00778677
var_2               0.634597  0.685515  0.872644     0.127356
var_3               0.772382  0.559688  0.909825     0.0901754
variance_explained  1.55549   1.21919

Example 2: Single-factor extraction without rotation

Inputs:

data             n_factors  fa_rotation
2.5  2.4  3      1          none
0.5  0.7  0.8
2.2  2.9  2.1
1.9  2.2  1.8
3.1  3    3.2
2.3  2.7  2.4
2    1.6  1.8
1    1.1  1.5
1.5  1.6  1.4
1.1  0.9  1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 1, "none")

Expected output:

variable            factor_1  communality  uniqueness
var_1               1.00099   1.00199      -0.00198608
var_2               0.925011  0.855646     0.144354
var_3               0.944665  0.892392     0.107608
variance_explained  2.75002

Example 3: Two-factor extraction with promax rotation

Inputs:

data             n_factors  fa_rotation
2.5  2.4  3      2          promax
0.5  0.7  0.8
2.2  2.9  2.1
1.9  2.2  1.8
3.1  3    3.2
2.3  2.7  2.4
2    1.6  1.8
1    1.1  1.5
1.5  1.6  1.4
1.1  0.9  1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "promax")

Expected output:

variable            factor_1  factor_2  communality  uniqueness
var_1               0.74579   0.660311  0.992213     0.00778677
var_2               0.634597  0.685515  0.872644     0.127356
var_3               0.772382  0.559688  0.909825     0.0901754
variance_explained  1.55549   1.21919

Example 4: Two-factor extraction with quartimax and explicit method

Inputs:

data             n_factors  fa_rotation  fa_method
2.5  2.4  3      2          quartimax    pa
0.5  0.7  0.8
2.2  2.9  2.1
1.9  2.2  1.8
3.1  3    3.2
2.3  2.7  2.4
2    1.6  1.8
1    1.1  1.5
1.5  1.6  1.4
1.1  0.9  1

Excel formula:

=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "quartimax", "pa")

Expected output:

variable            factor_1  factor_2    communality  uniqueness
var_1               0.996097  0.00208583  0.992213     0.00778677
var_2               0.92936   0.0945268   0.872644     0.127356
var_3               0.949499  -0.0909776  0.909825     0.0901754
variance_explained  2.75747   0.0172166

Python Code

import math

import numpy as np
from statsmodels.multivariate.factor import Factor as sm_Factor

def factor_analysis(data, n_factors, fa_rotation='varimax', fa_method='pa'):
    """
    Performs exploratory factor analysis with rotation.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.factor.Factor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Data matrix with observations as rows and variables as columns.
        n_factors (int): Number of factors to extract.
        fa_rotation (str, optional): Rotation method to apply. Valid options: 'varimax', 'promax', 'quartimax', 'oblimin', 'none' (case-insensitive). Default is 'varimax'.
        fa_method (str, optional): Factor extraction method. Valid options: 'pa' (Principal Axis), 'ml' (Maximum Likelihood). Default is 'pa'.

    Returns:
        list[list]: 2D list with factor loadings, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def _validate_float(value, name):
        try:
            converted = float(value)
        except Exception:
            return f"Error: Invalid input: {name} must be a number."
        if math.isnan(converted) or math.isinf(converted):
            return f"Error: Invalid input: {name} must be finite."
        return converted

    try:
        data_2d = to2d(data)

        if not isinstance(data_2d, list):
            return "Error: Invalid input: data must be a 2D list."
        if len(data_2d) == 0:
            return "Error: Invalid input: data cannot be empty."
        if not all(isinstance(row, list) for row in data_2d):
            return "Error: Invalid input: data must be a 2D list."
        if len(data_2d[0]) == 0:
            return "Error: Invalid input: data rows cannot be empty."

        num_cols = len(data_2d[0])
        if not all(len(row) == num_cols for row in data_2d):
            return "Error: Invalid input: all data rows must have the same length."

        validated_data = []
        for i, row in enumerate(data_2d):
            validated_row = []
            for j, val in enumerate(row):
                converted = _validate_float(val, f"data[{i}][{j}]")
                if isinstance(converted, str):
                    return converted
                validated_row.append(converted)
            validated_data.append(validated_row)

        try:
            n_factors_int = int(n_factors)
        except Exception:
            return "Error: Invalid input: n_factors must be an integer."

        if n_factors_int < 1:
            return "Error: Invalid input: n_factors must be at least 1."

        num_vars = len(validated_data[0])
        num_obs = len(validated_data)

        if n_factors_int > num_vars:
            return f"Error: Invalid input: n_factors ({n_factors_int}) cannot exceed number of variables ({num_vars})."

        if num_obs < num_vars:
            return f"Error: Invalid input: number of observations ({num_obs}) must be at least as many as variables ({num_vars})."

        valid_rotations = ['varimax', 'promax', 'quartimax', 'oblimin', 'none']
        if not isinstance(fa_rotation, str):
            return "Error: Invalid input: fa_rotation must be a string."
        rotation_lower = fa_rotation.lower()
        if rotation_lower not in valid_rotations:
            return f"Error: Invalid input: fa_rotation must be one of {valid_rotations}."

        valid_methods = ['pa', 'ml']
        if not isinstance(fa_method, str):
            return "Error: Invalid input: fa_method must be a string."
        method_lower = fa_method.lower()
        if method_lower not in valid_methods:
            return f"Error: Invalid input: fa_method must be one of {valid_methods}."

        data_array = np.array(validated_data)
        fa = sm_Factor(data_array, n_factor=n_factors_int, method=method_lower)
        fa_result = fa.fit()

        if rotation_lower != 'none':
            fa_result.rotate(rotation_lower)

        loadings = fa_result.loadings
        communality = fa_result.communality
        uniqueness = fa_result.uniqueness

        output = []
        header = ['variable']
        for i in range(n_factors_int):
            header.append(f'factor_{i+1}')
        header.extend(['communality', 'uniqueness'])
        output.append(header)

        for i in range(num_vars):
            row = [f'var_{i+1}']
            for j in range(n_factors_int):
                row.append(float(loadings[i, j]))
            row.append(float(communality[i]))
            row.append(float(uniqueness[i]))
            output.append(row)

        variance_row = ['variance_explained']
        for i in range(n_factors_int):
            variance = float(np.sum(loadings[:, i] ** 2))
            variance_row.append(variance)
        variance_row.extend(['', ''])
        output.append(variance_row)

        return output
    except Exception as exc:
        return f"Error: {str(exc)}"
