FACTOR_ANALYSIS
Overview
The FACTOR_ANALYSIS function performs exploratory factor analysis (EFA) on multivariate data to identify latent (unobserved) variables that explain the correlations among observed variables. Factor analysis is a dimensionality reduction technique commonly used in psychometrics, marketing research, social sciences, and data exploration to uncover the underlying structure of complex datasets.
This implementation uses the statsmodels library’s Factor class. The function supports two extraction methods: Principal Axis Factoring (PA), which iteratively estimates communalities to find factors explaining common variance, and Maximum Likelihood (ML), which finds factors that maximize the likelihood of the observed correlation matrix under a multivariate normal assumption.
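The principal axis idea can be illustrated with a short NumPy sketch of the classic iteration. It places communality estimates on the diagonal of the correlation matrix, takes the leading eigenvectors, and recomputes the communalities from the resulting loadings until they stop changing. This is a minimal illustration that assumes squared multiple correlations as starting communalities. The `principal_axis` name is hypothetical, and statsmodels' own `pa` method may differ in details such as starting values, the convergence test, and sign conventions.

```python
import numpy as np


def principal_axis(data, n_factors, max_iter=50, tol=1e-8):
    """Hypothetical sketch of iterated principal axis factoring."""
    X = np.asarray(data, dtype=float)
    R = np.corrcoef(X, rowvar=False)  # p x p correlation matrix of the variables

    # Starting communalities: squared multiple correlations, 1 - 1 / diag(R^-1)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # "reduced" correlation matrix

        eigvals, eigvecs = np.linalg.eigh(reduced)   # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][:n_factors]  # indices of the k largest
        # Clip negative eigenvalues so the square root stays real
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

        new_h2 = np.sum(loadings ** 2, axis=1)  # communality = row sum of squares
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break

    return loadings, h2


# Same 10 x 3 data matrix as in the examples
data = [
    [2.5, 2.4, 3.0], [0.5, 0.7, 0.8], [2.2, 2.9, 2.1], [1.9, 2.2, 1.8],
    [3.1, 3.0, 3.2], [2.3, 2.7, 2.4], [2.0, 1.6, 1.8], [1.0, 1.1, 1.5],
    [1.5, 1.6, 1.4], [1.1, 0.9, 1.0],
]
loadings, communality = principal_axis(data, n_factors=1)
print(np.round(loadings, 3), np.round(communality, 3))
```

The sign of an eigenvector is arbitrary, so the loadings may come out negated relative to statsmodels. The communalities are unaffected.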
The underlying factor model assumes that each observed variable x_i is a linear combination of k latent factors plus an error term:
x_i = l_{i,1}F_1 + l_{i,2}F_2 + \cdots + l_{i,k}F_k + \varepsilon_i
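where l_{i,j} is the loading of variable i on factor j, F_j are the latent factors, and \varepsilon_i is the unique (error) term of variable i. The analysis works on standardized variables (the correlation matrix), so each variable has a total variance of 1. Its communality h_i^2 = \sum_j l_{i,j}^2 (for unrotated or orthogonally rotated loadings) is the proportion of that variance explained by the common factors. Its uniqueness \psi_i = 1 - h_i^2 is the variance of \varepsilon_i, which is the portion left unexplained.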
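The model has a standard consequence under the usual assumptions: the factors have unit variance and are uncorrelated with each other and with the errors, and the errors are mutually uncorrelated. Under those assumptions, the covariance matrix of the observed variables is \Lambda\Lambda^T + \Psi, where \Lambda holds the loadings l_{i,j} and \Psi is the diagonal matrix of unique variances. The NumPy sketch below simulates data from the model equation to show this numerically. The loading values and sample size are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) loadings: 4 observed variables, 2 latent factors
L = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])
# Unique variances chosen so that every variable has total variance 1
psi = 1.0 - np.sum(L ** 2, axis=1)

n = 200_000
F = rng.standard_normal((n, 2))                   # latent factors, unit variance
eps = rng.standard_normal((n, 4)) * np.sqrt(psi)  # unique (error) terms
X = F @ L.T + eps                                 # x_i = sum_j l_ij F_j + eps_i

implied = L @ L.T + np.diag(psi)                  # model-implied covariance
sample = np.cov(X, rowvar=False)                  # covariance of the simulated data
print(np.round(sample - implied, 3))
```

The printed differences shrink toward zero as `n` grows. Factor extraction works in the opposite direction: it searches for loadings whose implied matrix reproduces the observed correlations.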
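After extraction, the function can rotate the loadings to make them easier to interpret. Four rotations are supported. Varimax is orthogonal and maximizes the variance of the squared loadings within each factor. Quartimax is also orthogonal and simplifies each variable's row of loadings. Promax and Oblimin are oblique, which allows the factors to correlate. Rotation does not change the overall fit of the model. An orthogonal rotation leaves every communality and the total variance explained unchanged, and only redistributes that variance among the factors. An oblique rotation returns pattern loadings, whose squared column sums no longer partition the variance exactly. In both cases the aim is a simpler structure, in which each variable loads strongly on as few factors as possible. For more on factor rotation methods, see the statsmodels multivariate documentation.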
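The following stand-alone NumPy sketch illustrates an orthogonal rotation. It implements Kaiser's varimax criterion using the widely used SVD-based iteration, then checks that the rotation leaves each variable's communality (row sum of squared loadings) unchanged. This is not the statsmodels routine that the function calls, so its output may differ from the examples in sign or column order. The `varimax` helper and the starting loadings are hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Hypothetical helper: orthogonal varimax rotation (gamma=1)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)  # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the varimax criterion with respect to R (up to a constant)
        G = L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag(np.sum(LR ** 2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt  # nearest orthogonal matrix to the gradient
        d_old, d = d, np.sum(s)
        if d_old != 0 and d < d_old * (1 + tol):
            break
    return L @ R, R


# Made-up unrotated loadings: 4 variables, 2 factors
unrotated = np.array([
    [0.7, 0.4],
    [0.6, 0.5],
    [0.6, -0.4],
    [0.5, -0.5],
])
rotated, R = varimax(unrotated)

print(np.round(rotated, 3))
# R is orthogonal, so each row's sum of squared loadings is preserved
print(np.allclose(R.T @ R, np.eye(2)))
print(np.allclose(np.sum(rotated ** 2, axis=1), np.sum(unrotated ** 2, axis=1)))
```

An oblique method such as Promax would relax the constraint that `R` is orthogonal, so the final check would no longer apply to its pattern loadings.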
The output includes the factor loadings for each variable, the communalities and uniqueness values, and the variance explained by each factor (the sum of the squared loadings in that factor's column). For foundational concepts and mathematical details, refer to the Factor Analysis Wikipedia article.
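A factor's variance explained is the sum of its squared loadings. Because each standardized variable contributes a variance of 1, dividing that sum by the number of variables gives the share of total variance the factor accounts for. The small pure-Python sketch below uses the varimax values from Example 1 (1.555 and 1.219 for three variables). The variable names are illustrative.

```python
# Variance explained (sum of squared loadings) per factor, from Example 1
variance_explained = {"factor_1": 1.555, "factor_2": 1.219}
n_variables = 3  # each standardized variable has variance 1

proportions = {f: v / n_variables for f, v in variance_explained.items()}
cumulative = sum(proportions.values())

for factor, share in proportions.items():
    print(f"{factor}: {share:.1%} of total variance")
print(f"both factors: {cumulative:.1%}")
```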
This example function is provided as-is without any representation of accuracy.
Excel Usage
=FACTOR_ANALYSIS(data, n_factors, fa_rotation, fa_method)
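- data (list[list], required): Data matrix with observations as rows and variables as columns. There must be at least as many observations as variables.
- n_factors (int, required): Number of factors to extract, from 1 up to the number of variables.
- fa_rotation (str, optional, default: "varimax"): Rotation method to apply: "varimax", "promax", "quartimax", "oblimin", or "none" (case-insensitive).
- fa_method (str, optional, default: "pa"): Factor extraction method: "pa" (principal axis) or "ml" (maximum likelihood), case-insensitive.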
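Returns (list[list]): A 2D list with a header row, then one row per variable (its loading on each factor, followed by its communality and uniqueness), and a final variance_explained row with each factor's sum of squared loadings. If validation or fitting fails, the function returns an error message string instead.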
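When the function is called from Python instead of Excel, the returned 2D list can be reshaped into a dictionary keyed by variable name. The helper below is a hypothetical convenience and is not part of the function. It relies only on the row layout produced by the Python code in this document: a header row, one row per variable, and a final variance_explained row whose last two cells are empty strings. It also relies on the convention that errors come back as a string.

```python
def loadings_by_variable(result):
    """Hypothetical helper: turn FACTOR_ANALYSIS output into {variable: {column: value}}."""
    if isinstance(result, str):
        raise ValueError(result)  # the function reports errors as a message string
    header, *rows = result
    table = {}
    for row in rows:
        name, *values = row
        # Skip the empty cells that pad the variance_explained row
        table[name] = {col: val for col, val in zip(header[1:], values) if val != ""}
    return table


# Output shaped like Example 1 (values rounded as in the documentation)
sample = [
    ["variable", "factor_1", "factor_2", "communality", "uniqueness"],
    ["var_1", 0.7458, 0.6603, 0.9922, 0.007787],
    ["var_2", 0.6346, 0.6855, 0.8726, 0.1274],
    ["var_3", 0.7724, 0.5597, 0.9098, 0.09018],
    ["variance_explained", 1.555, 1.219, "", ""],
]
table = loadings_by_variable(sample)
print(table["var_2"]["factor_2"])
print(table["variance_explained"])
```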
Examples
Example 1: Two factors, varimax rotation, principal axis extraction
Inputs:
| data (col 1) | data (col 2) | data (col 3) | n_factors | fa_rotation | fa_method |
|---|---|---|---|---|---|
| 2.5 | 2.4 | 3 | 2 | varimax | pa |
| 0.5 | 0.7 | 0.8 | | | |
| 2.2 | 2.9 | 2.1 | | | |
| 1.9 | 2.2 | 1.8 | | | |
| 3.1 | 3 | 3.2 | | | |
| 2.3 | 2.7 | 2.4 | | | |
| 2 | 1.6 | 1.8 | | | |
| 1 | 1.1 | 1.5 | | | |
| 1.5 | 1.6 | 1.4 | | | |
| 1.1 | 0.9 | 1 | | | |
Excel formula:
=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "varimax", "pa")
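In the formula, the data matrix is written as an Excel array constant, where commas separate values within a row and semicolons separate rows. If the data is prepared in Python, a small helper such as the hypothetical `to_excel_array` below can build that literal. The `:g` format drops trailing zeros, so 3.0 is written as 3.

```python
def to_excel_array(rows):
    """Hypothetical helper: format a list of rows as an Excel array constant."""
    return "{" + ";".join(",".join(f"{v:g}" for v in row) for row in rows) + "}"


data = [
    [2.5, 2.4, 3.0], [0.5, 0.7, 0.8], [2.2, 2.9, 2.1], [1.9, 2.2, 1.8],
    [3.1, 3.0, 3.2], [2.3, 2.7, 2.4], [2.0, 1.6, 1.8], [1.0, 1.1, 1.5],
    [1.5, 1.6, 1.4], [1.1, 0.9, 1.0],
]
print(f'=FACTOR_ANALYSIS({to_excel_array(data)}, 2, "varimax", "pa")')
```

Because `:g` keeps only six significant digits, values that need more precision should be formatted with `repr` instead.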
Expected output:
| variable | factor_1 | factor_2 | communality | uniqueness |
|---|---|---|---|---|
| var_1 | 0.7458 | 0.6603 | 0.9922 | 0.007787 |
| var_2 | 0.6346 | 0.6855 | 0.8726 | 0.1274 |
| var_3 | 0.7724 | 0.5597 | 0.9098 | 0.09018 |
| variance_explained | 1.555 | 1.219 | | |
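The communality and uniqueness columns can be reproduced from the loadings in the same row. Communality is the sum of the squared loadings across the factor columns, and uniqueness is 1 minus the communality. This works here because varimax is an orthogonal rotation. The pure-Python check below uses the rounded values from the table above, so the last digit may differ slightly.

```python
# Factor loadings from Example 1's expected output (rounded to 4 decimals)
loadings = {
    "var_1": [0.7458, 0.6603],
    "var_2": [0.6346, 0.6855],
    "var_3": [0.7724, 0.5597],
}

results = {}
for name, row in loadings.items():
    communality = sum(l ** 2 for l in row)  # h_i^2 = sum_j l_ij^2
    uniqueness = 1.0 - communality          # psi_i = 1 - h_i^2
    results[name] = (communality, uniqueness)
    print(f"{name}: communality={communality:.4f}, uniqueness={uniqueness:.4f}")
```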
Example 2: One factor, no rotation, default principal axis extraction
Inputs:
| data (col 1) | data (col 2) | data (col 3) | n_factors | fa_rotation |
|---|---|---|---|---|
| 2.5 | 2.4 | 3 | 1 | none |
| 0.5 | 0.7 | 0.8 | | |
| 2.2 | 2.9 | 2.1 | | |
| 1.9 | 2.2 | 1.8 | | |
| 3.1 | 3 | 3.2 | | |
| 2.3 | 2.7 | 2.4 | | |
| 2 | 1.6 | 1.8 | | |
| 1 | 1.1 | 1.5 | | |
| 1.5 | 1.6 | 1.4 | | |
| 1.1 | 0.9 | 1 | | |
Excel formula:
=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 1, "none")
Expected output:
| variable | factor_1 | communality | uniqueness |
|---|---|---|---|
| var_1 | 1.001 | 1.002 | -0.001986 |
| var_2 | 0.925 | 0.8556 | 0.1444 |
| var_3 | 0.9447 | 0.8924 | 0.1076 |
| variance_explained | 2.75 | | |
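In this one-factor solution, var_1 has a loading and communality slightly above 1 and a negative uniqueness. This is a Heywood case. A communality cannot legitimately exceed 1 for standardized variables, so the solution is improper and should be interpreted with caution. The minimal pure-Python check below flags such rows using the values from the table above. The `heywood_variables` helper and its `tol` parameter are hypothetical.

```python
def heywood_variables(rows, tol=0.0):
    """Hypothetical helper: names of variables with communality > 1 or uniqueness < 0."""
    flagged = []
    for name, communality, uniqueness in rows:
        if communality > 1.0 + tol or uniqueness < -tol:
            flagged.append(name)
    return flagged


# (variable, communality, uniqueness) from Example 2's expected output
example_2 = [
    ("var_1", 1.002, -0.001986),
    ("var_2", 0.8556, 0.1444),
    ("var_3", 0.8924, 0.1076),
]
print(heywood_variables(example_2))
```

A small tolerance can be used to ignore violations that are within rounding error.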
Example 3: Two factors, promax rotation, default principal axis extraction
Inputs:
| data (col 1) | data (col 2) | data (col 3) | n_factors | fa_rotation |
|---|---|---|---|---|
| 2.5 | 2.4 | 3 | 2 | promax |
| 0.5 | 0.7 | 0.8 | | |
| 2.2 | 2.9 | 2.1 | | |
| 1.9 | 2.2 | 1.8 | | |
| 3.1 | 3 | 3.2 | | |
| 2.3 | 2.7 | 2.4 | | |
| 2 | 1.6 | 1.8 | | |
| 1 | 1.1 | 1.5 | | |
| 1.5 | 1.6 | 1.4 | | |
| 1.1 | 0.9 | 1 | | |
Excel formula:
=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "promax")
Expected output:
| variable | factor_1 | factor_2 | communality | uniqueness |
|---|---|---|---|---|
| var_1 | 0.7458 | 0.6603 | 0.9922 | 0.007787 |
| var_2 | 0.6346 | 0.6855 | 0.8726 | 0.1274 |
| var_3 | 0.7724 | 0.5597 | 0.9098 | 0.09018 |
| variance_explained | 1.555 | 1.219 | | |
Example 4: Two factors, quartimax rotation, principal axis extraction
Inputs:
| data (col 1) | data (col 2) | data (col 3) | n_factors | fa_rotation | fa_method |
|---|---|---|---|---|---|
| 2.5 | 2.4 | 3 | 2 | quartimax | pa |
| 0.5 | 0.7 | 0.8 | | | |
| 2.2 | 2.9 | 2.1 | | | |
| 1.9 | 2.2 | 1.8 | | | |
| 3.1 | 3 | 3.2 | | | |
| 2.3 | 2.7 | 2.4 | | | |
| 2 | 1.6 | 1.8 | | | |
| 1 | 1.1 | 1.5 | | | |
| 1.5 | 1.6 | 1.4 | | | |
| 1.1 | 0.9 | 1 | | | |
Excel formula:
=FACTOR_ANALYSIS({2.5,2.4,3;0.5,0.7,0.8;2.2,2.9,2.1;1.9,2.2,1.8;3.1,3,3.2;2.3,2.7,2.4;2,1.6,1.8;1,1.1,1.5;1.5,1.6,1.4;1.1,0.9,1}, 2, "quartimax", "pa")
Expected output:
| variable | factor_1 | factor_2 | communality | uniqueness |
|---|---|---|---|---|
| var_1 | 0.9961 | 0.002086 | 0.9922 | 0.007787 |
| var_2 | 0.9294 | 0.09453 | 0.8726 | 0.1274 |
| var_3 | 0.9495 | -0.09098 | 0.9098 | 0.09018 |
| variance_explained | 2.757 | 0.01722 | | |
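Examples 1 and 4 fit the same two-factor principal axis model and differ only in their orthogonal rotation. Orthogonal rotations only redistribute variance, so the communalities match row by row and the total variance explained is the same. This holds even though quartimax concentrates almost all of the variance on factor_1. The pure-Python check below uses the rounded values from the two tables.

```python
# Loadings copied from the expected outputs (rounded)
varimax_loadings = [[0.7458, 0.6603], [0.6346, 0.6855], [0.7724, 0.5597]]          # Example 1
quartimax_loadings = [[0.9961, 0.002086], [0.9294, 0.09453], [0.9495, -0.09098]]  # Example 4


def row_communalities(loadings):
    # Communality of each variable: sum of its squared loadings
    return [sum(l ** 2 for l in row) for row in loadings]


def column_variance(loadings):
    # Variance explained by each factor: sum of squared loadings in its column
    return [sum(row[j] ** 2 for row in loadings) for j in range(len(loadings[0]))]


for a, b in zip(row_communalities(varimax_loadings), row_communalities(quartimax_loadings)):
    print(f"{a:.4f} vs {b:.4f}")
print(sum(column_variance(varimax_loadings)), sum(column_variance(quartimax_loadings)))
```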
Python Code
import math
import numpy as np
from statsmodels.multivariate.factor import Factor as sm_Factor


def factor_analysis(data, n_factors, fa_rotation='varimax', fa_method='pa'):
    """
    Performs exploratory factor analysis with rotation.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.factor.Factor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Data matrix with observations as rows and variables as columns.
        n_factors (int): Number of factors to extract.
        fa_rotation (str, optional): Rotation method to apply. Valid options: 'varimax', 'promax', 'quartimax', 'oblimin', 'none'. Default is 'varimax'.
        fa_method (str, optional): Factor extraction method. Valid options: 'pa' (principal axis), 'ml' (maximum likelihood). Default is 'pa'.

    Returns:
        list[list] or str: 2D list with factor loadings, communalities, uniqueness values and variance explained, or an error message string.
    """
    def to2d(x):
        # A single Excel cell arrives as a scalar; wrap it as a 1x1 matrix
        return [[x]] if not isinstance(x, list) else x

    def _validate_float(value, name):
        # Convert to a finite float; problems are reported as an error string
        try:
            converted = float(value)
        except Exception:
            return f"Invalid input: {name} must be a number."
        if math.isnan(converted) or math.isinf(converted):
            return f"Invalid input: {name} must be finite."
        return converted

    # Normalize data input to 2D list
    data_2d = to2d(data)

    # Validate data is a non-empty 2D list
    if not isinstance(data_2d, list):
        return "Invalid input: data must be a 2D list."
    if len(data_2d) == 0:
        return "Invalid input: data cannot be empty."
    if not all(isinstance(row, list) for row in data_2d):
        return "Invalid input: data must be a 2D list."
    if len(data_2d[0]) == 0:
        return "Invalid input: data rows cannot be empty."

    # Check all rows have same length
    num_cols = len(data_2d[0])
    if not all(len(row) == num_cols for row in data_2d):
        return "Invalid input: all data rows must have the same length."

    # Validate all elements are numeric
    validated_data = []
    for i, row in enumerate(data_2d):
        validated_row = []
        for j, val in enumerate(row):
            converted = _validate_float(val, f"data[{i}][{j}]")
            if isinstance(converted, str):
                return converted
            validated_row.append(converted)
        validated_data.append(validated_row)

    # Validate n_factors
    try:
        n_factors_int = int(n_factors)
    except Exception:
        return "Invalid input: n_factors must be an integer."
    if n_factors_int < 1:
        return "Invalid input: n_factors must be at least 1."

    num_vars = len(validated_data[0])
    num_obs = len(validated_data)
    if n_factors_int > num_vars:
        return f"Invalid input: n_factors ({n_factors_int}) cannot exceed number of variables ({num_vars})."
    if num_obs < num_vars:
        return f"Invalid input: number of observations ({num_obs}) must be at least as many as variables ({num_vars})."

    # Validate rotation method (case-insensitive)
    valid_rotations = ['varimax', 'promax', 'quartimax', 'oblimin', 'none']
    if not isinstance(fa_rotation, str):
        return "Invalid input: fa_rotation must be a string."
    rotation_lower = fa_rotation.lower()
    if rotation_lower not in valid_rotations:
        return f"Invalid input: fa_rotation must be one of {valid_rotations}."

    # Validate extraction method (case-insensitive)
    valid_methods = ['pa', 'ml']
    if not isinstance(fa_method, str):
        return "Invalid input: fa_method must be a string."
    method_lower = fa_method.lower()
    if method_lower not in valid_methods:
        return f"Invalid input: fa_method must be one of {valid_methods}."

    # Convert data to the array format statsmodels expects
    data_array = np.array(validated_data)

    # Perform factor analysis
    try:
        # Create Factor object with the specified extraction method
        fa = sm_Factor(data_array, n_factor=n_factors_int, method=method_lower)
        # Fit the model
        fa_result = fa.fit()

        # Apply rotation unless 'none' (rotate() modifies fa_result in place)
        if rotation_lower != 'none':
            fa_result.rotate(rotation_lower)

        # Extract loadings (variables x factors)
        loadings = fa_result.loadings

        # Get communality and uniqueness
        communality = fa_result.communality
        uniqueness = fa_result.uniqueness

        # Build output as 2D list
        output = []

        # Header row
        header = ['variable']
        for i in range(n_factors_int):
            header.append(f'factor_{i+1}')
        header.extend(['communality', 'uniqueness'])
        output.append(header)

        # Factor loadings for each variable
        for i in range(num_vars):
            row = [f'var_{i+1}']
            for j in range(n_factors_int):
                row.append(float(loadings[i, j]))
            row.append(float(communality[i]))
            row.append(float(uniqueness[i]))
            output.append(row)

        # Variance explained by each factor
        variance_row = ['variance_explained']
        for i in range(n_factors_int):
            # Sum of squared loadings for each factor (pattern loadings for oblique rotations)
            variance = float(np.sum(loadings[:, i] ** 2))
            variance_row.append(variance)
        # Fill the communality and uniqueness columns with empty strings
        variance_row.extend(['', ''])
        output.append(variance_row)

        return output
    except Exception as exc:
        return f"statsmodels.multivariate.factor.Factor error: {exc}"