MULTINOMIAL_LOGIT
Overview
The MULTINOMIAL_LOGIT function fits a multinomial logistic regression model, which extends binary logistic regression to handle categorical dependent variables with more than two possible outcomes. This type of model is widely used in discrete choice analysis, where the goal is to predict which category an observation will belong to based on a set of predictor variables.
Multinomial logistic regression models the probability of each outcome category relative to a reference category. For a dependent variable with K possible categories and a set of predictor variables X, the model estimates the probability of observing category k using the softmax function:
P(Y = k \mid X) = \frac{e^{\beta_k \cdot X}}{\sum_{j=1}^{K} e^{\beta_j \cdot X}}
where \beta_k represents the vector of regression coefficients for category k. The lowest category value serves as the reference category (with coefficients set to zero), and the model returns coefficients for all other categories relative to this reference.
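As a minimal sketch of how these probabilities are computed, the following pure-Python snippet evaluates the softmax for a three-category model with illustrative, made-up coefficients (not fitted values), fixing the reference category's coefficients at zero:

```python
import math

def softmax_probs(x, betas):
    """Category probabilities P(Y = k | X) via the softmax of linear scores.

    betas[k] is (intercept, slope) for category k; the reference
    category (k = 0) has both coefficients fixed at zero.
    """
    scores = [b0 + b1 * x for b0, b1 in betas]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative coefficients: reference category first, coefficients zero.
betas = [(0.0, 0.0), (-2.0, 0.8), (-5.0, 1.5)]
probs = softmax_probs(3.0, betas)
print(probs)        # probabilities for categories 0, 1, 2
print(sum(probs))   # sums to 1 by construction
```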
This implementation uses the MNLogit class from the statsmodels library. Model parameters are estimated via maximum likelihood estimation, and the function returns coefficients, standard errors, z-statistics, p-values, and confidence intervals for each parameter-category combination. Model fit statistics include pseudo R-squared (McFadden’s), log-likelihood, AIC, and BIC.
Common applications include predicting consumer brand choices, transportation mode selection, voting behavior, and disease classification—any scenario where the outcome falls into one of several unordered categories. The model assumes independence of irrelevant alternatives (IIA), meaning the relative odds between any two choices are independent of other available alternatives. For more theoretical background, see the Wikipedia article on multinomial logistic regression.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=MULTINOMIAL_LOGIT(y, x, fit_intercept, alpha)
y (list[list], required): Categorical dependent variable as a column vector of integer category values (0, 1, 2, …).
x (list[list], required): Independent variables (predictors) as a matrix where each column represents a different predictor.
fit_intercept (bool, optional, default: true): If True, adds an intercept term to the model.
alpha (float, optional, default: 0.05): Significance level for confidence intervals (0 < alpha < 1).
Returns (list[list]): 2D list with multinomial results, or error string.
Examples
Example 1: Basic three categories with intercept
Inputs:
| y | x |
|---|---|
| 0 | 1 |
| 0 | 1.5 |
| 0 | 2 |
| 1 | 2.5 |
| 0 | 3 |
| 1 | 3.5 |
| 1 | 4 |
| 2 | 4.5 |
| 1 | 5 |
| 2 | 5.5 |
| 2 | 6 |
| 0 | 2.2 |
Excel formula:
=MULTINOMIAL_LOGIT({0;0;0;1;0;1;1;2;1;2;2;0}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5;6;2.2})
Expected output:
| category | parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|---|
| 1 | const | -7.94 | 5.117 | -1.552 | 0.1207 | -17.97 | 2.089 |
| 1 | x0 | 2.804 | 1.838 | 1.526 | 0.1271 | -0.7986 | 6.407 |
| 2 | const | -19.59 | 10.03 | -1.954 | 0.0507 | -39.24 | 0.0611 |
| 2 | x0 | 5.257 | 2.596 | 2.026 | 0.0428 | 0.1702 | 10.34 |
| pseudo_r_squared | | 0.6008 | | | | | |
| log_likelihood | | -5.162 | | | | | |
| aic | | 18.32 | | | | | |
| bic | | 20.26 | | | | | |
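To interpret the coefficients above, a short sketch (using the rounded values from the Example 1 output table, so results are approximate) computes the predicted category probabilities at x = 4:

```python
import math

# Rounded coefficients from the Example 1 output; category 0 is the reference.
logit_1 = -7.94 + 2.804 * 4    # linear score for category 1 at x = 4
logit_2 = -19.59 + 5.257 * 4   # linear score for category 2 at x = 4
exps = [math.exp(0.0), math.exp(logit_1), math.exp(logit_2)]
probs = [e / sum(exps) for e in exps]
print([round(p, 3) for p in probs])  # category 1 is the most likely at x = 4
```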
Example 2: Model without intercept
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 0 | 1 | false |
| 0 | 1.5 | |
| 0 | 2 | |
| 1 | 2.5 | |
| 0 | 3 | |
| 1 | 3.5 | |
| 1 | 4 | |
| 2 | 4.5 | |
| 1 | 5 | |
| 2 | 5.5 | |
| 2 | 6 | |
| 0 | 2.2 |
Excel formula:
=MULTINOMIAL_LOGIT({0;0;0;1;0;1;1;2;1;2;2;0}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5;6;2.2}, FALSE)
Expected output:
| category | parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|---|
| 1 | x0 | 0.1081 | 0.2062 | 0.5242 | 0.6001 | -0.296 | 0.5121 |
| 2 | x0 | 0.1236 | 0.2035 | 0.6072 | 0.5437 | -0.2753 | 0.5224 |
| pseudo_r_squared | | -0.0026 | | | | | |
| log_likelihood | | -12.96 | | | | | |
| aic | | 29.93 | | | | | |
| bic | | 30.9 | | | | | |
Example 3: Custom confidence level (90%)
Inputs:
| y | x | alpha |
|---|---|---|
| 0 | 1 | 0.1 |
| 0 | 1.5 | |
| 0 | 2 | |
| 1 | 2.5 | |
| 0 | 3 | |
| 1 | 3.5 | |
| 1 | 4 | |
| 2 | 4.5 | |
| 1 | 5 | |
| 2 | 5.5 | |
| 2 | 6 | |
| 0 | 2.2 |
Excel formula:
=MULTINOMIAL_LOGIT({0;0;0;1;0;1;1;2;1;2;2;0}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5;6;2.2}, 0.1)
Expected output:
| category | parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|---|
| 1 | const | -7.94 | 5.117 | -1.552 | 0.1207 | -16.36 | 0.4766 |
| 1 | x0 | 2.804 | 1.838 | 1.526 | 0.1271 | -0.2194 | 5.828 |
| 2 | const | -19.59 | 10.03 | -1.954 | 0.0507 | -36.08 | -3.098 |
| 2 | x0 | 5.257 | 2.596 | 2.026 | 0.0428 | 0.9881 | 9.527 |
| pseudo_r_squared | | 0.6008 | | | | | |
| log_likelihood | | -5.162 | | | | | |
| aic | | 18.32 | | | | | |
| bic | | 20.26 | | | | | |
Example 4: All arguments specified (99% CI)
Inputs:
| y | x | fit_intercept | alpha |
|---|---|---|---|
| 0 | 1 | true | 0.01 |
| 0 | 1.5 | ||
| 0 | 2 | ||
| 1 | 2.5 | ||
| 0 | 3 | ||
| 1 | 3.5 | ||
| 1 | 4 | ||
| 2 | 4.5 | ||
| 1 | 5 | ||
| 2 | 5.5 | ||
| 2 | 6 | ||
| 0 | 2.2 |
Excel formula:
=MULTINOMIAL_LOGIT({0;0;0;1;0;1;1;2;1;2;2;0}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5;6;2.2}, TRUE, 0.01)
Expected output:
| category | parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|---|
| 1 | const | -7.94 | 5.117 | -1.552 | 0.1207 | -21.12 | 5.24 |
| 1 | x0 | 2.804 | 1.838 | 1.526 | 0.1271 | -1.931 | 7.539 |
| 2 | const | -19.59 | 10.03 | -1.954 | 0.0507 | -45.42 | 6.236 |
| 2 | x0 | 5.257 | 2.596 | 2.026 | 0.0428 | -1.428 | 11.94 |
| pseudo_r_squared | | 0.6008 | | | | | |
| log_likelihood | | -5.162 | | | | | |
| aic | | 18.32 | | | | | |
| bic | | 20.26 | | | | | |
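The confidence bounds in these tables follow the usual normal approximation, coefficient ± z(1 − alpha/2) × std_error. A quick sketch reproduces the 99% interval for the category-1 intercept from the rounded values above:

```python
from statistics import NormalDist

coef, std_err, alpha = -7.94, 5.117, 0.01
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 for a 99% interval
ci_lower = coef - z * std_err
ci_upper = coef + z * std_err
print(round(ci_lower, 2), round(ci_upper, 2))  # matches the table: -21.12, 5.24
```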
Python Code
import math
from statsmodels.discrete.discrete_model import MNLogit as statsmodels_mnlogit
def multinomial_logit(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits a multinomial logistic regression model for multi-category outcomes.

    See: https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.MNLogit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Categorical dependent variable as a column vector of integer category values (0, 1, 2, ...).
        x (list[list]): Independent variables (predictors) as a matrix where each column represents a different predictor.
        fit_intercept (bool, optional): If True, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (0 < alpha < 1). Default is 0.05.

    Returns:
        list[list]: 2D list with multinomial results, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_numeric(val, name):
        if not isinstance(val, (int, float)):
            return f"Invalid input: {name} must be a number."
        if math.isnan(val) or math.isinf(val):
            return f"Invalid input: {name} must be finite."
        return None

    # Normalize inputs
    y = to2d(y)
    x = to2d(x)

    # Validate y is a column vector
    if not isinstance(y, list) or len(y) == 0:
        return "Invalid input: y must be a non-empty 2D list."
    if not all(isinstance(row, list) and len(row) == 1 for row in y):
        return "Invalid input: y must be a column vector (2D list with one column)."

    # Validate x is a matrix with consistent row lengths
    if not isinstance(x, list) or len(x) == 0:
        return "Invalid input: x must be a non-empty 2D list."
    if not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."
    num_rows_x = len(x)
    num_cols_x = len(x[0]) if num_rows_x > 0 else 0
    if num_cols_x == 0:
        return "Invalid input: x must have at least one column."
    if not all(len(row) == num_cols_x for row in x):
        return "Invalid input: x must have consistent row lengths."

    # Check y and x have the same number of rows
    if len(y) != num_rows_x:
        return "Invalid input: y and x must have the same number of rows."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    err = validate_numeric(alpha, "alpha")
    if err:
        return err
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Extract y values
    y_flat = []
    for row in y:
        val = row[0]
        err = validate_numeric(val, "y value")
        if err:
            return err
        y_flat.append(val)

    # Check y values are integers
    for val in y_flat:
        if val != int(val):
            return "Invalid input: y must contain integer category values."

    # Extract x values
    x_matrix = []
    for row in x:
        x_row = []
        for val in row:
            err = validate_numeric(val, "x value")
            if err:
                return err
            x_row.append(float(val))
        x_matrix.append(x_row)

    # Add intercept column if requested
    if fit_intercept:
        x_matrix = [[1.0] + row for row in x_matrix]
        param_names = ["const"] + [f"x{i}" for i in range(num_cols_x)]
    else:
        param_names = [f"x{i}" for i in range(num_cols_x)]

    # Fit the model
    try:
        model = statsmodels_mnlogit(y_flat, x_matrix)
        result = model.fit(disp=0)
    except Exception as exc:  # noqa: BLE001
        return f"Model fitting error: {exc}"

    # Extract results
    output = [["category", "parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper"]]

    # Get confidence intervals
    try:
        conf_int = result.conf_int(alpha=alpha)
    except Exception as exc:  # noqa: BLE001
        return f"Confidence interval error: {exc}"

    # The lowest category is the reference and gets no coefficient rows
    categories = sorted(set(y_flat))
    non_ref_categories = categories[1:]

    # With array inputs, params/bse/tvalues/pvalues have shape
    # [num_params, num_categories - 1] and conf_int has shape
    # [num_categories - 1, num_params, 2]; fall back to 1D indexing otherwise.
    params_array = result.params
    bse_array = result.bse
    tvalues_array = result.tvalues
    pvalues_array = result.pvalues
    for cat_idx, cat in enumerate(non_ref_categories):
        for param_idx, param_name in enumerate(param_names):
            coef = params_array[param_idx, cat_idx] if len(params_array.shape) > 1 else params_array[param_idx]
            std_err = bse_array[param_idx, cat_idx] if len(bse_array.shape) > 1 else bse_array[param_idx]
            z_stat = tvalues_array[param_idx, cat_idx] if len(tvalues_array.shape) > 1 else tvalues_array[param_idx]
            p_val = pvalues_array[param_idx, cat_idx] if len(pvalues_array.shape) > 1 else pvalues_array[param_idx]
            ci_low = conf_int[cat_idx, param_idx, 0] if len(conf_int.shape) > 2 else conf_int[param_idx, 0]
            ci_high = conf_int[cat_idx, param_idx, 1] if len(conf_int.shape) > 2 else conf_int[param_idx, 1]
            output.append([
                float(cat),
                param_name,
                float(coef),
                float(std_err),
                float(z_stat),
                float(p_val),
                float(ci_low),
                float(ci_high)
            ])

    # Add model fit statistics
    output.append(["pseudo_r_squared", "", float(result.prsquared), "", "", "", "", ""])
    output.append(["log_likelihood", "", float(result.llf), "", "", "", "", ""])
    output.append(["aic", "", float(result.aic), "", "", "", "", ""])
    output.append(["bic", "", float(result.bic), "", "", "", "", ""])
    return output
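For reference, a quick sketch of the same fit performed directly with statsmodels (assuming statsmodels and NumPy are installed) on the Example 1 data; the intercept column is added by hand, mirroring what the function does internally:

```python
import numpy as np
from statsmodels.discrete.discrete_model import MNLogit

# Example 1 data: three categories, one predictor.
y = np.array([0, 0, 0, 1, 0, 1, 1, 2, 1, 2, 2, 0])
x = np.array([1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 2.2])
exog = np.column_stack([np.ones_like(x), x])  # intercept column + predictor

result = MNLogit(y, exog).fit(disp=0)
print(result.params.shape)         # 2 parameters x 2 non-reference categories
print(round(result.prsquared, 4))  # McFadden's pseudo R-squared
```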