ORDERED_LOGIT

Overview

The ORDERED_LOGIT function fits an ordered logistic regression model (also known as the proportional odds model) for ordinal dependent variables. This type of regression is appropriate when the outcome has naturally ordered categories—such as survey responses ranging from “strongly disagree” to “strongly agree,” bond ratings, or health status levels—where the ordering matters but the intervals between categories are not assumed to be equal.

Ordered logit is based on a latent variable framework. The model assumes an unobserved continuous variable y^* underlies the observed categorical responses:

y^* = X\beta + \varepsilon

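where X represents the predictor variables (without a constant column, since the cut points take the place of an intercept), \beta are the regression coefficients, and \varepsilon follows a standard logistic distribution. The observed ordinal outcome y is determined by where y^* falls relative to a set of increasing cut points (thresholds) \mu_1 < \mu_2 < \cdots < \mu_{K-1} for K categories, with the conventions \mu_0 = -\infty and \mu_K = +\infty:

y = k \quad \text{if} \quad \mu_{k-1} < y^* \leq \mu_k, \qquad k = 1, \ldots, K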

The probability of observing category k is:

P(y = k | X) = F(\mu_k - X\beta) - F(\mu_{k-1} - X\beta)

where F is the cumulative distribution function of the standard logistic distribution, F(z) = 1 / (1 + e^{-z}).
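As a rough illustration of the probability formula above, the sketch below evaluates P(y = k | X) for a single observation using only the standard library. The helper names (`logistic_cdf`, `category_probabilities`) and the numeric inputs are made up for this example and are not part of ORDERED_LOGIT; the outermost bounds are assumed to be minus and plus infinity.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def category_probabilities(xb, cuts):
    """Return [P(y=1), ..., P(y=K)] for linear predictor xb and increasing cut points."""
    # Assumed conventions: mu_0 = -inf and mu_K = +inf.
    bounds = [-math.inf] + list(cuts) + [math.inf]
    return [
        logistic_cdf(bounds[k] - xb) - logistic_cdf(bounds[k - 1] - xb)
        for k in range(1, len(bounds))
    ]

# Hypothetical values: X*beta = 0 with cut points at -1 and 1 (three categories).
probs = category_probabilities(xb=0.0, cuts=[-1.0, 1.0])
```

Because adjacent terms cancel, the probabilities always sum to one, and raising X\beta shifts mass toward the higher categories.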

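This implementation uses the OrderedModel class from the statsmodels library. The function returns an estimate, standard error, z-statistic, p-value, and confidence interval for each predictor coefficient and for each threshold parameter. OrderedModel estimates the first cut point directly and each later cut point as the log of its increment over the previous one, so threshold values after the first are on that log-increment scale. Model fit statistics include McFadden's pseudo R-squared, log-likelihood, AIC, and BIC.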
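statsmodels' OrderedModel estimates its thresholds in a transformed form: the first cut point as-is, followed by the log of each successive increment, which keeps the cut points increasing. The sketch below shows how such values could be mapped back to ordinary cut points. The helper name `to_cut_points` and the numbers are hypothetical; statsmodels offers `OrderedModel.transform_threshold_params` for the same purpose.

```python
import math

def to_cut_points(threshold_params):
    """Convert [first_cut, log_increment, ...] into increasing cut points.

    Assumes at least one threshold parameter is supplied.
    """
    cuts = [threshold_params[0]]
    for log_increment in threshold_params[1:]:
        # Each later cut point is the previous one plus a positive increment.
        cuts.append(cuts[-1] + math.exp(log_increment))
    return cuts

# Made-up threshold parameters for a four-category outcome.
cuts = to_cut_points([-0.5, math.log(2.0), math.log(0.5)])
# cuts is approximately [-0.5, 1.5, 2.0]
```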

Common applications include analyzing Likert-scale survey data, credit ratings, educational attainment levels, and any scenario where outcomes fall into ranked categories. For theoretical background, see the Wikipedia article on ordered logit and the original work by McCullagh (1980).

This example function is provided as-is without any representation of accuracy.

Excel Usage

=ORDERED_LOGIT(y, x, fit_intercept, alpha)
  • y (list[list], required): Ordinal dependent variable as a column vector with integer category values (0, 1, 2, …) representing ordered categories.
  • x (list[list], required): Independent variables (predictors) as a matrix where each column represents a different predictor variable.
  • fit_intercept (bool, optional, default: true): Reserved for API consistency; has no effect since ordered models use cut points instead of intercepts.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals, between 0 and 1.

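Returns (list[list]): 2D list containing a header row, one row per estimated parameter (coefficient, std_error, z_statistic, p_value, ci_lower, ci_upper), and one row each for pseudo_r_squared, log_likelihood, aic, and bic; or an error string if the inputs are invalid or the fit fails.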
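For callers working in Python, the returned 2D list can be split into per-parameter results and fit statistics. The sketch below assumes the layout produced by the Python code in this document, where fit-statistic rows hold a value in the second column and empty strings in the rest. The helper name `split_results` and the sample values are made up for illustration.

```python
def split_results(table):
    """Split an ORDERED_LOGIT-style result into (parameters, statistics) dicts."""
    if isinstance(table, str):
        # The function reports problems as a plain error string.
        raise ValueError(table)
    header, *rows = table
    parameters, statistics = {}, {}
    for row in rows:
        name, values = row[0], row[1:]
        if all(v == "" for v in values[1:]):
            statistics[name] = values[0]  # fit statistic: single value
        else:
            parameters[name] = dict(zip(header[1:], values))
    return parameters, statistics

# Hypothetical result with one parameter row and one fit statistic.
sample = [
    ["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper"],
    ["x0", 1.0, 0.5, 2.0, 0.0455, 0.02, 1.98],
    ["aic", 10.0, "", "", "", "", ""],
]
parameters, statistics = split_results(sample)
```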

Example 1: Basic three-category model with one predictor

Inputs:

y x
0 1
0 1.2
0 1.4
0 1.6
0 1.8
0 2
1 1.8
1 2
1 2.2
1 2.4
1 2.6
1 2.8
1 3
1 3.2
2 2.8
2 3
2 3.2
2 3.4
2 3.6
2 3.8

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8})

Expected output:

parameter         coefficient  std_error  z_statistic  p_value       ci_lower  ci_upper
x0                6.0986       2.44115    2.49825      0.0124808     1.31404   10.8832
cut_0/1           11.5786      4.74204    2.4417       0.0146181     2.28443   20.8729
cut_1/2           1.90598      0.43239    4.40801      0.0000104326  1.05851   2.75345
pseudo_r_squared  0.610148
log_likelihood    -8.4902
aic               22.9804
bic               25.9676

Example 2: fit_intercept set to FALSE using same data (results are unchanged)

Inputs:

fit_intercept: false

y x
0 1
0 1.2
0 1.4
0 1.6
0 1.8
0 2
1 1.8
1 2
1 2.2
1 2.4
1 2.6
1 2.8
1 3
1 3.2
2 2.8
2 3
2 3.2
2 3.4
2 3.6
2 3.8

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8}, FALSE)

Expected output:

parameter         coefficient  std_error  z_statistic  p_value       ci_lower  ci_upper
x0                6.0986       2.44115    2.49825      0.0124808     1.31404   10.8832
cut_0/1           11.5786      4.74204    2.4417       0.0146181     2.28443   20.8729
cut_1/2           1.90598      0.43239    4.40801      0.0000104326  1.05851   2.75345
pseudo_r_squared  0.610148
log_likelihood    -8.4902
aic               22.9804
bic               25.9676

Example 3: Custom significance level (90% CI)

Inputs:

alpha: 0.1

y x
0 1
0 1.2
0 1.4
0 1.6
0 1.8
0 2
1 1.8
1 2
1 2.2
1 2.4
1 2.6
1 2.8
1 3
1 3.2
2 2.8
2 3
2 3.2
2 3.4
2 3.6
2 3.8

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8}, TRUE, 0.1)

Expected output:

parameter         coefficient  std_error  z_statistic  p_value       ci_lower  ci_upper
x0                6.0986       2.44115    2.49825      0.0124808     2.08327   10.1139
cut_0/1           11.5786      4.74204    2.4417       0.0146181     3.77869   19.3786
cut_1/2           1.90598      0.43239    4.40801      0.0000104326  1.19476   2.61719
pseudo_r_squared  0.610148
log_likelihood    -8.4902
aic               22.9804
bic               25.9676
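
The intervals in this example differ from those in Example 1 only in width. The sketch below assumes they are Wald intervals of the form estimate ± z(1 - alpha/2) × std_error; this is an assumption about how the bounds are computed, although it reproduces the printed values for the first parameter row. The helper name `wald_interval` is hypothetical.

```python
from statistics import NormalDist

def wald_interval(coefficient, std_error, alpha=0.05):
    """Two-sided normal-approximation interval at confidence level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return coefficient - z * std_error, coefficient + z * std_error

# Coefficient and standard error taken from the first parameter row above.
lo90, hi90 = wald_interval(6.0986, 2.44115, alpha=0.1)
lo95, hi95 = wald_interval(6.0986, 2.44115, alpha=0.05)
```

A smaller alpha gives a larger critical value and therefore a wider interval, which is why the 95% bounds from Example 1 enclose the 90% bounds shown here.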

Example 4: Multiple predictors with all arguments specified

Inputs:

fit_intercept: true
alpha: 0.05

y x0 x1
0 1 1
0 1.2 0.9
0 1.4 1.1
0 1.6 0.8
0 1.8 1.2
0 2 0.7
1 1.8 1.3
1 2 1.4
1 2.2 0.9
1 2.4 1.5
1 2.6 1
1 2.8 1.6
1 3 1.1
1 3.2 1.7
2 2.8 1.8
2 3 1.2
2 3.2 1.9
2 3.4 1.3
2 3.6 2
2 3.8 1.4

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1,1;1.2,0.9;1.4,1.1;1.6,0.8;1.8,1.2;2,0.7;1.8,1.3;2,1.4;2.2,0.9;2.4,1.5;2.6,1;2.8,1.6;3,1.1;3.2,1.7;2.8,1.8;3,1.2;3.2,1.9;3.4,1.3;3.6,2;3.8,1.4}, TRUE, 0.05)

Expected output:

parameter         coefficient  std_error  z_statistic  p_value        ci_lower  ci_upper
x0                6.21563      2.92672    2.12375      0.0336909      0.479362  11.9519
x1                3.65906      2.55509    1.43207      0.152125       -1.34883  8.66695
cut_0/1           15.7804      6.84997    2.30371      0.0212387      2.35467   29.2061
cut_1/2           2.11141      0.451177   4.67978      0.00000287178  1.22712   2.9957
pseudo_r_squared  0.667735
log_likelihood    -7.23606
aic               22.4721
bic               26.455
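
The fit statistics can be cross-checked from the log-likelihood alone. The sketch below assumes the usual definitions: AIC = 2p - 2 ln L, BIC = p ln(n) - 2 ln L, and McFadden's pseudo R-squared measured against a null model that only matches the observed category shares. That is an assumption about what statsmodels reports, though it reproduces the values above. The helper name `fit_statistics` is hypothetical.

```python
import math

def fit_statistics(log_likelihood, n_params, y):
    """Recompute pseudo R-squared, AIC and BIC from a fitted log-likelihood."""
    n = len(y)
    counts = {}
    for value in y:
        counts[value] = counts.get(value, 0) + 1
    # Null log-likelihood: each category predicted with its sample share.
    ll_null = sum(c * math.log(c / n) for c in counts.values())
    return {
        "pseudo_r_squared": 1 - log_likelihood / ll_null,
        "aic": 2 * n_params - 2 * log_likelihood,
        "bic": n_params * math.log(n) - 2 * log_likelihood,
    }

# Example 4: 20 observations, 2 coefficients + 2 thresholds = 4 parameters.
y = [0] * 6 + [1] * 8 + [2] * 6
stats = fit_statistics(log_likelihood=-7.23606, n_params=4, y=y)
```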

Python Code

import math
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel as statsmodels_ordered_model

def ordered_logit(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits an ordered logistic regression model for ordinal outcomes.

    See: https://www.statsmodels.org/stable/generated/statsmodels.miscmodels.ordinal_model.OrderedModel.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Ordinal dependent variable as a column vector with integer category values (0, 1, 2, ...) representing ordered categories.
        x (list[list]): Independent variables (predictors) as a matrix where each column represents a different predictor variable.
        fit_intercept (bool, optional): Reserved for API consistency; has no effect since ordered models use cut points instead of intercepts. Default is True.
        alpha (float, optional): Significance level for confidence intervals, between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with ordered logit results, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_numeric(val, name):
        if not isinstance(val, (int, float)):
          return f"Error: Invalid input: {name} must be a number."
        if math.isnan(val) or math.isinf(val):
          return f"Error: Invalid input: {name} must be finite."
        return None

    try:
      # Normalize inputs
      y = to2d(y)
      x = to2d(x)

      # Validate y is a column vector
      if not isinstance(y, list) or len(y) == 0:
        return "Error: Invalid input: y must be a non-empty 2D list."
      if not all(isinstance(row, list) and len(row) == 1 for row in y):
        return "Error: Invalid input: y must be a column vector (2D list with one column)."

      # Validate x is a matrix
      if not isinstance(x, list) or len(x) == 0:
        return "Error: Invalid input: x must be a non-empty 2D list."
      if not all(isinstance(row, list) for row in x):
        return "Error: Invalid input: x must be a 2D list."

      num_rows_x = len(x)
      num_cols_x = len(x[0]) if num_rows_x > 0 else 0
      if num_cols_x == 0:
        return "Error: Invalid input: x must have at least one column."
      if not all(len(row) == num_cols_x for row in x):
        return "Error: Invalid input: x must have consistent row lengths."

      # Check y and x have same number of rows
      if len(y) != num_rows_x:
        return "Error: Invalid input: y and x must have the same number of rows."

      # Validate fit_intercept
      if not isinstance(fit_intercept, bool):
        return "Error: Invalid input: fit_intercept must be a boolean."

      # Validate alpha
      err = validate_numeric(alpha, "alpha")
      if err:
        return err
      if alpha <= 0 or alpha >= 1:
        return "Error: Invalid input: alpha must be between 0 and 1."

      # Extract y values
      y_flat = []
      for row in y:
        val = row[0]
        err = validate_numeric(val, "y value")
        if err:
          return err
        y_flat.append(val)

      # Check y values are integers
      for val in y_flat:
        if val != int(val):
          return "Error: Invalid input: y must contain integer category values."

      # Extract x values
      x_matrix = []
      for row in x:
        x_row = []
        for val in row:
          err = validate_numeric(val, "x value")
          if err:
            return err
          x_row.append(float(val))
        x_matrix.append(x_row)

      # Convert to numpy arrays
      y_array = np.array(y_flat)
      x_array = np.array(x_matrix)

      # Set parameter names
      param_names = [f"x{i}" for i in range(num_cols_x)]

      # Fit the ordered logit model
      # Note: OrderedModel uses cut points (thresholds) instead of traditional intercepts.
      # The cut points are always estimated and capture what would be the intercept.
      # The fit_intercept parameter is kept for API consistency but has no effect.
      try:
        model = statsmodels_ordered_model(y_array, x_array, distr='logit')
        result = model.fit(disp=0, method='bfgs')
      except Exception as exc:  # noqa: BLE001
        return f"Error: Model fitting error: {exc}"

      # Extract results
      output = [["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper"]]

      # Get confidence intervals
      try:
        conf_int = result.conf_int(alpha=alpha)
      except Exception as exc:  # noqa: BLE001
        return f"Error: Confidence interval error: {exc}"

      # Extract estimates. OrderedModel orders result.params as the predictor
      # coefficients first, followed by the threshold parameters.
      params = result.params
      std_errors = result.bse
      z_stats = result.tvalues
      p_values = result.pvalues

      # Determine number of categories
      n_categories = len(set(y_flat))
      n_thresholds = n_categories - 1

      # Label rows in the same order as result.params. The first threshold is
      # the cut point itself; statsmodels estimates each later threshold as the
      # log of its increment over the previous cut point.
      row_names = param_names + [f"cut_{i}/{i+1}" for i in range(n_thresholds)]
      if len(row_names) != len(params):
        return "Error: Unexpected number of fitted parameters."

      # Add predictor and threshold parameters
      for i, param_name in enumerate(row_names):
        output.append([
          param_name,
          float(params[i]),
          float(std_errors[i]),
          float(z_stats[i]),
          float(p_values[i]),
          float(conf_int[i, 0]),
          float(conf_int[i, 1])
        ])

      # Add model statistics
      output.append(["pseudo_r_squared", float(result.prsquared), "", "", "", "", ""])
      output.append(["log_likelihood", float(result.llf), "", "", "", "", ""])
      output.append(["aic", float(result.aic), "", "", "", "", ""])
      output.append(["bic", float(result.bic), "", "", "", "", ""])

      return output
    except Exception as exc:  # noqa: BLE001
      return f"Error: {exc}"
