SPECIFICATION_TESTS
Overview
The SPECIFICATION_TESTS function performs a suite of diagnostic tests on an Ordinary Least Squares (OLS) regression to detect model misspecification. Model specification errors occur when the functional form of the regression model does not adequately capture the true relationship between variables, leading to biased or inefficient estimates.
This implementation uses the statsmodels library’s diagnostic module to run three complementary tests:
Ramsey’s RESET Test (Regression Equation Specification Error Test) detects neglected nonlinearity by augmenting the original regression with powers of the fitted values. The null hypothesis is that the model is correctly specified as linear. A significant result (p-value < 0.05) suggests the model may be missing polynomial or interaction terms. See the linear_reset documentation for details.
Rainbow Test evaluates linearity by comparing the fit of the full model against the fit obtained on a central subset of the observations (optionally ordered by an explanatory variable). Under the null hypothesis the relationship is linear, so the two fits should be similar; rejection indicates potential nonlinearity or structural breaks in the data. The test has power against many forms of nonlinearity. See the linear_rainbow documentation.
Harvey-Collier Test is a t-test on the mean of recursive OLS residuals. Under correct specification, these residuals should have zero mean. The test computes residuals by sequentially adding observations and re-estimating the model, detecting systematic patterns that indicate specification errors. See the linear_harvey_collier documentation.
The RESET and Rainbow tests return an F-statistic and the Harvey-Collier test returns a t-statistic, each paired with a p-value. A p-value greater than 0.05 (the conventional threshold) means the null hypothesis of correct specification cannot be rejected, while a smaller p-value indicates potential misspecification. The function returns a conclusion and an interpretation for each test to guide remedial action.
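The same three diagnostics can be run directly in Python on a fitted statsmodels OLS results object. The snippet below is a minimal sketch; the synthetic data, seed, and variable names (`x`, `y`, `X`) are illustrative only and not part of this function:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset, linear_rainbow, linear_harvey_collier

# Illustrative data: a roughly linear relationship with a little noise
rng = np.random.default_rng(0)
x = np.arange(1, 31, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

X = sm.add_constant(x)          # add an intercept column
results = sm.OLS(y, X).fit()    # fit the baseline linear model

reset = linear_reset(results, use_f=True)          # RESET: powers of fitted values
rainbow_stat, rainbow_p = linear_rainbow(results)  # Rainbow: central-subset fit
hc = linear_harvey_collier(results)                # Harvey-Collier: recursive residuals

for name, p in [("RESET", reset.pvalue),
                ("Rainbow", rainbow_p),
                ("Harvey-Collier", hc.pvalue)]:
    # p > 0.05: fail to reject correct specification (conventional threshold)
    verdict = "no evidence of misspecification" if p > 0.05 else "possible misspecification"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```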
This example function is provided as-is without any representation of accuracy.
Excel Usage
=SPECIFICATION_TESTS(y, x, fit_intercept)
- y (list[list], required): Dependent variable as a column vector (Nx1 matrix)
- x (list[list], required): Independent variables (predictors) as a matrix where each column is a predictor
- fit_intercept (bool, optional, default: true): Whether to add an intercept term to the model
Returns (list[list]): 2D list with specification tests, or error message string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 3.397 | 1 |
| 4.889 | 2 |
| 7.518 | 3 |
| 10.22 | 4 |
| 10.81 | 5 |
| 12.81 | 6 |
| 16.26 | 7 |
| 17.61 | 8 |
| 18.62 | 9 |
| 21.43 | 10 |
| 22.63 | 11 |
| 24.63 | 12 |
Excel formula:
=SPECIFICATION_TESTS({3.397;4.889;7.518;10.22;10.81;12.81;16.26;17.61;18.62;21.43;22.63;24.63}, {1;2;3;4;5;6;7;8;9;10;11;12})
Expected output:
| test_name | statistic | p_value | conclusion | interpretation |
|---|---|---|---|---|
| RESET | 0.6257 | 0.5591 | No misspecification | Model appears linear |
| Rainbow | 0.2573 | 0.9317 | Linear | Model specification appears adequate |
| Harvey-Collier | -0.9325 | 0.3784 | Correctly specified | No evidence of specification error |
Example 2: Demo case 2
Inputs:
| y | x |
|---|---|
| 2.1 | 1 |
| 3.9 | 2 |
| 6.2 | 3 |
| 7.8 | 4 |
| 10.1 | 5 |
| 12 | 6 |
| 13.9 | 7 |
| 16.2 | 8 |
Excel formula:
=SPECIFICATION_TESTS({2.1;3.9;6.2;7.8;10.1;12;13.9;16.2}, {1;2;3;4;5;6;7;8})
Expected output:
| test_name | statistic | p_value | conclusion | interpretation |
|---|---|---|---|---|
| RESET | 0.3827 | 0.7046 | No misspecification | Model appears linear |
| Rainbow | 0.428 | 0.7873 | Linear | Model specification appears adequate |
| Harvey-Collier | 0.2913 | 0.7853 | Correctly specified | No evidence of specification error |
Example 3: Demo case 3
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 2.496 | 1 | false |
| 4.178 | 2 | |
| 6.037 | 3 | |
| 8.435 | 4 | |
| 9.625 | 5 | |
| 11.63 | 6 | |
| 14.84 | 7 | |
| 15.84 | 8 | |
| 17.25 | 9 | |
| 19.87 | 10 | |
| 21.86 | 11 | |
| 23.63 | 12 | |
Excel formula:
=SPECIFICATION_TESTS({2.496;4.178;6.037;8.435;9.625;11.63;14.84;15.84;17.25;19.87;21.86;23.63}, {1;2;3;4;5;6;7;8;9;10;11;12}, FALSE)
Expected output:
| test_name | statistic | p_value | conclusion | interpretation |
|---|---|---|---|---|
| RESET | 0.6662 | 0.5372 | No misspecification | Model appears linear |
| Rainbow | 0.1716 | 0.9734 | Linear | Model specification appears adequate |
| Harvey-Collier | -1.037 | 0.33 | Correctly specified | No evidence of specification error |
Example 4: Demo case 4
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 2.148 | 1 | true |
| 3.889 | 2 | |
| 5.518 | 3 | |
| 7.761 | 4 | |
| 9.883 | 5 | |
| 11.88 | 6 | |
| 14.79 | 7 | |
| 16.38 | 8 | |
| 17.77 | 9 | |
| 20.27 | 10 | |
Excel formula:
=SPECIFICATION_TESTS({2.148;3.889;5.518;7.761;9.883;11.88;14.79;16.38;17.77;20.27}, {1;2;3;4;5;6;7;8;9;10}, TRUE)
Expected output:
| test_name | statistic | p_value | conclusion | interpretation |
|---|---|---|---|---|
| RESET | 1.589 | 0.2794 | No misspecification | Model appears linear |
| Rainbow | 1.539 | 0.3836 | Linear | Model specification appears adequate |
| Harvey-Collier | 1.069 | 0.3263 | Correctly specified | No evidence of specification error |
Python Code
import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.stats.diagnostic import linear_reset, linear_rainbow, linear_harvey_collier
def specification_tests(y, x, fit_intercept=True):
"""
Performs regression specification tests to detect model misspecification.
See: https://www.statsmodels.org/stable/stats.html#residual-diagnostics-and-specification-tests
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Dependent variable as a column vector (Nx1 matrix)
x (list[list]): Independent variables (predictors) as a matrix where each column is a predictor
fit_intercept (bool, optional): Whether to add an intercept term to the model Default is True.
Returns:
list[list]: 2D list with specification tests, or error message string.
"""
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_numeric(arr, name):
        arr_2d = to2d(arr)
        if not isinstance(arr_2d, list):
            return f"Invalid input: {name} must be a 2D list."
        if not arr_2d:
            return f"Invalid input: {name} cannot be empty."
        for i, row in enumerate(arr_2d):
            if not isinstance(row, list):
                return f"Invalid input: {name} row {i} must be a list."
            if not row:
                return f"Invalid input: {name} row {i} cannot be empty."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be a number."
                if np.isnan(val) or np.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return arr_2d

    # Validate inputs
    y_2d = validate_2d_numeric(y, "y")
    if isinstance(y_2d, str):
        return y_2d
    x_2d = validate_2d_numeric(x, "x")
    if isinstance(x_2d, str):
        return x_2d
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Convert to numpy arrays
    try:
        y_array = np.array(y_2d, dtype=float)
        x_array = np.array(x_2d, dtype=float)
    except Exception as e:
        return f"Invalid input: unable to convert inputs to arrays: {e}"

    # Ensure y is a column vector
    if y_array.ndim != 2 or y_array.shape[1] != 1:
        return "Invalid input: y must be a column vector (2D list with one column)."

    n_obs = y_array.shape[0]
    if x_array.shape[0] != n_obs:
        return "Invalid input: y and x must have the same number of rows."
    if n_obs < 3:
        return "Invalid input: need at least 3 observations."

    # Flatten y to 1D for statsmodels
    y_flat = y_array.flatten()

    # Add intercept if requested
    if fit_intercept:
        x_with_const = np.column_stack([np.ones(n_obs), x_array])
    else:
        x_with_const = x_array

    if x_with_const.shape[1] >= n_obs:
        return "Invalid input: number of parameters must be less than number of observations."

    # Fit OLS model
    try:
        model = OLS(y_flat, x_with_const)
        results = model.fit()
    except Exception as e:
        return f"Error fitting OLS model: {e}"

    # Initialize output
    output = [
        ['test_name', 'statistic', 'p_value', 'conclusion', 'interpretation']
    ]

    # RESET test (tests for omitted nonlinear terms)
    try:
        reset_result = linear_reset(results, use_f=True)
        reset_stat = float(reset_result.fvalue)
        reset_p = float(reset_result.pvalue)
        if np.isnan(reset_stat) or np.isnan(reset_p):
            output.append(['RESET', 'N/A', 'N/A', 'Unable to compute', 'Insufficient variation in data'])
        else:
            reset_conclusion = "No misspecification" if reset_p > 0.05 else "Misspecification detected"
            reset_interp = "Model appears linear" if reset_p > 0.05 else "Model may be missing nonlinear terms"
            output.append(['RESET', reset_stat, reset_p, reset_conclusion, reset_interp])
    except Exception as e:
        output.append(['RESET', 'Error', 'Error', str(e), 'Unable to compute'])

    # Rainbow test (tests for linearity)
    try:
        rainbow_stat, rainbow_p = linear_rainbow(results)
        rainbow_stat = float(rainbow_stat)
        rainbow_p = float(rainbow_p)
        if np.isnan(rainbow_stat) or np.isnan(rainbow_p):
            output.append(['Rainbow', 'N/A', 'N/A', 'Unable to compute', 'Insufficient variation in data'])
        else:
            rainbow_conclusion = "Linear" if rainbow_p > 0.05 else "Non-linear"
            rainbow_interp = "Model specification appears adequate" if rainbow_p > 0.05 else "Model may be non-linear"
            output.append(['Rainbow', rainbow_stat, rainbow_p, rainbow_conclusion, rainbow_interp])
    except Exception as e:
        output.append(['Rainbow', 'Error', 'Error', str(e), 'Unable to compute'])

    # Harvey-Collier test (tests for specification errors)
    try:
        hc_result = linear_harvey_collier(results)
        hc_stat = float(hc_result.statistic)
        hc_p = float(hc_result.pvalue)
        if np.isnan(hc_stat) or np.isnan(hc_p):
            output.append(['Harvey-Collier', 'N/A', 'N/A', 'Unable to compute', 'Insufficient variation in data'])
        else:
            hc_conclusion = "Correctly specified" if hc_p > 0.05 else "Specification error"
            hc_interp = "No evidence of specification error" if hc_p > 0.05 else "Model may have specification errors"
            output.append(['Harvey-Collier', hc_stat, hc_p, hc_conclusion, hc_interp])
    except Exception as e:
        output.append(['Harvey-Collier', 'Error', 'Error', str(e), 'Unable to compute'])

    return output
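For a quick check from Python, the function can be called directly with the data from Example 1. The driver below is illustrative only (the `__main__` guard is not part of the function); the printed rows should approximately match the expected output table in Example 1, subject to floating-point rounding:

```python
if __name__ == "__main__":
    # Data from Example 1: y against a single predictor x = 1..12
    y = [[3.397], [4.889], [7.518], [10.22], [10.81], [12.81],
         [16.26], [17.61], [18.62], [21.43], [22.63], [24.63]]
    x = [[i] for i in range(1, 13)]

    for row in specification_tests(y, x):
        print(row)
```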