JARQUE_BERA
Overview
The JARQUE_BERA function performs a goodness-of-fit test to determine whether the skewness and kurtosis of sample data match those of a normal distribution. Named after economists Carlos Jarque and Anil K. Bera, who developed it in 1980, the test is widely used in econometrics and financial analysis to validate the normality assumptions required by many statistical models.
The Jarque-Bera test examines two key properties of the data distribution: skewness (asymmetry) and kurtosis (tail heaviness). For a normal distribution, the expected skewness is 0 and the expected excess kurtosis is also 0 (equivalent to a kurtosis of 3). The test statistic quantifies how far the sample deviates from these expected values.
The test statistic JB is calculated as:
JB = \frac{n}{6}\left(S^2 + \frac{1}{4}(K-3)^2\right)
where n is the sample size, S is the sample skewness, and K is the sample kurtosis. The term (K-3) represents the excess kurtosis, measuring deviation from the normal distribution’s kurtosis of 3.
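The formula can be checked by hand with a few lines of plain Python. The sketch below is illustrative only: the helper name jarque_bera_statistic is hypothetical, and it assumes S and K are computed from biased (divide-by-n) central moments, which is the default convention of SciPy's skew and kurtosis functions.

```python
def jarque_bera_statistic(values):
    """Return (S, K, JB) computed from biased central moments (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4, each divided by n.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5   # S: sample skewness
    kurt = m4 / m2 ** 2     # K: sample kurtosis (3 for a normal distribution)
    jb = n / 6 * (skew ** 2 + 0.25 * (kurt - 3) ** 2)
    return skew, kurt, jb


skew, kurt, jb = jarque_bera_statistic([0, 0.1, 0.2, 0.3, 0.4, 5])
print(round(skew, 3), round(kurt, 3), round(jb, 3))
```

For this six-value sample with one outlier, the JB value should agree with the 3.464 reported in Example 4.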
Under the null hypothesis that the data comes from a normal distribution, the JB statistic asymptotically follows a chi-squared distribution with 2 degrees of freedom. A large JB value (far from zero) indicates significant departure from normality. The function returns both the test statistic and the p-value; a small p-value (typically < 0.05) suggests rejecting the null hypothesis of normality.
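With 2 degrees of freedom, the chi-squared upper-tail probability has the closed form exp(-JB/2), so the p-value can be reproduced without any statistics library. The following is a minimal sketch; the helper name jb_p_value and the 0.05 threshold are illustrative choices, not part of the JARQUE_BERA function.

```python
import math


def jb_p_value(jb):
    """Upper-tail probability of a chi-squared(2) variable at jb."""
    return math.exp(-jb / 2)


ALPHA = 0.05  # conventional significance level, assumed here for illustration

for jb in (0.4023, 3.464):
    p = jb_p_value(jb)
    decision = "reject normality" if p < ALPHA else "do not reject normality"
    print(f"JB={jb}: p={p:.4f} -> {decision}")

# JB value at which the p-value equals ALPHA: -2 * ln(0.05), about 5.99
critical_value = -2 * math.log(ALPHA)
```

Under this approximation, any JB value above roughly 5.99 corresponds to a p-value below 0.05.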
This implementation uses SciPy’s jarque_bera function from the scipy.stats module. Note that the chi-squared approximation is most reliable for large samples (SciPy’s documentation suggests more than 2000 observations); for smaller samples, the test may be overly sensitive and produce inflated Type I error rates. For more background, see the Wikipedia article on the Jarque-Bera test or the original paper by Jarque and Bera (1980).
This example function is provided as-is without any representation of accuracy.
Excel Usage
=JARQUE_BERA(data)
data (list[list], required): Sample data to test for normality. Must contain at least two numeric values.
Returns (list[list]): 2D list [[statistic, p_value]], or error message string.
Examples
Example 1: Small symmetric sample near zero
Inputs:
| data |
|---|
| 0.1 |
| -0.2 |
| 0.3 |
| 0 |
| 0.2 |
| -0.1 |
Excel formula:
=JARQUE_BERA({0.1;-0.2;0.3;0;0.2;-0.1})
Expected output:
| statistic | p_value |
|---|---|
| 0.4023 | 0.8178 |
Example 2: Uniformly distributed data
Inputs:
| data |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
Excel formula:
=JARQUE_BERA({1;2;3;4;5;6})
Expected output:
| statistic | p_value |
|---|---|
| 0.4023 | 0.8178 |
Example 3: Evenly spaced data
Inputs:
| data |
|---|
| 2.5 |
| 2.7 |
| 2.9 |
| 3.1 |
| 3.3 |
| 3.5 |
Excel formula:
=JARQUE_BERA({2.5;2.7;2.9;3.1;3.3;3.5})
Expected output:
| statistic | p_value |
|---|---|
| 0.4023 | 0.8178 |
Example 4: Data with outlier
Inputs:
| data |
|---|
| 0 |
| 0.1 |
| 0.2 |
| 0.3 |
| 0.4 |
| 5 |
Excel formula:
=JARQUE_BERA({0;0.1;0.2;0.3;0.4;5})
Expected output:
| statistic | p_value |
|---|---|
| 3.464 | 0.1769 |
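The expected outputs above can be reproduced by calling SciPy directly, as in the sketch below. The dictionary names samples and results are illustrative. Examples 1 to 3 return identical values because each contains six evenly spaced numbers, and skewness and kurtosis do not change when data are shifted or rescaled.

```python
from scipy.stats import jarque_bera

# The four example inputs from this page, flattened to 1D lists.
samples = {
    "Example 1": [0.1, -0.2, 0.3, 0, 0.2, -0.1],
    "Example 2": [1, 2, 3, 4, 5, 6],
    "Example 3": [2.5, 2.7, 2.9, 3.1, 3.3, 3.5],
    "Example 4": [0, 0.1, 0.2, 0.3, 0.4, 5],
}

results = {}
for name, data in samples.items():
    res = jarque_bera(data)
    # Same [[statistic, p_value]] shape that JARQUE_BERA returns.
    results[name] = [[float(res.statistic), float(res.pvalue)]]
    print(name, [round(v, 4) for v in results[name][0]])
```

With only six observations per sample, these p-values rely on the chi-squared approximation well outside the large-sample range it is intended for, so they are best read as illustrations of the output format.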
Python Code
from scipy.stats import jarque_bera as scipy_jarque_bera
import math

def jarque_bera(data):
    """
    Perform the Jarque-Bera goodness of fit test for normality.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.jarque_bera.html
    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Sample data to test for normality. Must contain at least two numeric values.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    def to2d(x):
        # Wrap a scalar so a single-cell input is treated as a 1x1 range.
        return [[x]] if not isinstance(x, list) else x

    def flatten(arr):
        # Collapse the 2D list into a flat list of cell values.
        result = []
        for row in arr:
            if isinstance(row, list):
                result.extend(row)
            else:
                result.append(row)
        return result

    data = to2d(data)
    if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
        return "Invalid input: data must be a 2D list."

    # Convert every cell to float, rejecting non-numeric, NaN and infinite values.
    flat = flatten(data)
    values = []
    for val in flat:
        try:
            f = float(val)
            if math.isnan(f) or math.isinf(f):
                return "Invalid input: data must contain only numeric values."
            values.append(f)
        except (TypeError, ValueError):
            return "Invalid input: data must contain only numeric values."

    if len(values) < 2:
        return "Invalid input: data must contain at least two numeric values."

    # Delegate the computation to SciPy and report any failure as a string.
    try:
        result = scipy_jarque_bera(values)
        stat = float(result.statistic)
        pval = float(result.pvalue)
    except Exception as e:
        return f"Calculation error: {e}"

    # Constant input (zero variance) yields NaN, which is reported as an error.
    if math.isnan(stat) or math.isinf(stat) or math.isnan(pval) or math.isinf(pval):
        return "Calculation error: result contains NaN or infinity."

    return [[stat, pval]]