Summary Statistics
Overview
Summary statistics condense a set of observations into a few numbers, communicating as much information as possible as simply as possible. They are typically the first step in a data analysis, providing a snapshot of the data’s central tendency, dispersion, and shape.
Common summary statistics fall into three categories:
- Central Tendency: Where is the “center” of the data? (Mean, Median, Mode)
- Dispersion: How spread out is the data? (Variance, Standard Deviation, Interquartile Range)
- Shape: Is the distribution symmetric or skewed? Peaked or flat? (Skewness, Kurtosis)
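As a rough illustration of the three categories, the sketch below computes one or more statistics from each group with NumPy and SciPy. The sample values and variable names are made up for the example. The mode is found with np.unique here rather than a dedicated mode function.

```python
import numpy as np
from scipy import stats

# Illustrative sample (not from any real dataset)
data = np.array([2.0, 3.0, 3.0, 5.0, 7.0, 10.0])

# Central tendency
mean = np.mean(data)
median = np.median(data)
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]  # most frequent value

# Dispersion (ddof=1 gives the sample estimates, like Excel's VAR.S / STDEV.S)
variance = np.var(data, ddof=1)
std_dev = np.std(data, ddof=1)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # interquartile range

# Shape
skewness = stats.skew(data)             # > 0 means a longer right tail
excess_kurtosis = stats.kurtosis(data)  # Fisher definition: normal distribution = 0

print(mean, median, mode)
print(variance, std_dev, iqr)
print(skewness, excess_kurtosis)
```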
Key Functions
- DESCRIBE: Computes several descriptive statistics (nobs, minmax, mean, variance, skewness, kurtosis) in a single call.
- SKEWNESS and KURTOSIS: Quantify the asymmetry and “tailedness” of the distribution.
- MODE: Identifies the most frequent value in a dataset (the smallest one when several values tie).
- SEM: Calculates the standard error of the mean, essential for error bars.
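The functions above wrap SciPy routines, so their behavior can be approximated by calling scipy.stats directly. The following is a minimal sketch, assuming a small made-up sample. It shows the underlying SciPy calls, not the wrappers themselves.

```python
import numpy as np
from scipy import stats

# Illustrative sample (not from any real dataset)
data = np.array([2.0, 3.0, 3.0, 5.0, 7.0, 10.0])

# One call returns nobs, minmax, mean, variance, skewness and kurtosis
summary = stats.describe(data)
print(summary.nobs, summary.minmax)
print(summary.mean, summary.variance)
print(summary.skewness, summary.kurtosis)

# Standard error of the mean: sample standard deviation / sqrt(n)
standard_error = stats.sem(data)
print(standard_error)
```

By default both describe and sem use ddof=1, so the variance and standard error are sample estimates.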
Native Excel Capabilities
Excel computes basic statistics easily (AVERAGE, MEDIAN, STDEV.S, VAR.S). However, Python provides enhanced capabilities:
- Higher-Order Moments: Excel provides SKEW and KURT, but scipy.stats offers adjustments for bias and different definitions (Fisher vs Pearson).
- Multi-dimensional Data: Python functions easily operate along specific axes of matrix data (e.g., “compute the mean of every column”).
- Comprehensive Reporting: The DESCRIBE function returns a full statistical summary object, whereas getting the same info in Excel requires running the Analysis ToolPak “Descriptive Statistics” tool, which creates a static output that doesn’t update when data changes.
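The first two points can be sketched with plain scipy.stats calls. The matrix below is invented for the example: its second column is ten times the first, which makes it easy to see that skewness and kurtosis do not depend on scale.

```python
import numpy as np
from scipy import stats

# Illustrative 4x2 matrix: rows are observations, columns are variables
matrix = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [4.0, 40.0],
    [9.0, 90.0],
])

# Axis-wise operation: "compute the mean of every column"
column_means = np.mean(matrix, axis=0)

# Two kurtosis definitions: Fisher (normal = 0) and Pearson (normal = 3)
fisher = stats.kurtosis(matrix, axis=0, fisher=True)
pearson = stats.kurtosis(matrix, axis=0, fisher=False)

# Bias adjustment: bias=False applies a small-sample correction
biased_skew = stats.skew(matrix, axis=0, bias=True)
unbiased_skew = stats.skew(matrix, axis=0, bias=False)

print(column_means)
print(fisher, pearson)
print(biased_skew, unbiased_skew)
```

The Pearson values are exactly 3 higher than the Fisher values, so it matters which definition a reported kurtosis uses.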
Tools
| Tool | Description |
|---|---|
| DESCRIBE | Compute descriptive statistics using the scipy.stats.describe function. |
| EFFECT_SIZES | Computes effect size measures for comparing two groups. |
| EXPECTILE | Calculates the expectile of a dataset using scipy.stats.expectile. |
| GMEAN | Compute the geometric mean of the input data, flattening the input and ignoring non-numeric values. |
| HMEAN | Calculates the harmonic mean of the input data, flattening the input and ignoring non-numeric values. |
| KURTOSIS | Compute the kurtosis (Fisher or Pearson) of a dataset. |
| MODE | Returns the modal (most common) value in the passed array. Wraps scipy.stats.mode to flatten the input, ignore non-numeric values, and always return a single mode (the smallest if multiple). If no mode is found (all values occur only once), returns an error. |
| MOMENT | Calculates the nth moment about the mean for a sample. |
| PMEAN | Computes the power mean (generalized mean) of the input data for a given power p. |
| SKEWNESS | Calculate the skewness of a dataset. |
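Several rows above describe the same input handling: flatten the input, ignore non-numeric values, then compute. The sketch below shows one way that behavior could look. The helpers numeric_values and single_mode are hypothetical names, not the actual tool implementations, and the mode is found with np.unique instead of scipy.stats.mode to keep the tie-breaking rule explicit.

```python
import numpy as np
from scipy import stats

def numeric_values(data):
    """Hypothetical helper: flatten nested lists and keep only int/float entries."""
    out = []

    def walk(item):
        if isinstance(item, (list, tuple)):
            for sub in item:
                walk(sub)
        elif isinstance(item, (int, float)) and not isinstance(item, bool):
            out.append(float(item))
        # anything else (strings, None, booleans) is silently skipped

    walk(data)
    return np.array(out)

def single_mode(data):
    """Hypothetical MODE-like function: smallest most-frequent value, or an error."""
    values, counts = np.unique(numeric_values(data), return_counts=True)
    if values.size == 0 or counts.max() == 1:
        raise ValueError("no mode: every value occurs only once")
    # np.unique sorts values and argmax returns the first maximum,
    # so ties resolve to the smallest value
    return float(values[np.argmax(counts)])

# Illustrative 2D input with non-numeric cells
grid = [[1, 2, "n/a"], [4, None, 8]]
clean = numeric_values(grid)     # array([1., 2., 4., 8.])
geometric = stats.gmean(clean)   # fourth root of 1*2*4*8
harmonic = stats.hmean(clean)    # 4 / (1/1 + 1/2 + 1/4 + 1/8)

print(clean, geometric, harmonic)
print(single_mode([[1, 2, 2], [3, 3, "x"]]))  # 2 and 3 tie, smallest wins
```

Raising an exception is only one way to signal the "no mode" case. A spreadsheet-facing wrapper might return an error string instead.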