HIERARCHICAL_CLUSTER
Overview
Performs hierarchical (agglomerative) clustering on numeric data and returns a dendrogram plot as a base64-encoded PNG image. This function is designed for use in Excel, where you can pass a 2D list or a single column of numbers. By default, Ward’s method is used for clustering, but you may specify other linkage methods. The result is visualized as a dendrogram.
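Ward’s method minimizes the total within-cluster variance. At each step, the pair of clusters whose merger produces the smallest increase in total within-cluster variance is combined. The increase in variance when merging clusters $A$ and $B$ is:
$$\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|} \lVert \mu_A - \mu_B \rVert^2$$
where $|A|$ and $|B|$ are the cluster sizes and $\mu_A$ and $\mu_B$ are the cluster centroids.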
See scipy.cluster.hierarchy documentation for more details on the available methods.
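Ward's merge cost can be checked numerically. The sketch below assumes the standard form of the criterion (product of the cluster sizes, divided by their sum, times the squared distance between centroids) and compares it with the directly computed change in within-cluster sum of squares. The helper names `ward_increase` and `sse` are illustrative and are not part of the function documented here.

```python
import numpy as np

def ward_increase(a, b):
    """Increase in within-cluster sum of squares when clusters a and b merge."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return len(a) * len(b) / (len(a) + len(b)) * float(diff @ diff)

def sse(x):
    """Sum of squared deviations of the rows of x from their centroid."""
    x = np.asarray(x, dtype=float)
    return float(((x - x.mean(axis=0)) ** 2).sum())

# Two small one-column clusters taken from the sample values used in the examples.
a = np.array([[9.6], [9.8]])
b = np.array([[10.0], [10.4]])

delta = ward_increase(a, b)
direct = sse(np.vstack([a, b])) - sse(a) - sse(b)
print(delta, direct)  # the two values agree up to floating-point rounding
```

Both routes give the same number, which is why Ward's method can rank candidate merges without recomputing the variance of every merged cluster.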
This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=HIERARCHICAL_CLUSTER(data, [method])
data (2D list, required): Numeric data for clustering (one or more columns).
method (string, optional, default="ward"): Linkage method. One of "single", "complete", "average", "weighted", "centroid", "median", or "ward".
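The function returns the dendrogram as a string: a base64-encoded PNG image prefixed with data:image/png;base64, (a data URI). If clustering with the requested method fails, the function falls back to Ward’s method. If the calculation fails altogether, an error message string beginning with "Error:" is returned.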
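In the Python code, the returned string carries a `data:image/png;base64,` prefix, so a caller has to strip that prefix before decoding, and has to tell an image apart from an error message. A minimal sketch of one way to do that follows. The helper name `decode_png_data_uri` is hypothetical, and the sample payload is only the 8-byte PNG signature standing in for a real image.

```python
import base64

PREFIX = "data:image/png;base64,"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

def decode_png_data_uri(result):
    """Return the raw PNG bytes from a data URI, or None for any other string."""
    if not result.startswith(PREFIX):
        return None  # e.g. an "Error: ..." message
    png_bytes = base64.b64decode(result[len(PREFIX):])
    return png_bytes if png_bytes.startswith(PNG_SIGNATURE) else None

# Stand-in for a real return value: only the PNG signature, not a full image.
sample = PREFIX + base64.b64encode(PNG_SIGNATURE).decode("utf-8")
decoded = decode_png_data_uri(sample)
failed = decode_png_data_uri("Error: Not enough data.")
print(decoded is not None, failed is None)
```

With a real result, the decoded bytes could be written to a `.png` file opened in binary mode.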
Examples
Example 1: Cluster a List of Values (Default: Ward)
Sample input data (Excel range A1:A10):
Value |
---|
9.6 |
9.8 |
10 |
10.4 |
10.8 |
11 |
11.2 |
12 |
13 |
14 |
In Excel:
=HIERARCHICAL_CLUSTER(A1:A10)
Expected output: A base64-encoded PNG string (truncated):
"..."
Example 2: Cluster with Complete Linkage
=HIERARCHICAL_CLUSTER(A1:A10, "complete")
Expected output: A base64-encoded PNG string (truncated):
"..."
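To see how the choice of linkage changes the result without decoding the image, the merge heights can be inspected directly. This is a rough sketch that calls SciPy's `linkage` on the sample values from Example 1. The variable names `values` and `heights` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Sample values from Example 1, shaped as one column of observations.
values = np.array([9.6, 9.8, 10, 10.4, 10.8, 11, 11.2, 12, 13, 14]).reshape(-1, 1)

heights = {}
for method in ("ward", "complete"):
    Z = linkage(values, method=method)  # one row per merge: (n - 1) x 4
    heights[method] = float(Z[-1, 2])   # column 2 holds the merge distance
    print(method, "final merge height:", round(heights[method], 3))
```

With complete linkage, the final merge height equals the largest pairwise distance in the data (here 14 - 9.6). Ward's heights are on a different scale, which is why the two dendrograms have different vertical axes.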
Python Code
options = {"insert_only":True}
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import io
import base64
# The function below performs hierarchical clustering and returns a dendrogram image as a base64 string.
def hierarchical_cluster(data, method="ward"):
    """
    Performs hierarchical (agglomerative) clustering on numeric data and returns a dendrogram as a base64-encoded PNG image or an error message.

    Args:
        data: 2D list of float, required. Numeric data for clustering (Excel range or list).
        method: str, optional, default="ward". Linkage method for clustering. One of 'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'.

    Returns:
        str: Base64-encoded PNG image of the dendrogram (as a data URI), or error message if calculation fails.

    This example function is provided as-is without any representation of accuracy.
    """
    # Convert input to a float array; if that fails, keep only the rows that parse as numbers
    try:
        arr = np.array(data, dtype=float)
    except Exception:
        try:
            arr_clean = []
            for row in data:
                try:
                    arr_clean.append([float(x) for x in row])
                except Exception:
                    continue
            arr = np.array(arr_clean, dtype=float)
        except Exception:
            # Input is not iterable, or the remaining rows have unequal lengths
            return "Error: Invalid input data."
    if arr.size == 0:
        return "Error: Not enough data."
    # Treat a flat list as a single column of observations
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    elif arr.ndim != 2:
        return "Error: Invalid input data."
    # Drop rows that contain NaN or infinite values
    arr = arr[np.isfinite(arr).all(axis=1)]
    if arr.shape[0] < 2:
        return "Error: Not enough data."
    # Perform hierarchical clustering; fall back to Ward if the requested method fails
    try:
        linkage_matrix = linkage(arr, method=method)
    except Exception:
        try:
            method = "ward"  # keep the plot title consistent with the method actually used
            linkage_matrix = linkage(arr, method=method)
        except Exception:
            return "Error: Clustering failed."
    # Plot dendrogram
    plt.figure(figsize=(8, 4))
    dendrogram(linkage_matrix)
    plt.title(f"Hierarchical Clustering Dendrogram ({method})")
    plt.xlabel("Sample Index")
    plt.ylabel("Distance")
    plt.tight_layout()
    # Render the figure to an in-memory PNG and encode it as a data URI
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    plt.close()
    img_b64 = base64.b64encode(buf.getvalue()).decode('utf-8')
    return f"data:image/png;base64,{img_b64}"
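The row-filtering step in the function can be tried in isolation. The sketch below uses made-up values (not from the examples above) to show how `np.isfinite(...).all(axis=1)` builds a per-row mask, so that any row with a missing or infinite entry is dropped before clustering.

```python
import numpy as np

# Hypothetical two-column input with one NaN entry and one infinite entry.
rows = np.array([
    [9.6, 1.0],
    [np.nan, 2.0],
    [10.4, np.inf],
    [11.0, 4.0],
])

# True only for rows in which every entry is a finite number.
keep = np.isfinite(rows).all(axis=1)
clean = rows[keep]
print(keep)
print(clean)
```

Only the first and last rows survive. Because whole rows are removed, the sample indices on the dendrogram's x-axis refer to positions in the filtered data, not in the original range.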