
HIERARCHICAL_CLUSTER

Overview

Performs hierarchical (agglomerative) clustering on numeric data and returns a dendrogram plot as a base64-encoded PNG image. The function is designed for use in Excel, where the input can be a multi-column range (a 2D list) or a single column of numbers. Ward’s method is used by default, but other linkage methods can be specified.

Ward’s method minimizes the total within-cluster variance. At each step, it merges the pair of clusters whose merger produces the smallest increase in total within-cluster variance. The increase $\Delta E$ when merging clusters $A$ and $B$ is:

$$\Delta E = \frac{|A| \cdot |B|}{|A| + |B|} \|\bar{x}_A - \bar{x}_B\|^2$$

where $|A|$ and $|B|$ are the numbers of points in each cluster, and $\bar{x}_A$ and $\bar{x}_B$ are the cluster centroids.
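As a quick numeric check of the formula, the sketch below computes the merge cost for two small clusters in two ways: from the closed-form expression, and directly as the growth in the within-cluster sum of squared distances to the centroid. Reading "within-cluster variance" as that sum of squares is an assumption about the intended meaning, and the helper names `sse` and `ward_delta` are hypothetical, not part of the function documented here.

```python
import numpy as np

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_delta(a, b):
    """Closed-form increase: |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return len(a) * len(b) / (len(a) + len(b)) * float(diff @ diff)

# Two illustrative clusters built from values in the sample data
cluster_a = np.array([[9.6], [9.8], [10.0]])
cluster_b = np.array([[12.0], [13.0], [14.0]])

delta_formula = ward_delta(cluster_a, cluster_b)
delta_direct = sse(np.vstack([cluster_a, cluster_b])) - sse(cluster_a) - sse(cluster_b)

print(round(delta_formula, 6), round(delta_direct, 6))  # both 15.36
```

The two values agree, which is what makes the closed form convenient: the cost of a merge can be evaluated from cluster sizes and centroids alone, without revisiting individual points.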

See scipy.cluster.hierarchy documentation for more details on the available methods.

This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:

=HIERARCHICAL_CLUSTER(data, [method])
  • data (2D list, required): Numeric data for clustering (one or more columns).
  • method (string, optional, default="ward"): Linkage method. One of "single", "complete", "average", "weighted", "centroid", "median", or "ward".

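The function returns the dendrogram as a string: a base64-encoded PNG image prefixed with data:image/png;base64, (a data URI). If fewer than two valid numeric rows remain after cleaning, or if clustering fails, an error message string beginning with "Error:" is returned instead.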
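Because both outcomes arrive as plain strings, a caller has to tell an image from an error message. The sketch below shows one way to do that, assuming the `data:image/png;base64,` prefix that the Python code on this page prepends to its result. The helper `decode_dendrogram` is hypothetical, and the sample value is a stand-in containing only the 8-byte PNG signature, not a real dendrogram.

```python
import base64

DATA_URI_PREFIX = "data:image/png;base64,"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

def decode_dendrogram(result):
    """Return PNG bytes from a data-URI string, or None for anything else."""
    if not isinstance(result, str) or not result.startswith(DATA_URI_PREFIX):
        return None  # e.g. an "Error: ..." message
    png_bytes = base64.b64decode(result[len(DATA_URI_PREFIX):])
    return png_bytes if png_bytes.startswith(PNG_SIGNATURE) else None

# Stand-in for a real return value (signature only, not a full image)
fake_result = DATA_URI_PREFIX + base64.b64encode(PNG_SIGNATURE).decode("utf-8")

decoded = decode_dendrogram(fake_result)
error_case = decode_dendrogram("Error: Not enough data.")

print(decoded is not None, error_case is None)
```

With a real result, the decoded bytes could be written to a `.png` file or passed to an image viewer.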

Examples

Example 1: Cluster a List of Values (Default: Ward)

Sample input data (Excel range A1:A10):

Value
9.6
9.8
10
10.4
10.8
11
11.2
12
13
14

In Excel:

=HIERARCHICAL_CLUSTER(A1:A10)

Expected output: A base64-encoded PNG string (truncated):

"..."

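To see what sits behind the image, the following sketch runs the same clustering step on the ten sample values directly in Python and inspects the linkage matrix instead of plotting it. It assumes the header cell is excluded and uses `dendrogram(..., no_plot=True)` so that no plotting backend is needed; it is an illustration, not the function's own code path.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# The ten sample values, as a single-column (n x 1) array
values = [9.6, 9.8, 10, 10.4, 10.8, 11, 11.2, 12, 13, 14]
X = np.array(values, dtype=float).reshape(-1, 1)

# Each row of Z records one merge: [cluster_i, cluster_j, distance, new_size]
Z = linkage(X, method="ward")

# no_plot=True returns the dendrogram layout without drawing anything
tree = dendrogram(Z, no_plot=True)

print(Z.shape)      # n - 1 merges, 4 columns
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order
```

The leaf labels are zero-based row indices, which correspond to the "Sample Index" axis of the plotted dendrogram.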
Example 2: Cluster with Complete Linkage

=HIERARCHICAL_CLUSTER(A1:A10, "complete")

Expected output: A base64-encoded PNG string (truncated):

"..."

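The choice of linkage method changes the merge heights, and therefore the shape of the dendrogram. As a rough illustration, the sketch below computes the height of the final merge for a few methods on the same sample values. The selection of methods is arbitrary and the header cell is assumed to be excluded.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

values = [9.6, 9.8, 10, 10.4, 10.8, 11, 11.2, 12, 13, 14]
X = np.array(values, dtype=float).reshape(-1, 1)

# Column 2 of the last linkage row is the height at which everything joins
final_heights = {}
for method in ("single", "complete", "ward"):
    Z = linkage(X, method=method)
    final_heights[method] = float(Z[-1, 2])

for method, height in final_heights.items():
    print(f"{method:>8}: final merge height = {height:.3f}")
```

Single linkage joins clusters at their closest pair of points, so its heights stay small. Complete linkage uses the farthest pair, so its final merge spans the full range of the data.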
Python Code

options = {"insert_only":True}
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import io
import base64

# The function below performs hierarchical clustering and returns a dendrogram image as a base64 string.
def hierarchical_cluster(data, method="ward"):
    """
    Performs hierarchical (agglomerative) clustering on numeric data and returns
    a dendrogram as a base64-encoded PNG image or an error message.

    Args:
        data: 2D list of float, required. Numeric data for clustering (Excel range or list).
        method: str, optional, default="ward". Linkage method for clustering.
            One of 'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'.

    Returns:
        str: Base64-encoded PNG image of the dendrogram, or error message if calculation fails.

    This example function is provided as-is without any representation of accuracy.
    """
    # Convert input to a float array
    try:
        arr = np.array(data, dtype=float)
    except Exception:
        # Conversion failed (e.g. a text header row): keep only fully numeric rows
        arr_clean = []
        for row in data:
            try:
                arr_clean.append([float(x) for x in row])
            except Exception:
                continue
        arr = np.array(arr_clean, dtype=float)
    if arr.size == 0:
        return "Error: Not enough data."
    # A 1D input is treated as a single column; anything above 2D is rejected
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    elif arr.ndim != 2:
        return "Error: Invalid input data."
    # Drop rows containing NaN or infinite values
    arr = arr[np.isfinite(arr).all(axis=1)]
    if arr.shape[0] < 2:
        return "Error: Not enough data."
    # Perform hierarchical clustering; fall back to Ward if the requested method fails
    used_method = method
    try:
        linkage_matrix = linkage(arr, method=method)
    except Exception:
        try:
            linkage_matrix = linkage(arr, method="ward")
            used_method = "ward"
        except Exception:
            return "Error: Clustering failed."
    # Plot dendrogram, labeled with the method that was actually used
    plt.figure(figsize=(8, 4))
    dendrogram(linkage_matrix)
    plt.title(f"Hierarchical Clustering Dendrogram ({used_method})")
    plt.xlabel("Sample Index")
    plt.ylabel("Distance")
    buf = io.BytesIO()
    plt.tight_layout()
    plt.savefig(buf, format='png')
    plt.close()
    # Encode the PNG bytes as a data URI string
    img_b64 = base64.b64encode(buf.getvalue()).decode('utf-8')
    return f"data:image/png;base64,{img_b64}"
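The input-cleaning step in the code above is easy to overlook, yet it decides which rows are clustered when a range includes a header such as "Value" or blank cells. The sketch below isolates that idea in a standalone helper. The name `clean_rows` is hypothetical, the representation of blank cells as `None` is an assumption about how Excel values arrive, and the input is assumed to be rectangular.

```python
import numpy as np

def clean_rows(data):
    """Keep only rows whose cells all convert to finite floats."""
    rows = []
    for row in data:
        try:
            rows.append([float(x) for x in row])
        except (TypeError, ValueError):
            continue  # skip rows with text, None, or empty strings
    arr = np.array(rows, dtype=float)
    if arr.size == 0:
        return arr.reshape(0, 0)  # nothing usable survived
    # Drop rows that converted but contain NaN or infinity
    return arr[np.isfinite(arr).all(axis=1)]

# A single-column range with a header, a blank cell, and a NaN
excel_range = [["Value"], [9.6], [None], [9.8], [float("nan")], [10]]
cleaned = clean_rows(excel_range)

print(cleaned.shape)  # (3, 1)
```

Rows dropped this way shift the zero-based sample indices shown on the dendrogram, so the labels refer to positions in the cleaned data rather than to worksheet rows.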
