Hierarchical Clustering: Building a Data Hierarchy
Hierarchical clustering is an unsupervised machine learning algorithm that builds a hierarchy of clusters. Unlike k-means, it doesn't require you to pre-specify the number of clusters. Instead, it creates a tree-like structure, often visualized as a dendrogram, showing the relationships between data points at different levels of granularity.
The Core Idea: Merging or Splitting Data Points
Hierarchical clustering creates a nested structure of clusters, either by merging data points step by step (agglomerative) or by recursively splitting clusters (divisive).
There are two main approaches to hierarchical clustering:
- Agglomerative (Bottom-Up): This is the more common approach. It begins with each data point as a separate cluster. In each step, the two closest clusters are merged until only one cluster remains, encompassing all data points. The 'closeness' between clusters is determined by a linkage criterion (a minimal sketch of this merge loop follows the list).
- Divisive (Top-Down): This approach starts with all data points in a single cluster. In each step, the most internally dissimilar cluster is split into two, until each data point is in its own cluster or a stopping criterion is met.
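To make the bottom-up idea concrete, here is a minimal sketch of the agglomerative merge loop written in plain NumPy. The toy points and the choice of single linkage are assumptions for illustration only; in practice you would use SciPy or scikit-learn, as shown later on this page.

```python
# Minimal sketch of agglomerative (bottom-up) clustering.
# Toy data and single linkage are assumptions for illustration.
import numpy as np

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster

def single_link_dist(a, b):
    """Smallest pairwise distance between two clusters (single linkage)."""
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)

while len(clusters) > 1:
    # Find the pair of clusters that are currently closest to each other.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda pair: single_link_dist(clusters[pair[0]], clusters[pair[1]]),
    )
    print(f"merge {clusters[i]} + {clusters[j]}")
    clusters[i] = clusters[i] + clusters[j]  # merge the pair...
    del clusters[j]                          # ...and drop the absorbed cluster
```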
Linkage Criteria: Defining Cluster Proximity
The choice of linkage criterion is crucial as it defines how the distance between clusters is calculated. This impacts the shape and structure of the resulting hierarchy.
| Linkage Type | Description | Pros | Cons |
|---|---|---|---|
| Ward | Minimizes the increase in total within-cluster variance when two clusters are merged. | Often produces compact, evenly sized, roughly spherical clusters. | Sensitive to outliers; works only with Euclidean distances. |
| Average | Uses the average distance between all pairs of observations in the two clusters. | Less sensitive to outliers than complete linkage. | Tends to favor globular clusters. |
| Complete (Maximum) | Uses the maximum distance between any two observations in the two clusters. | Tends to produce compact clusters of similar diameter. | Sensitive to outliers; can break up large clusters. |
| Single (Minimum) | Uses the minimum distance between any two observations in the two clusters. | Can detect elongated or non-globular cluster shapes. | Very sensitive to noise and outliers; prone to 'chaining'. |
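The following sketch compares the linkage options on the same synthetic data using scikit-learn. The blob parameters and the use of the silhouette score as a comparison metric are assumptions for illustration, not a recommendation of one linkage over another.

```python
# Compare linkage criteria on the same synthetic dataset (illustrative only).
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

for linkage in ("ward", "average", "complete", "single"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    print(f"{linkage:>8}: silhouette = {silhouette_score(X, labels):.3f}")
```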
The Dendrogram: Visualizing the Hierarchy
A dendrogram is a tree diagram that illustrates the arrangement of the clusters produced by hierarchical clustering. The horizontal axis lists the data points (or clusters), and the vertical axis represents the distance or dissimilarity at which clusters are merged; the height of each branch point therefore shows how far apart the two merged clusters were. To obtain a specific number of clusters, draw a horizontal line across the dendrogram at a chosen height: the number of vertical branches intersected by that line is the number of clusters.
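As a rough sketch, SciPy's scipy.cluster.hierarchy module can build the linkage matrix, draw the dendrogram, and cut it at a chosen height; the sample size and the cut height of 10 below are arbitrary assumptions.

```python
# Build, plot, and cut a dendrogram with SciPy (values are illustrative).
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

Z = linkage(X, method="ward")                   # linkage matrix: one row per merge
dendrogram(Z)                                   # tree plot; leaves are data points
plt.axhline(y=10, color="red", linestyle="--")  # horizontal 'cut' line
plt.ylabel("merge distance")
plt.show()

labels = fcluster(Z, t=10, criterion="distance")  # cluster labels below the cut
print(labels)
```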
Choosing the Number of Clusters
Unlike k-means, hierarchical clustering doesn't require you to specify 'k' beforehand. However, you still need to decide where to 'cut' the dendrogram to obtain your final clusters. This can be done by:
- Visual Inspection: Looking for a significant jump in the merge distance on the dendrogram.
- Domain Knowledge: Using prior understanding of the data to determine an appropriate number of clusters.
- Metrics: Employing cluster validation metrics such as the Silhouette Score or the Davies-Bouldin Index, computed for several candidate cuts of the dendrogram and compared (see the sketch below).
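As a sketch of the metrics-based approach, the snippet below scores several candidate cluster counts with the silhouette coefficient; the synthetic data and the range of candidate values are assumptions for illustration.

```python
# Score several candidate numbers of clusters (illustrative only).
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=7)

for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
```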
Advantages and Disadvantages
Hierarchical clustering is excellent for exploring data structure and identifying nested relationships, but its computational complexity can be a limitation for very large datasets.
Advantages
- No need to pre-specify the number of clusters.
- Provides a dendrogram, which is useful for understanding data relationships and choosing the number of clusters.
- Can capture clusters of arbitrary shapes (depending on linkage).
- Useful for exploratory data analysis.
Disadvantages
- Computationally expensive, especially for large datasets (typically O(n^2 log n) or O(n^3)).
- Once a merge or split is made, it cannot be undone, which can lead to suboptimal solutions.
- Sensitive to the choice of distance metric and linkage criterion.
- Dendrograms can become difficult to interpret for very large numbers of data points.
Implementation in Python (Scikit-learn)
Scikit-learn provides a robust implementation of agglomerative hierarchical clustering through the AgglomerativeClustering class. You can either fix the number of clusters with n_clusters, or set distance_threshold (with n_clusters=None) so that the hierarchy is cut at that distance and the number of clusters is determined automatically.
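A minimal usage sketch follows; the synthetic data and the threshold value are arbitrary assumptions.

```python
# Agglomerative clustering with scikit-learn (data and threshold are illustrative).
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Option 1: ask for a fixed number of clusters.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Option 2: cut the hierarchy at a distance threshold instead.
model_t = AgglomerativeClustering(n_clusters=None, distance_threshold=15.0, linkage="ward")
labels_t = model_t.fit_predict(X)
print("clusters found with threshold:", model_t.n_clusters_)
```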