Hierarchical Clustering: Building a Data Hierarchy
Hierarchical clustering is an unsupervised machine learning algorithm that builds a hierarchy of clusters. Unlike k-means, it doesn't require you to pre-specify the number of clusters. Instead, it creates a tree-like structure, often visualized as a dendrogram, showing the relationships between data points at different levels of granularity.
The Core Idea: Merging or Splitting Data Points
Hierarchical clustering creates a nested structure of clusters, either by merging data points step by step (agglomerative) or by recursively splitting clusters (divisive).
There are two main approaches to hierarchical clustering:
- Agglomerative (Bottom-Up): This is the more common approach. It begins with each data point as a separate cluster. In each step, the two closest clusters are merged until only one cluster remains, encompassing all data points. The 'closeness' between clusters is determined by a linkage criterion (a minimal sketch of this merge loop follows the list).
- Divisive (Top-Down): This approach starts with all data points in a single cluster. In each step, the most internally dissimilar cluster is split into two, until each data point is in its own cluster or a stopping criterion is met.
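To make the bottom-up idea concrete, here is a minimal sketch of the agglomerative merge loop written in plain NumPy. The toy points and the choice of single linkage are assumptions for illustration only; in practice you would use SciPy or scikit-learn, as shown later on this page.

```python
# Minimal sketch of agglomerative (bottom-up) clustering.
# Toy data and single linkage are assumptions for illustration.
import numpy as np

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster

def single_link_dist(a, b):
    """Smallest pairwise distance between two clusters (single linkage)."""
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)

while len(clusters) > 1:
    # Find the pair of clusters that are currently closest to each other.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda pair: single_link_dist(clusters[pair[0]], clusters[pair[1]]),
    )
    print(f"merge {clusters[i]} + {clusters[j]}")
    clusters[i] = clusters[i] + clusters[j]  # merge the pair...
    del clusters[j]                          # ...and drop the absorbed cluster
```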
Linkage Criteria: Defining Cluster Proximity
The choice of linkage criterion is crucial as it defines how the distance between clusters is calculated. This impacts the shape and structure of the resulting hierarchy.
| Linkage Type | Description | Pros | Cons |
|---|---|---|---|
| Ward | Minimizes the increase in total within-cluster variance when two clusters are merged. | Often produces compact, evenly sized, roughly spherical clusters. | Sensitive to outliers; works only with Euclidean distances. |
| Average | Uses the average distance between all pairs of observations in the two clusters. | Less sensitive to outliers than complete linkage. | Tends to favor globular clusters. |
| Complete (Maximum) | Uses the maximum distance between any two observations in the two clusters. | Tends to produce compact clusters of similar diameter. | Sensitive to outliers; can break up large clusters. |
| Single (Minimum) | Uses the minimum distance between any two observations in the two clusters. | Can detect elongated or non-globular cluster shapes. | Very sensitive to noise and outliers; prone to 'chaining'. |
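The following sketch compares the linkage options on the same synthetic data using scikit-learn. The blob parameters and the use of the silhouette score as a comparison metric are assumptions for illustration, not a recommendation of one linkage over another.

```python
# Compare linkage criteria on the same synthetic dataset (illustrative only).
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

for linkage in ("ward", "average", "complete", "single"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    print(f"{linkage:>8}: silhouette = {silhouette_score(X, labels):.3f}")
```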
The Dendrogram: Visualizing the Hierarchy
A dendrogram is a tree diagram that illustrates the arrangement of the clusters produced by hierarchical clustering. The horizontal axis lists the data points (or clusters), and the vertical axis represents the distance or dissimilarity at which clusters are merged; the height of each branch point therefore shows how far apart the two merged clusters were. To obtain a specific number of clusters, draw a horizontal line across the dendrogram at a chosen height: the number of vertical branches intersected by that line is the number of clusters.
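As a rough sketch, SciPy's scipy.cluster.hierarchy module can build the linkage matrix, draw the dendrogram, and cut it at a chosen height; the sample size and the cut height of 10 below are arbitrary assumptions.

```python
# Build, plot, and cut a dendrogram with SciPy (values are illustrative).
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

Z = linkage(X, method="ward")                   # linkage matrix: one row per merge
dendrogram(Z)                                   # tree plot; leaves are data points
plt.axhline(y=10, color="red", linestyle="--")  # horizontal 'cut' line
plt.ylabel("merge distance")
plt.show()

labels = fcluster(Z, t=10, criterion="distance")  # cluster labels below the cut
print(labels)
```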
Choosing the Number of Clusters
Unlike k-means, hierarchical clustering doesn't require you to specify 'k' beforehand. However, you still need to decide where to 'cut' the dendrogram to obtain your final clusters. This can be done by:
- Visual Inspection: Looking for a significant jump in the merge distance on the dendrogram.
- Domain Knowledge: Using prior understanding of the data to determine an appropriate number of clusters.
- Metrics: Employing cluster validation metrics such as the Silhouette Score or the Davies-Bouldin Index, computed for several candidate cuts of the dendrogram and compared (see the sketch below).
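As a sketch of the metrics-based approach, the snippet below scores several candidate cluster counts with the silhouette coefficient; the synthetic data and the range of candidate values are assumptions for illustration.

```python
# Score several candidate numbers of clusters (illustrative only).
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=7)

for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
```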
Advantages and Disadvantages
Hierarchical clustering is excellent for exploring data structure and identifying nested relationships, but its computational complexity can be a limitation for very large datasets.
Advantages
- No need to pre-specify the number of clusters.
- Provides a dendrogram, which is useful for understanding data relationships and choosing the number of clusters.
- Can capture clusters of arbitrary shapes (depending on linkage).
- Useful for exploratory data analysis.
Disadvantages
- Computationally expensive, especially for large datasets (typically O(n^2 log n) or O(n^3)).
- Once a merge or split is made, it cannot be undone, which can lead to suboptimal solutions.
- Sensitive to the choice of distance metric and linkage criterion.
- Dendrograms can become difficult to interpret for very large numbers of data points.
Implementation in Python (Scikit-learn)
Scikit-learn provides a robust implementation of agglomerative hierarchical clustering through the AgglomerativeClustering class. You can either fix the number of clusters with n_clusters, or set distance_threshold (with n_clusters=None) so that the hierarchy is cut at that distance and the number of clusters is determined automatically.
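A minimal usage sketch follows; the synthetic data and the threshold value are arbitrary assumptions.

```python
# Agglomerative clustering with scikit-learn (data and threshold are illustrative).
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Option 1: ask for a fixed number of clusters.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Option 2: cut the hierarchy at a distance threshold instead.
model_t = AgglomerativeClustering(n_clusters=None, distance_threshold=15.0, linkage="ward")
labels_t = model_t.fit_predict(X)
print("clusters found with threshold:", model_t.n_clusters_)
```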