Hierarchical Clustering and Ward Linkage
Definition
Hierarchical clustering builds nested clusters instead of choosing a single partition immediately.
Agglomerative hierarchical clustering starts with each observation as its own cluster and repeatedly merges clusters.
Distance Matrix
The algorithm begins with a distance matrix:
\[d(i,j)\]which measures how far observation $i$ is from observation $j$.
In the retail mining report, product profiles are first embedded into correspondence-analysis coordinates, then clustered using Euclidean distances in that coordinate space.
Dendrogram
The sequence of merges forms a tree called a dendrogram.
Cutting the dendrogram at a chosen number of clusters $k$ gives a cluster assignment.
Ward Linkage
Ward linkage chooses the merge that creates the smallest increase in within-cluster squared error.
For clusters $A$ and $B$, Ward’s method considers the increase in within-cluster variation after merging them:
\[\Delta(A,B) = WSS(A \cup B) - WSS(A) - WSS(B)\]where:
\[WSS(C) = \sum_{i \in C} ||x_i - \bar{x}_C||^2\]The merge with the smallest $\Delta(A,B)$ is chosen.
Ward.D2
In R, hclust(..., method = "ward.D2") implements the Ward criterion using squared distances in the intended way.
It tends to prefer compact clusters.
Choosing k
After building the dendrogram, candidate values of $k$ can be evaluated using the silhouette coefficient.
A cluster solution can also be rejected if it is structurally degenerate, such as:
- one cluster contains almost all observations
- a cluster has only one or a few observations