Implements the Agglomerative Hierarchical Clustering algorithm.
To run the clustering program, you need to supply the following parameters on the command line:
- Input file that contains the items to be clustered.
- Number of disjoint clusters that we wish to extract.
- Linkage criterion to use when calculating the distance between clusters:
  - `s` - Single linkage (default)
  - `c` - Complete linkage
  - `a` - Average linkage
  - `t` - Centroid linkage

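
The four linkage criteria differ only in how they turn point-to-point distances into a cluster-to-cluster distance. The sketch below uses the usual textbook definitions in Python, assuming Euclidean distance between 2-D points. The function names and the `LINKAGES` lookup table are illustrative and are not taken from the program itself, whose implementation may differ in detail.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Smallest distance between any point in a and any point in b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Largest distance between any point in a and any point in b."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances between points in a and points in b."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_linkage(a, b):
    """Distance between the centroids of a and b."""
    return math.dist(centroid(a), centroid(b))


# Hypothetical mapping from the command-line codes to the functions above.
LINKAGES = {
    "s": single_linkage,
    "c": complete_linkage,
    "a": average_linkage,
    "t": centroid_linkage,
}
```

Single linkage tends to produce elongated, chain-like clusters, while complete linkage favours compact ones; average and centroid linkage sit between the two.
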
For instance, the following is an example run:

    $ ./agglomerate example.txt 3 s

In this example, we run hierarchical agglomerative clustering on the items in the input file `example.txt`. We ask the program to generate `3` disjoint clusters using the single-linkage distance metric.
The input file contains the items to be clustered, in the following format:

    <number of items to cluster>
    <label string>| <x-axis value> <y-axis value>
    ...

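
As an illustration of this format, the following Python sketch reads it into a list of labelled points. It is not the program's own parser: `parse_items` is a hypothetical helper, and it assumes that labels contain no `|` character, that blank lines can be ignored, and that the count on the first line must match the number of item lines available.

```python
def parse_items(text):
    """Parse the item file format into a list of (label, (x, y)) tuples."""
    # Assumption: blank lines carry no meaning and are skipped.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("input is empty")

    count = int(lines[0])  # first line: number of items to cluster
    items = []
    for line in lines[1:1 + count]:
        # Each item line is "<label>| <x> <y>"; split on the first '|'.
        label, coords = line.split("|", 1)
        x, y = map(float, coords.split())
        items.append((label.strip(), (x, y)))

    if len(items) != count:
        raise ValueError(f"expected {count} items, found {len(items)}")
    return items
```

For example, `parse_items("2\nA| 1.0 1.0\nB| 2.0 1.0\n")` yields the two points labelled `A` and `B`.
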
For instance, the following is a valid input. It contains 12 data points, where each data point is referred to by its label and has coordinates in the two-dimensional Euclidean plane.

    12
    A| 1.0 1.0
    B| 2.0 1.0
    C| 2.0 2.0
    D| 4.0 5.0
    E| 5.0 4.0
    F| 5.0 5.0
    G| 5.0 6.0
    H| 6.0 5.0
    I| 9.0 9.0
    J| 10.0 9.0
    K| 10.0 10.0
    L| 11.0 9.0

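
To make the procedure concrete, here is a minimal Python sketch of agglomerative clustering applied to the 12 sample points with single linkage. It is a naive illustration rather than the program's actual implementation: `POINTS`, `single_linkage`, and `agglomerate` are hypothetical names, and the closest pair of clusters is found by brute force on every step.

```python
import math
from itertools import combinations

# The 12 sample points from the input above, keyed by label.
POINTS = {
    "A": (1.0, 1.0), "B": (2.0, 1.0), "C": (2.0, 2.0),
    "D": (4.0, 5.0), "E": (5.0, 4.0), "F": (5.0, 5.0),
    "G": (5.0, 6.0), "H": (6.0, 5.0),
    "I": (9.0, 9.0), "J": (10.0, 9.0), "K": (10.0, 10.0), "L": (11.0, 9.0),
}


def single_linkage(a, b, points):
    """Smallest Euclidean distance between a member of a and a member of b."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)


def agglomerate(points, k, linkage=single_linkage):
    """Merge the two closest clusters repeatedly until only k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Start with every item in its own cluster.
    clusters = [[label] for label in points]
    while len(clusters) > k:
        # Brute-force search for the closest pair of clusters (i < j).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Calling `agglomerate(POINTS, 3)` with this sketch groups the sample into {A, B, C}, {D, E, F, G, H}, and {I, J, K, L}. Recording each merge as it happens, instead of only the final clusters, would give the full hierarchy.
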
Running the clustering algorithm on this input produces a cluster hierarchy, which may be represented by a binary tree.
For further details, please visit my [homepage](http://yaikhom.com/2014/08/21/agglomerative-hierarchical-clustering.html).