This repository contains implementations of K-Means Clustering and Hierarchical Clustering.
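The repository's K-Means script is not reproduced here. As a rough sketch of the technique, assuming numeric feature vectors and Euclidean distance (the function name `kmeans` and its parameters are hypothetical), Lloyd's algorithm alternates an assignment step and an update step until the centroids stop moving:

```python
import numpy as np


def _assign(X, centroids):
    # Index of the nearest centroid (Euclidean) for every row of X.
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    return _assign(X, centroids), centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` puts the first two points in one cluster and the last two in the other.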
The hierarchical clustering implementation is agglomerative (it merges clusters bottom-up) and supports the following linkages:
- Single Link (MIN)
- Complete Link (MAX)
- Group Average (AVG)
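The three linkages differ only in how they reduce the set of point-to-point distances between two clusters to a single number. A minimal sketch, assuming each cluster is an array of points and Euclidean distance (the function names are illustrative, not the repository's):

```python
import numpy as np


def pairwise(A, B):
    """Euclidean distances between every row of A and every row of B."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def single_link(A, B):
    # MIN: distance between the two closest points across the clusters.
    return pairwise(A, B).min()


def complete_link(A, B):
    # MAX: distance between the two farthest points across the clusters.
    return pairwise(A, B).max()


def group_average(A, B):
    # AVG: mean of all |A| * |B| cross-cluster distances.
    return pairwise(A, B).mean()
```

With `A = [[0, 0], [0, 1]]` and `B = [[3, 0], [5, 0]]`, single link gives 3, complete link gives sqrt(26), and group average gives the mean of 3, 5, sqrt(10) and sqrt(26).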
The agglomerative hierarchical clustering script is divided into two classes:
- Agglomerative_Hierarchical
- Proximity_Matrix
Between them, the two classes provide the following methods:
- matrix_min(): Returns the minimum value in the given proximity matrix.
- min_cluster_distance(): Returns the minimum distance between clusters.
- max_cluster_distance(): Returns the maximum distance between clusters.
- avg_cluster_distance(): Returns the average distance between clusters.
- matrix_gen(): Generates a new proximity matrix after cluster formation.
- clustering(): Clusters points agglomeratively and returns the linkage matrix.
- distance(): Calculates the distance between two points.
- raw_matrix(): Builds the initial proximity matrix from the data.
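The methods above follow the classic matrix-based loop: build the proximity matrix, repeatedly merge the closest pair of clusters, and recompute the merged cluster's row. The sketch below is not the repository's code; `agglomerative()` is a hypothetical, naive O(n^3) version that assumes Euclidean distance and updates rows with the standard MIN, MAX and size-weighted average rules. It also assumes the linkage matrix follows SciPy's `[cluster_a, cluster_b, distance, size]` format, so the result can be passed to `scipy.cluster.hierarchy.dendrogram`.

```python
import numpy as np


def agglomerative(X, linkage="single"):
    """Naive agglomerative clustering returning a SciPy-style linkage matrix.

    Original points have ids 0..n-1; the cluster created at step i gets id n + i.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Initial proximity matrix between single points; inf on the diagonal
    # so a cluster is never merged with itself.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    ids = list(range(n))   # cluster id for each active row/column of D
    sizes = [1] * n        # number of points in each active cluster
    Z = []
    for step in range(n - 1):
        # Closest pair of active clusters, with i < j.
        i, j = np.unravel_index(np.argmin(D), D.shape)
        i, j = min(i, j), max(i, j)
        Z.append([min(ids[i], ids[j]), max(ids[i], ids[j]), D[i, j], sizes[i] + sizes[j]])
        # Distance from the merged cluster to every other active cluster.
        if linkage == "single":
            new = np.minimum(D[i], D[j])
        elif linkage == "complete":
            new = np.maximum(D[i], D[j])
        elif linkage == "average":
            new = (sizes[i] * D[i] + sizes[j] * D[j]) / (sizes[i] + sizes[j])
        else:
            raise ValueError(f"unknown linkage: {linkage!r}")
        # Row/column i now holds the merged cluster; drop row/column j.
        D[i, :] = new
        D[:, i] = new
        D[i, i] = np.inf
        D = np.delete(np.delete(D, j, axis=0), j, axis=1)
        ids[i] = n + step
        sizes[i] += sizes[j]
        del ids[j], sizes[j]
    return np.array(Z)
```

For the 1-D points 0, 1, 5 and 12, `agglomerative(X, "single")` returns the rows `[0, 1, 1, 2]`, `[2, 4, 4, 3]` and `[3, 5, 7, 4]`. Updating only the merged row keeps each step to a single pass over the matrix instead of recomputing all cluster distances from the raw points.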
The algorithms were run on a dataset of amino acid sequences. The results are shown below, with the hierarchical clustering results plotted as dendrograms:
K-Means Clustering
Hierarchical Clustering:
- Single Link
- Complete Link
- Group Average
Dependencies:
- NumPy
- SciPy
- Matplotlib
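The dendrogram figures themselves are not reproduced here. As a hedged sketch of how such figures can be produced with NumPy, SciPy and Matplotlib, the code below uses SciPy's built-in `linkage` and `dendrogram` rather than the repository's own classes. How the repository turns amino acid sequences into points is not stated, so the `composition()` encoding (the fraction of each of the 20 standard amino acids in a sequence) is a hypothetical stand-in, as is the function name `plot_dendrograms`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
TITLES = {"single": "Single Link", "complete": "Complete Link", "average": "Group Average"}


def composition(seq):
    """Hypothetical encoding: fraction of each standard amino acid in seq."""
    seq = seq.upper()
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


def plot_dendrograms(seqs, labels):
    """Plot one dendrogram per linkage; returns (figure, leaf order per method)."""
    X = np.array([composition(s) for s in seqs])
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    leaves = {}
    for ax, method in zip(axes, TITLES):
        Z = linkage(X, method=method, metric="euclidean")
        result = dendrogram(Z, labels=labels, ax=ax)
        ax.set_title(TITLES[method])
        leaves[method] = result["ivl"]  # leaf labels in left-to-right order
    fig.tight_layout()
    return fig, leaves
```

For example, with the made-up sequences `["MKTAYIAK", "MKTAYIAR", "GGSGGSGG", "GGSGGAGG"]` and labels `["s1", "s2", "s3", "s4"]`, each dendrogram places s1 next to s2 and s3 next to s4; `fig.savefig("dendrograms.png")` writes the figure to disk.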