Welcome to the LIPN Big Data Clustering Library, which gathers clustering algorithms and clustering quality indexes.
You will find additional content about clustering algorithms here.
Don't hesitate to ask questions or make recommendations on our Gitter.
Basic usage of the implemented algorithms is demonstrated with SparkNotebooks in the Spark-Clustering-Notebook organization.
Add the following lines to your build.sbt:
libraryDependencies += "clustering4ever" % "clustering4ever_2.11" % "0.2.3"
resolvers += Resolver.bintrayRepo("clustering4ever", "Clustering4Ever")
You can also pick specific parts of the library:
- K-Means
  - Implementation allowing the choice of the dissimilarity measure.
  - Complexity O(k.n.t)
  - Warning*: currently works only with the Euclidean distance
- Self Organizing Maps
- Mean Shift
  - Complexity
    - Initial complexity O(n²)
    - Improved complexity O(n) under some conditions
- K-Modes
  - Complexity O(k.n.t)
  - Implementation allowing the choice of the dissimilarity measure.
  - Warning*: currently works only with the Hamming distance
- Self Organizing Maps
- Mixed topological Map
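* We deliberately chose not to implement distances other than Hamming and Euclidean for the Spark versions of K-Modes and K-Means, for the reasons explained for their Scala counterparts (with other distances, an O(n²) similarity matrix must be computed for each cluster to find the best prototype).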
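The footnote rests on the fact that Euclidean K-Means and Hamming K-Modes have closed-form prototype updates, so no pairwise similarity matrix is needed. The following is a minimal sketch in plain Scala (not Spark, and not the library's API; `PrototypeUpdates` and its methods are hypothetical names chosen for illustration). The component-wise mean minimizes the sum of squared Euclidean distances, the per-feature majority value minimizes the sum of Hamming distances, and one Lloyd iteration costs O(k.n) distance computations, hence O(k.n.t) over t iterations.

```scala
// Hypothetical illustration only, not the Clustering4Ever implementation.
object PrototypeUpdates {

  // Euclidean K-Means prototype: the component-wise mean of the cluster,
  // which minimizes the sum of squared Euclidean distances. O(n.d) work.
  def mean(cluster: Seq[Array[Double]]): Array[Double] = {
    val dim = cluster.head.length
    val sums = new Array[Double](dim)
    cluster.foreach { v =>
      var i = 0
      while (i < dim) { sums(i) += v(i); i += 1 }
    }
    sums.map(_ / cluster.size)
  }

  // Hamming K-Modes prototype: the most frequent value of each feature,
  // which minimizes the sum of Hamming distances (ties broken arbitrarily).
  def mode(cluster: Seq[Array[Int]]): Array[Int] = {
    val dim = cluster.head.length
    Array.tabulate(dim) { i =>
      cluster.map(_(i)).groupBy(identity).maxBy(_._2.size)._1
    }
  }

  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  // One Lloyd iteration: assign each of the n points to the nearest of the
  // k centers (O(k.n) distances), then recompute each center as a mean.
  // A center with no assigned point keeps its previous value (assumed convention).
  def lloydStep(data: Seq[Array[Double]], centers: Seq[Array[Double]]): Seq[Array[Double]] = {
    val byCenter = data.groupBy(p => centers.indices.minBy(c => sqDist(p, centers(c))))
    centers.indices.map(c => byCenter.get(c).map(cl => mean(cl)).getOrElse(centers(c)))
  }
}
```

Neither update looks at pairs of points, which is why both stay linear in the cluster size.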
- Gradient ascent
- Feature selection
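Mean Shift is a form of gradient ascent: each point is repeatedly moved to the kernel-weighted mean of the data, and that move points in the direction of the gradient of a kernel density estimate, so points climb toward density modes. The sketch below assumes a Gaussian kernel with a user-chosen `bandwidth`; `MeanShiftSketch` is a hypothetical name, not part of the library. It is the naive variant, which performs O(n²) kernel evaluations per iteration; the improved O(n) variant is not shown.

```scala
// Hypothetical illustration only, not the Clustering4Ever implementation.
object MeanShiftSketch {

  def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // One mean-shift update of point x: the Gaussian-weighted mean of all data
  // points. The displacement (result - x) follows the density gradient.
  def shift(x: Array[Double], data: Seq[Array[Double]], bandwidth: Double): Array[Double] = {
    val weights = data.map(p => math.exp(-sqDist(x, p) / (2 * bandwidth * bandwidth)))
    val total = weights.sum
    Array.tabulate(x.length) { i =>
      data.zip(weights).map { case (p, w) => w * p(i) }.sum / total
    }
  }

  // Naive version: every point is shifted against all n data points,
  // so each iteration costs O(n^2) kernel evaluations.
  def modes(data: Seq[Array[Double]], bandwidth: Double, iterations: Int): Seq[Array[Double]] =
    data.map { start =>
      (1 to iterations).foldLeft(start)((x, _) => shift(x, data, bandwidth))
    }
}
```

Points that converge to the same mode would then be grouped into one cluster; that grouping step is omitted here.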
A good complementary Scala library for clustering: Smile
- Jenks Natural Breaks
  - A one-dimensional clustering method
- K-Means
  - Complexity O(k.n.t)
  - Implementation allowing the choice of the dissimilarity measure.
  - Warning: with a distance other than Euclidean, an O(n²) similarity matrix is computed for each cluster to find the best prototype; depending on cluster size, this can be much slower than with the Euclidean distance.
- K-Modes
  - Complexity O(k.n.t)
  - Implementation allowing the choice of the dissimilarity measure.
  - Warning: with a distance other than Hamming, an O(n²) similarity matrix is computed for each cluster to find the best prototype; depending on cluster size, this can be much slower than with the Hamming distance.
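With an arbitrary dissimilarity there is generally no closed-form prototype, which is what these warnings refer to. A minimal sketch, assuming the best prototype is the cluster member that minimizes the sum of dissimilarities to all other members (a medoid); the library's exact rule may differ, and `MedoidSketch`, `medoid` and `manhattan` are hypothetical names. Scoring each of the m candidates against all m members is what yields the O(m²) cost per cluster.

```scala
// Hypothetical illustration only, not the Clustering4Ever implementation.
object MedoidSketch {

  // Medoid: the member with the smallest total dissimilarity to the cluster.
  // Computes m * m dissimilarities for a cluster of size m, i.e. O(m^2).
  def medoid[T](cluster: Seq[T], dist: (T, T) => Double): T =
    cluster.minBy(c => cluster.map(p => dist(c, p)).sum)

  // Example non-Euclidean dissimilarity: Manhattan (L1) distance.
  val manhattan: (Array[Double], Array[Double]) => Double =
    (a, b) => a.indices.map(i => math.abs(a(i) - b(i))).sum
}
```

Because `medoid` is generic in the point type, the same function also covers the K-Modes case with a non-Hamming dissimilarity on categorical vectors.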
- Mutual Information (Scala & Spark)
- Normalized Mutual Information (Scala & Spark)
- Davies-Bouldin (Scala)
- Silhouette (Scala)
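Mutual information measures how much one labeling of the points tells about another (for example, predicted clusters versus ground-truth classes). Normalized mutual information rescales it to [0, 1]. The sketch below is plain, non-distributed Scala and is not the library's implementation; it assumes the sqrt(H(X)·H(Y)) normalization, although other normalizations (max, arithmetic mean) are also common, and the library's choice is not stated here. `NmiSketch` is a hypothetical name.

```scala
// Hypothetical illustration only, not the Clustering4Ever implementation.
object NmiSketch {

  // Shannon entropy (natural log) of a labeling.
  def entropy(labels: Seq[Int]): Double = {
    val n = labels.size.toDouble
    labels.groupBy(identity).values.map { g =>
      val p = g.size / n
      -p * math.log(p)
    }.sum
  }

  // I(X;Y) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))).
  def mutualInformation(x: Seq[Int], y: Seq[Int]): Double = {
    require(x.size == y.size, "labelings must have the same length")
    val n = x.size.toDouble
    val px = x.groupBy(identity).map { case (k, v) => k -> v.size / n }
    val py = y.groupBy(identity).map { case (k, v) => k -> v.size / n }
    val pxy = x.zip(y).groupBy(identity).map { case (k, v) => k -> v.size / n }
    pxy.map { case ((a, b), p) => p * math.log(p / (px(a) * py(b))) }.sum
  }

  // NMI = I(X;Y) / sqrt(H(X) * H(Y)); returns 0 when either labeling has a
  // single cluster (a convention chosen for this sketch).
  def nmi(x: Seq[Int], y: Seq[Int]): Double = {
    val hx = entropy(x)
    val hy = entropy(y)
    if (hx == 0.0 || hy == 0.0) 0.0 else mutualInformation(x, y) / math.sqrt(hx * hy)
  }
}
```

NMI is invariant to renaming cluster labels, so two labelings that differ only by a permutation of cluster ids score 1.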