Grape is a collection of document clustering algorithms written in Scala. It uses Apache OpenNLP to extract features from each document and to build the vector space on which the different clustering approaches operate. Grape currently contains the following algorithms:
- K-Means Clustering
- Hierarchical Agglomerative Clustering
- Buckshot Clustering
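To illustrate what "building a vector space" from documents means, here is a minimal sketch that turns each document into a term-frequency vector over a shared vocabulary. It is not Grape's implementation: Grape relies on Apache OpenNLP for feature extraction, whereas this sketch assumes a naive tokenizer that splits on non-letter characters. The names `VectorSpaceSketch`, `tokenize` and `termFrequencyVectors` are hypothetical and exist only for this example.

```scala
object VectorSpaceSketch {
  // Hypothetical helper (not part of Grape): lower-case the text and
  // split it on runs of non-letter characters.
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toList

  // Build one term-frequency vector per document over a shared, sorted vocabulary.
  // Input documents are (Document ID, Document Content) pairs.
  def termFrequencyVectors(
      docs: List[(String, String)]): (Vector[String], Map[String, Vector[Double]]) = {
    val tokenized = docs.map { case (id, content) => id -> tokenize(content) }
    val vocabulary = tokenized.flatMap(_._2).distinct.sorted.toVector
    val vectors = tokenized.map { case (id, tokens) =>
      // Count how often each term occurs in this document.
      val counts = tokens.groupBy(identity).map { case (t, ts) => t -> ts.size.toDouble }
      id -> vocabulary.map(term => counts.getOrElse(term, 0.0))
    }.toMap
    (vocabulary, vectors)
  }
}
```

Every document ends up as a vector of the same length, one dimension per vocabulary term, so distance-based clustering algorithms can compare documents directly.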
An example of how to use K-means clustering on your documents:
import com.jayway.textmining.{NLPFeatureSelection, Cluster, KMeanCluster}
// number of clusters
val k = ...
// A document is a pair of (Document ID, Document Content). ID can be anything.
val docs: List[(String, String)] = ...
val kMeanCluster = new KMeanCluster(docs, k) with NLPFeatureSelection
val clusters: List[Cluster] = kMeanCluster.doCluster()
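For readers unfamiliar with the algorithm, the following is a rough, self-contained sketch of the standard K-means (Lloyd's) iteration over numeric vectors. It does not show how `KMeanCluster` works internally, which this README does not describe. The object `KMeansSketch`, its methods, and the choice to seed the centroids with the first k points are all assumptions made for illustration.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty group of points.
  def mean(points: Seq[Point]): Point =
    points.transpose.map(_.sum / points.size).toVector

  // Index of the centroid closest to the given point.
  def nearest(point: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy(i => squaredDistance(point, centroids(i)))

  // Lloyd's iteration. Assumption: the first k points seed the centroids,
  // which keeps the sketch deterministic; real implementations often seed randomly.
  def cluster(points: Seq[Point], k: Int, maxIterations: Int = 100): Map[Int, Seq[Point]] = {
    require(k > 0 && k <= points.size, "k must be between 1 and the number of points")
    var centroids: Seq[Point] = points.take(k)
    var assignment = points.groupBy(p => nearest(p, centroids))
    var iteration = 0
    var changed = true
    while (changed && iteration < maxIterations) {
      // Recompute each centroid; keep the old one if its cluster is empty.
      val updated = centroids.indices.map(i => assignment.get(i).map(mean).getOrElse(centroids(i)))
      changed = updated != centroids
      centroids = updated
      // Reassign every point to its nearest (possibly moved) centroid.
      assignment = points.groupBy(p => nearest(p, centroids))
      iteration += 1
    }
    assignment
  }
}
```

A call such as `KMeansSketch.cluster(points, 2)` returns a map from cluster index to the points assigned to it. The loop stops once the centroids no longer move or after `maxIterations` rounds.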
Copyright (C) 2012 Amir Moulavi
Distributed under the Apache Software License.