
Document clustering

Primary language: Scala. License: Apache License 2.0 (Apache-2.0).

Grape

Grape is a collection of document clustering algorithms written in Scala. It uses Apache OpenNLP to extract features from each document and builds the vector space that the different algorithms operate on. Grape currently contains the following algorithms:

  • K-means Clustering
  • Hierarchical Agglomerative Clustering
  • Buckshot Clustering
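The vector space these algorithms work on can be illustrated with a minimal sketch. It assumes TF-IDF term weighting and cosine similarity, which are common choices for document clustering but not necessarily what Grape uses. The object VectorSpaceSketch and its functions tokenize, tfIdf and cosine are hypothetical, and the tokenizer is a plain split on non-letters standing in for Grape's OpenNLP-based feature extraction.

```scala
// Hypothetical sketch: documents as TF-IDF vectors compared by cosine similarity.
// Not Grape's code; tokenization is a simple stand-in for OpenNLP feature extraction.
object VectorSpaceSketch {
  // Sparse vector: term -> weight.
  type Vec = Map[String, Double]

  // Assumption: lowercase the text and split on anything that is not a letter a-z.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").toSeq.filter(_.nonEmpty)

  // Builds one TF-IDF vector per (document ID, document content) pair.
  def tfIdf(docs: Seq[(String, String)]): Map[String, Vec] = {
    val tokens = docs.map { case (id, text) => id -> tokenize(text) }
    val n = docs.size.toDouble
    // Document frequency: in how many documents each term appears.
    val df: Map[String, Int] = tokens
      .flatMap { case (_, ts) => ts.distinct }
      .groupBy(identity)
      .map { case (term, hits) => term -> hits.size }
    tokens.map { case (id, ts) =>
      // Raw term frequency within this document.
      val tf = ts.groupBy(identity).map { case (term, hits) => term -> hits.size.toDouble }
      // Weight = tf * log(N / df); terms present in every document get weight 0.
      id -> tf.map { case (term, f) => term -> f * math.log(n / df(term)) }
    }.toMap
  }

  // Cosine of the angle between two sparse vectors (0.0 if either is all zeros).
  def cosine(a: Vec, b: Vec): Double = {
    val dot = a.iterator.map { case (term, w) => w * b.getOrElse(term, 0.0) }.sum
    def norm(v: Vec): Double = math.sqrt(v.values.map(w => w * w).sum)
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot / denom
  }
}
```

Cosine similarity compares the direction of two vectors rather than their length, so a long and a short document about the same topic can still score as similar. Documents that share no weighted terms score 0.0.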

How to use

An example of how to use K-means clustering on your documents:

import com.jayway.textmining.{NLPFeatureSelection, Cluster, KMeanCluster}

// number of clusters
val k = ...

// A document is a pair of (document ID, document content). The ID can be any string.
val docs: List[(String, String)] = ...

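// Mix in NLPFeatureSelection, which provides the OpenNLP-based feature extraction
val kMeanCluster = new KMeanCluster(docs, k) with NLPFeatureSelection
// Run the clustering and collect the resulting clusters
val clusters: List[Cluster] = kMeanCluster.doCluster()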
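KMeanCluster.doCluster() hides the clustering loop itself. To illustrate what a K-means pass typically involves, here is a minimal sketch of Lloyd's algorithm (the standard form of K-means) on dense Vector[Double] document vectors. It is not Grape's implementation: the object KMeansSketch, squared Euclidean distance, seeding with the first k points, and the maxIter cap are all assumptions chosen to keep the example short and deterministic.

```scala
// Hypothetical sketch of Lloyd's K-means on dense vectors; not Grape's implementation.
object KMeansSketch {
  type Vec = Vector[Double]

  // Squared Euclidean distance (assumption: Grape may use another measure).
  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty group of vectors.
  def mean(vs: Seq[Vec]): Vec =
    vs.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / vs.size)

  // Index of the centroid closest to v.
  def nearest(v: Vec, centroids: Seq[Vec]): Int =
    centroids.indices.minBy(i => sqDist(v, centroids(i)))

  // Returns one cluster index per point. Seeds with the first k points so the
  // result is deterministic; requires 1 <= k <= points.size.
  def kMeans(points: Seq[Vec], k: Int, maxIter: Int = 100): Seq[Int] = {
    require(k >= 1 && k <= points.size, "k must be between 1 and the number of points")
    var centroids: Seq[Vec] = points.take(k)
    var assignment: Seq[Int] = points.map(p => nearest(p, centroids))
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      // Update step: move each centroid to the mean of its members;
      // a centroid with no members keeps its previous position.
      val current = centroids
      centroids = current.indices.map { i =>
        val members = points.zip(assignment).collect { case (p, c) if c == i => p }
        if (members.isEmpty) current(i) else mean(members)
      }
      // Assignment step: attach every point to its nearest centroid.
      val next = points.map(p => nearest(p, centroids))
      changed = next != assignment
      assignment = next
      iter += 1
    }
    assignment
  }
}
```

For example, with the points (0, 0), (10, 10), (0.1, 0) and (10, 10.1), kMeans(points, 2) assigns the two points near the origin to cluster 0 and the other two to cluster 1. Seeding with the first k points only keeps the sketch deterministic; practical implementations usually pick seeds at random, or, as Buckshot clustering does, take them from a hierarchical agglomerative clustering of a small sample of the documents.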

License

Copyright (C) 2012 Amir Moulavi

Distributed under the Apache Software License.