Ruby client for Carrot2 - the open-source document clustering server
First, download and run the Carrot2 server. With Homebrew, use:
brew install carrot2
brew services start carrot2
Then add this line to your application’s Gemfile:
gem "carrot2"
The latest version works with Carrot2 4. For Carrot2 3, use version 0.2.1 and this readme.
To cluster documents, use:
documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]
carrot2 = Carrot2::Client.new
carrot2.cluster(documents)
This returns:
{
  "clusters" => [
    {
      "labels" => ["Coupon"],
      "documents" => [0, 1, 2],
      "clusters" => [],
      "score" => 0.06418006702675011
    },
    {
      "labels" => ["Exclusive"],
      "documents" => [0, 1],
      "clusters" => [],
      "score" => 0.7040290701763807
    }
  ]
}
Documents are numbered in the order provided, starting with 0.
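Because clusters refer to documents by index, it can help to map those indexes back to the original strings. The following is a minimal sketch that works on the sample response shown above; `documents_by_label` and `unclustered_indexes` are hypothetical helpers for illustration and are not part of the gem.

```ruby
# Sample response copied from the output shown above.
response = {
  "clusters" => [
    {"labels" => ["Coupon"], "documents" => [0, 1, 2], "clusters" => [], "score" => 0.06418006702675011},
    {"labels" => ["Exclusive"], "documents" => [0, 1], "clusters" => [], "score" => 0.7040290701763807}
  ]
}

documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]

# Hypothetical helper (not provided by the gem): maps each cluster's
# joined labels to the original document strings it references.
def documents_by_label(response, documents)
  response["clusters"].to_h do |cluster|
    [cluster["labels"].join(", "), documents.values_at(*cluster["documents"])]
  end
end

# Hypothetical helper: indexes of documents that appear in no top-level cluster.
def unclustered_indexes(response, documents)
  clustered = response["clusters"].flat_map { |cluster| cluster["documents"] }
  (0...documents.size).to_a - clustered
end

grouped = documents_by_label(response, documents)
unclustered = unclustered_indexes(response, documents)

grouped.each do |label, docs|
  puts "#{label}:"
  docs.each { |doc| puts "  #{doc}" }
end
puts "Unclustered indexes: #{unclustered.inspect}"
```

Note that a document can belong to more than one cluster, as documents 0 and 1 do here. This sketch only looks at top-level clusters and ignores any nested "clusters" entries.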
Specify a language with:
carrot2.cluster(documents, language: "French")
Specify an algorithm with:
carrot2.cluster(documents, algorithm: "Lingo")
Get a list of supported languages and algorithms with:
carrot2.list
Specify parameters with:
parameters = {
  preprocessing: {
    phraseDfThreshold: 1,
    wordDfThreshold: 1
  }
}
carrot2.cluster(documents, parameters: parameters)
See supported parameters for Lingo, STC, and Bisecting K-Means.
Specify a template with:
carrot2.cluster(documents, template: "lingo")
To specify the Carrot2 server, set ENV["CARROT2_URL"] or use:
Carrot2::Client.new(url: "http://localhost:8080")
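If an application needs to choose between an explicit URL and the environment variable itself, one possible approach is sketched below. `carrot2_url` is a hypothetical helper, and both the precedence order and the `http://localhost:8080` fallback are assumptions made for illustration rather than documented behavior of the gem.

```ruby
# Hypothetical helper (not part of the gem): picks the server URL.
# Assumed precedence: explicit argument, then CARROT2_URL, then a local default.
DEFAULT_CARROT2_URL = "http://localhost:8080" # assumed default for this sketch

def carrot2_url(explicit = nil, env: ENV)
  explicit || env["CARROT2_URL"] || DEFAULT_CARROT2_URL
end

# With the gem loaded, the result could be passed to the client:
#   Carrot2::Client.new(url: carrot2_url)
puts carrot2_url(nil, env: {"CARROT2_URL" => "http://carrot2.internal:8080"})
```

Passing the environment in as an argument keeps the helper easy to test without modifying the real ENV.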
Set timeouts (in seconds) with:
Carrot2::Client.new(open_timeout: 3, read_timeout: 5)
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/carrot2-ruby.git
cd carrot2-ruby
bundle install
bundle exec rake test