Mebla

Mebla is an elasticsearch wrapper for Mongoid based on Slingshot.

Name

Mebla is derived from the word "Nebla", which means slingshot in Arabic.

Since it is a wrapper for the Mongoid ODM, the letter "N" is replaced with "M".

Installation

Install elasticsearch

Mebla requires a running elasticsearch installation.

To install elasticsearch, follow the up-to-date elasticsearch installation instructions, or simply copy and paste the following into your terminal window:

$ curl -k -L -o elasticsearch-0.15.0.tar.gz http://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.15.0.tar.gz
$ tar -zxvf elasticsearch-0.15.0.tar.gz
$ ./elasticsearch-0.15.0/bin/elasticsearch -f
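
Before moving on, it can help to confirm that the node is answering. The following is a minimal sketch, assuming elasticsearch is listening on its default HTTP port (9200) on localhost; the helper name elasticsearch_running? is made up for this example and is not part of Mebla.

```ruby
require "net/http"
require "uri"

# Hypothetical helper, not part of Mebla: returns true when an HTTP
# server answers successfully at the given URL, false otherwise.
def elasticsearch_running?(url = "http://localhost:9200")
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = 2 # seconds to wait for the connection
  http.read_timeout = 2 # seconds to wait for the response
  http.get("/").is_a?(Net::HTTPSuccess)
rescue StandardError
  # Connection refused, timeout, malformed URL, and so on.
  false
end

if elasticsearch_running?
  puts "elasticsearch is up"
else
  puts "elasticsearch is not reachable"
end
```

If the check fails, make sure the process started with the command above is still running in its terminal.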

Install Mebla

Once elasticsearch is installed, add Mebla to your Gemfile:

gem "mebla"

Then run bundle install in your application root to install the gem:

$ bundle install

Next, generate the configuration file:

$ rails generate mebla:install

Finally, index your data:

$ rake mebla:index_data

Usage

Defining indexed fields

To make a model searchable, first define which fields Mebla should index:

class Post
  include Mongoid::Document
  include Mongoid::Mebla
  field :title
  field :author
  field :body
  field :publish_date, :type => Date
  field :tags, :type => Array
  
  embeds_many :comments
  search_in :author, :body, :publish_date, :tags, :title => { :boost => 2.0, :analyzer => 'snowball' }
end

In the example above, Mebla will index the author, body, publish_date and tags fields with default mappings, and the title field with custom mappings (a boost of 2.0 and the snowball analyzer).
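
Note that the field with custom mappings comes last in the call: Ruby only allows a hash argument without braces at the end of an argument list. The sketch below shows how such a call arrives at a method, using a hypothetical stand-in named split_search_in_arguments. It illustrates the calling convention only and is not Mebla's implementation of search_in.

```ruby
# Hypothetical stand-in for illustration only; this is not how Mebla
# implements search_in.
def split_search_in_arguments(*fields)
  # A trailing hash holds the fields that carry custom mappings.
  custom = fields.last.is_a?(Hash) ? fields.pop : {}
  { :plain => fields, :custom => custom }
end

parsed = split_search_in_arguments(
  :author, :body, :publish_date, :tags,
  :title => { :boost => 2.0, :analyzer => 'snowball' }
)

p parsed[:plain]  # fields listed as bare symbols
p parsed[:custom] # fields listed with a mapping hash
```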

Embedded documents

You can also index embedded documents as follows:

class Comment
  include Mongoid::Document
  include Mongoid::Mebla
  field :comment
  field :author

  # The relation name matches the Post model defined above
  embedded_in :post
  search_in :comment, :author, :embedded_in => :post
end

This will index all comments and make them available for searching directly through the Comment model.

Indexing methods

You can also index method results:

class Post
  include Mongoid::Document
  include Mongoid::Mebla
  field :title
  field :author
  field :body
  field :publish_date, :type => Date
  field :tags, :type => Array
  
  embeds_many :comments
  search_in :author, :body, :publish_date, :tags, :permalink, :title => { :boost => 2.0, :analyzer => 'snowball' }
  
  def permalink
    self.title.gsub(/\s/, "-").downcase
  end
end

This will index the result of the method permalink.
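
To see what value the permalink method produces, the same method body can be run outside Mongoid. This is a small sketch using a Struct named PostSketch as a stand-in for the model; the name and the sample title are only for illustration.

```ruby
# Stand-in for the Post model so the method can run without Mongoid.
PostSketch = Struct.new(:title) do
  def permalink
    # Replace every whitespace character with a hyphen, then lowercase.
    title.gsub(/\s/, "-").downcase
  end
end

post = PostSketch.new("Testing Search")
puts post.permalink # → testing-search
```

Each whitespace character is replaced individually, so a title with two consecutive spaces yields two consecutive hyphens.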

Indexing fields of relations

You can also index fields of relations:

class Post
  include Mongoid::Document
  include Mongoid::Mebla
  field :title
  field :author
  field :body
  field :publish_date, :type => Date
  field :tags, :type => Array
  
  embeds_many :comments
  search_in :author, :body, :publish_date, :tags, :title => { :boost => 2.0, :analyzer => 'snowball' },
    :search_relations => {:comments => :author}
end

This will index the authors of all comments embedded in this Post.

Searching the index

Mebla supports two types of search: index search and model search. In an index search, Mebla searches the index and returns all matching documents regardless of their type. In a model search, Mebla searches the index and returns only matching documents of the given model's type.

Index searching

Using the same models we defined above, we can search for all posts and comments with the author "cousine":

Mebla.search "author: cousine"

This will return all documents with an author set to "cousine", regardless of their type. If we want to search only Posts and Comments, we tell Mebla explicitly:

Mebla.search "author: cousine", [:post, :comment]

Model searching

Unlike an index search, which covers all models, a model search is limited to a single model:

Post.search("title: Testing Search").desc(:publish_date).only(
  :author => ["cousine"], 
  :tags => ["ruby", "rails"]
).facet('tags', :tags, :global => true).facet('authors', :author)

The example above takes full advantage of Slingshot's searching capabilities. It retrieves all posts with the title "Testing Search", filters the results to those with the author "cousine" and tagged "ruby" or "rails", and sorts them by publish_date in descending order.

One more feature we are using is "Faceted Search", from Slingshot's homepage:

Faceted Search

ElasticSearch makes it trivial to retrieve complex aggregated data from the index/database, so called facets.

In the example above we retrieve two facets, "tags" and "authors". The "tags" facet is global, which means we get the count of posts for each tag over the whole index. The "authors" facet is not, so it only counts the posts matching the search query for each author.
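
The difference between a global facet and a query-scoped one can be sketched in plain Ruby, without elasticsearch. The sample posts below are made up, the helper term_counts is hypothetical, and an exact match on the title stands in for the search query; the point is only to show which set of documents each count is taken over.

```ruby
# Hypothetical sample data standing in for indexed posts.
posts = [
  { :title => "Testing Search", :author => "cousine", :tags => ["ruby", "rails"] },
  { :title => "Testing Search", :author => "omar",    :tags => ["ruby"] },
  { :title => "Other",          :author => "cousine", :tags => ["rails", "mongoid"] }
]

# Count how often each value of `field` occurs in `documents`.
def term_counts(documents, field)
  documents.each_with_object(Hash.new(0)) do |doc, counts|
    Array(doc[field]).each { |term| counts[term] += 1 }
  end
end

# Exact title match stands in for the search query.
matching = posts.select { |post| post[:title] == "Testing Search" }

global_tags   = term_counts(posts, :tags)      # counted over every post
query_authors = term_counts(matching, :author) # counted over matches only

p global_tags
p query_authors
```

Here "cousine" has two posts overall but is counted once, because only one of them matches the query.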

Retrieving results

To retrieve the results of the model search we performed above, iterate over the returned hits:

hits = Post.search("title: Testing Search").desc(:publish_date).only(
  :author => ["cousine"], 
  :tags => ["ruby", "rails"]
).facet('tags', :tags, :global => true).facet('authors', :author)

hits.each do |hit|
  puts hit.title
end

To retrieve the facets:

# Get the count of posts for each tag across the index
hits.facets['tags']['terms'].each do |facet|
  puts "#{facet['term']} : #{facet['count']}"
end

# Get the count of posts matching the query for each author
hits.facets['authors']['terms'].each do |facet|
  puts "#{facet['term']} : #{facet['count']}"
end
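
If a plain term-to-count lookup is more convenient than iterating, the facet entries can be folded into a Hash. The sketch below runs on a hand-written sample shaped like the facets accessed above (a 'terms' list of entries with 'term' and 'count' keys); the sample values and the helper name facet_counts are made up for illustration.

```ruby
# Hypothetical sample shaped like the facets read in the loops above;
# the terms and counts are invented.
sample_facets = {
  'tags' => {
    'terms' => [
      { 'term' => 'ruby',  'count' => 2 },
      { 'term' => 'rails', 'count' => 1 }
    ]
  },
  'authors' => {
    'terms' => [
      { 'term' => 'cousine', 'count' => 1 }
    ]
  }
}

# Turn one named facet into a { term => count } Hash.
def facet_counts(facets, name)
  facets[name]['terms'].each_with_object({}) do |facet, counts|
    counts[facet['term']] = facet['count']
  end
end

p facet_counts(sample_facets, 'tags')
p facet_counts(sample_facets, 'authors')
```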

Indexing data

Synchronizing data

By default, Mebla synchronizes all changes made to your models with your index. To bypass this behavior for a set of changes, wrap them in a without_indexing block:

Post.without_indexing do
  Post.create :title => "This won't be indexed"
end
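
The block form means indexing is switched off only while the block runs. As a rough sketch of that pattern in plain Ruby, the hypothetical IndexingSwitch class below turns a flag off for the duration of a block and restores it afterwards. It is an illustration of the general technique, not Mebla's actual code.

```ruby
# Hypothetical stand-in, not Mebla's code: a switch that is turned off
# for the duration of a block and restored afterwards.
class IndexingSwitch
  def initialize
    @enabled = true
  end

  def enabled?
    @enabled
  end

  def without_indexing
    previous = @enabled
    @enabled = false
    yield
  ensure
    # Runs even if the block raises, so the flag is always restored.
    @enabled = previous
  end
end

switch = IndexingSwitch.new
switch.without_indexing do
  puts "inside block: #{switch.enabled?}"
end
puts "after block: #{switch.enabled?}"
```

Restoring the previous value in ensure, rather than simply setting the flag back to true, keeps nested calls well behaved.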

Indexing existing data

You can index existing data by using the "index" rake task:

$ rake mebla:index

This will create the index and index all the data in the database.

Reindexing

Just like indexing, you can reindex your data using the "reindex" rake task:

$ rake mebla:reindex

This will rebuild the index and index all your data again. Note that, unlike with other full-text search engines, you don't need to reindex your data frequently (if ever). You might, however, want to refresh the index so that recent changes become searchable.

Refreshing the index

Refreshing the index makes changes made to the index available for searching or modification.

Mebla automatically refreshes the index whenever a change is made, but in case you need to refresh it manually:

$ rake mebla:refresh

Rake tasks

Mebla provides a number of rake tasks for managing the index. You can list them all with this command:

$ rake -T mebla

Contributing to Mebla

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
  • Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
  • Fork the project
  • Start a feature/bugfix branch
  • Commit and push until you are happy with your contribution
  • Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or a change there is otherwise necessary, that is fine, but please isolate it in its own commit so I can cherry-pick around it.

Copyright

Copyright (c) 2011 Omar Mekky. See LICENSE.txt for further details.