Welcome to Thinking Sphinx version 3 – a complete rewrite from past versions, built for Rails 3.1 or newer only.
It’s a gem, so install it like you would any other gem. You will also need to specify the Mysql2 gem as well (this is not an inbuilt dependency because JRuby, when supported, will need something different):
gem 'mysql2', '0.3.13'
gem 'thinking-sphinx', '3.0.5'
The mysql2 gem is required for connecting to Sphinx, so please include it even when you’re using PostgreSQL for your database.
As for Sphinx itself, you need to make sure it’s compiled with MySQL support, even if you’re using PostgreSQL as your database – which is normally the default, but Homebrew can be too smart sometimes. If that’s your compiling tool of choice, make sure you install with the --mysql
flag:
brew install sphinx --mysql
Indexes are no longer defined in models – they now live in `app/indices` (which you will need to create yourself). Each index should get its own file, and look something like this:
# app/indices/article_index.rb
ThinkingSphinx::Index.define :article, :with => :active_record do
indexes title, content
indexes user.name, :as => :user
indexes user.articles.title, :as => :related_titles
has published
end
You’ll notice the first argument is the model name downcased and as a symbol, and we are specifying the processor – :active_record
. Everything inside the block is just like previous versions of Thinking Sphinx. Same goes for most settings in config/thinking_sphinx.yml
(formerly config/sphinx.yml
) unless noted below.
When you’re defining indices for namespaced models, use a lowercase string with /’s for namespacing as the model reference:
# For a model named Blog::Article:
ThinkingSphinx::Index.define 'blog/article', :with => :active_record
Other changes:
- SphinxQL is now used instead of the old socket connections (hence the dependency on the
mysql2
gem). - Specifying a different port for Sphinx to use (in
config/thinking_sphinx.yml
) should be done with the mysql41 setting, not the port setting. - The searchd_log and searchd_query_log settings are now log and query_log (matching their Sphinx names).
- searchd_file_path is now indices_location.
- config_file is now configuration_file.
- You’ll need to include
ThinkingSphinx::Scopes
into your models if you want to use Sphinx scopes. Default scopes can be set as follows:
class Person < ActiveRecord::Base
include ThinkingSphinx::Scopes
sphinx_scope(:date_order) { {:order => :created_at} }
default_sphinx_scope :date_order
# ...
end
- The match mode is always extended – SphinxQL doesn’t know any other way.
- ActiveRecord::Base.set_sphinx_primary_key is now an option in the index definition (alongside
:with
in the above example)::primary_key
– and therefore is no longer inheritable between models. - If you’re explicitly setting a time attribute’s type, instead of
:datetime
it should now be:timestamp
. - Delta arguments are passed in as an option of the
define
call, not within the block:
ThinkingSphinx::Index.define :article, :with => :active_record, :delta => true do
# ...
end
- Suspended deltas are no longer called from the model, but like so instead:
ThinkingSphinx::Deltas.suspend :article do
article.update_attributes(:title => 'pancakes')
end
- Excerpts through search results behaves the same way, provided you add ExcerptsPane into the mix (read the section below on search results, glazes and panes). Excerpt options (like
:before_match
,:after_match
and:chunk_separator
) can be passed through when searching under the:excerpts
option:
ThinkingSphinx.search 'foo',
:excerpts => {:chunk_separator => ' -- '}
- When indexing models on classes that are using single-table inheritance (STI), make sure you have a database index on the
type
column. Thinking Sphinx will need to determine which subclasses are available, and we can’t rely on Rails having loaded all models at any given point, so it queries the database. If you don’t want this to happen, set :skip_sti to true in your search call, and ensure that the :classes option holds all classes that could be returned.
ThinkingSphinx.search 'pancakes',
:skip_sti => true,
:classes => [User, AdminUser, SupportUser]
- The option
:rank_mode
has now become:ranker
– and the options (as strings or symbols) are as follows: proximity_bm25, bm25, none, wordcount, proximity, matchany, fieldmask, sph04 and expr. - There are no explicit sorting modes – all sorting must be on attributes followed by ASC or DESC. For example:
:order => '@weight DESC, created_at ASC'
. - If you specify just an attribute name as a symbol for the
:order
option, it will be given the ascending direction by default. So,:order => :created_at
is equivalent to:order => 'created_at ASC'
. - If you want to use a calculated expression for sorting, you must specify the expression as a new attribute, then use that attribute in your
:order
option. This is done using the:select
option to specify extra columns available in the underlying SphinxQL (not ActiveRecord/SQL) query.
ThinkingSphinx.search(
:select => '@weight * 10 + document_boost as custom_weight',
:order => :custom_weight
)
- Support for latitude and longitude attributes named something other than ‘lat’ and ‘lng’ or ‘latitude’ and ‘longitude’ has been removed. May add it back in if requested, but would be surprised if it’s a necessary feature.
- Set INDEX_ONLY to true in your shell for the index task to re-index without regenerating the configuration file.
- If you want to pass the old-style
:include
,:joins
,:select
or:order
parameters through to the underlying ActiveRecord SQL queries for instantiating search results, they should go in a hash within the search option:sql
:
Article.search :sql => {:include => :user}
- SphinxQL only supports grouping by single attributes – but these attributes may be generated on the fly within the select statement (see the
:select
option above). A grouped search uses the:group_by
option, and you can pass in the attribute name as either a symbol or a string:
Article.search :group_by => :user_id
- If you want to change the order of which result appears for each group, that can be done via the
:order_group_by
option – which behaves just like:order
does:
Article.search(
:group_by => :user_id,
:order_group_by => 'created_at DESC'
)
- The
each_with_group
,each_with_count
andeach_with_group_and_count
enumerators are available when using the:group_by
option (but are otherwise not available to search objects). Please note the spelling – older versions of Thinking Sphinx allowed for groupby and group, this is no longer the case. each_with_weight
(again, note that it’s weight, not weighting) is available, but not by default. Here’s an example of how to have it part of the search object:
search = Article.search('pancakes', :select => '*, @weight')
search.masks << ThinkingSphinx::Masks::WeightEnumeratorMask
search.each_with_weight do |article, weight|
# ...
end
You’ll also note here that I’m specifying the internal weight attribute. This is necessary for edge Sphinx post 2.0.5.
- Batched/Bulk searches are done pretty similarly as in the past – here’s a code sample that’ll only hit Sphinx once:
batch = ThinkingSphinx::BatchedSearch.new
batch.searches << Article.search('foo')
batch.searches << Article.search(:conditions => {:name => 'bar'})
batch.searches << Article.search_for_ids('baz')
# When you call batch#populate, the searches are all populated with a single
# Sphinx call.
batch.populate
batch.searches #=> [[foo results], [bar results], [baz results]]
- To search on specific indices, use the
:indices
option, which expects an array of index names (including the_core
or_delta
suffixes). :without_any
has become:without_all
– and is implemented, but Sphinx doesn’t yet support the required logic.- If you’re creating a multi-value attribute manually (using a SQL snippet), then in the definition pass in
:multi => true
, but:type
should be set as well, to one of the MVA types that Sphinx supports (:integer
,:timestamp
, or:boolean
). - Automatic updates of non-string attributes are still limited to those from columns on the model in question, and is disabled by default. To enable it, just set attribute_updates to true in your
config/thinking_sphinx.yml
. - Search result helper methods are no longer injected into the actual result objects. Read the section below on search results, glazes and panes.
- If you’re using string facets, make sure they’re defined as fields, not strings. There is currently no support for multi-value string facets.
- To have fine-grained control over when deltas are invoked, create a sub-class of your chosen delta class (the standard is
ThinkingSphinx::Deltas::DefaultDelta
) and customise thetoggle
andtoggled?
methods, both of which accept a single parameter being the ActiveRecord instance.
class OccasionalDeltas < ThinkingSphinx::Deltas::DefaultDelta
# Invoked via a before_save callback. The default behaviour is to set the
# delta column to true.
def toggle(instance)
super unless instance.title_changed?
end
# Invoked via an after_commit callback. The default behaviour is to check
# whether the delta column is set to true. If this method returns true, the
# indexer is fired.
def toggled?(instance)
return false unless instance.title_changed?
super
end
end
# And in your index definition:
ThinkingSphinx::Index.define :article, :with => :active_record, :delta => OccasionalDeltas do
# ...
end
- Polymorphic associations used within index definitions must be declared with the corresponding models. This is much better than the old approach of querying the database on the *_type column to determine what models to join against.
indexes events.eventable.name
polymorphs events.eventable, :to => %w(Page Post User)
This section needs information – go hunting in the source for the moment if you’re keen on adding a layer around querying/result population process.
In versions of Thinking Sphinx prior to v3, each search result object had many methods inserted into it – for direct access to the weight, distance, sphinx attributes and excerpts. This is no longer the case, but there is a more modular approach available.
Search results may now have a glaze object placed around them, which can then delegate methods to any number of panes the glaze has available. By default, there are no panes added (and thus, no glazing), but this can be modified:
# For every search
ThinkingSphinx::Configuration::Defaults::PANES << ThinkingSphinx::Panes::WeightPane
# Or for specific searches:
search = ThinkingSphinx.search('pancakes')
search.context[:panes] << ThinkingSphinx::Panes::WeightPane
The available panes are as follows:
WeightPane
(methods:weight
)DistancePane
(methods:distance
,geodist
)AttributesPane
(methods:sphinx_attributes
)ExcerptsPane
(methods:excerpts
)
All panes namespaced to ThinkingSphinx::Panes
, and the DistancePane
is automatically added when you provide latitude/longitude values via the :geo
option.
If you wish to add your own panes, go ahead. The only requirement is that the initializer must accept three arguments: the search context, the underlying search result object, and a hash of the raw values from Sphinx.
Thinking Sphinx comes with several Capistrano tasks to help ease deployment of your applications. Just require the recipes:
# config/deploy.rb
require 'thinking_sphinx/capistrano'
When running cap deploy:setup
to get your server ready for deployments, Thinking Sphinx will also set up a shared folder for your indexes. Before finalizing a deployment, these indexes will be symlinked into the release path for use. When you deploy your application for the first time using cap deploy:cold
, your indexes will be built for you and the search daemon will be started. Run cap -T
to see all of the deployment tasks.
Almost all functionality from v2 releases are implemented, though it’s worth noting that some settings haven’t yet been brought across, and a handful of the smaller features don’t yet exist either. Some may actually not return… we’ll see.
May or may not be added:
- Datetime Deltas
- Bitmask weighting helper
- Timezone support (for databases not using UTC)
- Abstract Inheritance support (maybe – not sure this is something many of people want).
- Facet support for arrays of strings.
TS 3 is built for Sphinx 2.0.5 or newer. You cannot use 1.10-beta, 0.9.9 or anything earlier than that.
Currently TS 3 is built to support Rails/ActiveRecord 3.1 or newer. If you’re using Sinatra and ActiveRecord instead of Rails, that’s fine – just make sure you add the :require => 'thinking_sphinx/sinatra'
option when listing thinking-sphinx
in your Gemfile.
TS 3 does not support Rails 3.0, Rails 2.x or earlier, or Merb – please refer to the TS 1.x and 2.x releases in those situations.
Built on MRI 1.9.3 and tested against MRI 1.9.2 as well. No plans to support MRI 1.8, but would like to support Rubinius and JRuby (the one catch with the latter is the different MySQL interfaces).
There’s also the complication that Sphinx 2.0.x releases don’t work with JDBC (as JDBC sends several MySQL-specific commands through when initializing a connection). So, JRuby support won’t appear until there’s a stable Sphinx release that can interact with JDBC.
MySQL 5.x and Postgres 8.4 or better are supported.
You’re brave! To contribute, clone this repository and have a good look through the specs – you’ll notice the distinction between acceptance tests that actually use Sphinx and go through the full stack, and unit tests (everything else) which use liberal test doubles to ensure they’re only testing the behaviour of the class in question. I’ve found this leads to far better code design.
If you’re still interested in helping evolve this, then write the tests and then the code to get them passing, and send through a pull request. No promises on merging anything, but we shall see!
For some ideas behind my current approach, have a look through sketchpad.rb
in the root of this project. If you can make sense of that, you’re doing very well indeed.
Copyright © 2007-2012, Thinking Sphinx is developed and maintained by Pat Allan, and is released under the open MIT Licence. Many thanks to all who have contributed patches.