Django-SphinxQL implements Sphinx search for Django, thanks for checking it out.
Django is a Web framework for building websites with relational databases; Sphinx is a search engine designed for relational databases. Django-SphinxQL defines an ORM for using Sphinx in Django. As a corollary, it allows you to implement full-text search with Sphinx in your Django website.
Specifically, this API allows you to:
- Configure Sphinx with Python.
- Index Django models in Sphinx.
- Execute Sphinx queries (SphinxQL) using Django-like expressions and have the results as Django models.
Django-SphinxQL requires:
- Python 3
- Django (>=1.8)
- Sphinx
- A database backend (pymysql or psycopg2)
Our build matrix in Travis has 8 builds:
- Python 3.4
- Django 1.8, 1.10 and 1.11
- Sphinx 2.2.10 and 2.2.11
- mysql and postgres backends
For more details, check the tests directory and .travis.yml.
To run the tests locally, use:
PYTHONPATH=..:$PYTHONPATH django-admin.py test --settings=tests.settings_test tests
The test settings can also be adapted to the local environment using the tests.settings_test_local module. Just set up the INDEXES and DATABASES dictionaries accordingly.
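As a rough illustration, a local override module might look like the sketch below. It assumes that tests.settings_test_local only needs to define module-level INDEXES and DATABASES names; the database name, user, password and host are placeholders, and BASE_DIR is assumed to be the directory the tests are run from.

```python
# tests/settings_test_local.py (hypothetical local overrides)
import os

# Assumption: the tests are launched from the project root.
BASE_DIR = os.getcwd()

# Where the Sphinx indices and Sphinx-specific files are written locally.
INDEXES = {
    'path': os.path.join(BASE_DIR, '_index'),
    'sphinx_path': BASE_DIR,
}

# Placeholder credentials for a local MySQL server; adapt to your machine.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'sphinx_example',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '127.0.0.1',
    },
}
```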
The next sections present a minimal setup to use this package. The full documentation is available here.
Django-SphinxQL has no requirements besides Django and Sphinx. To install Sphinx, use:
export VERSION=2.2.10
wget http://sphinxsearch.com/files/sphinx-$VERSION-release.tar.gz
tar -xf sphinx-$VERSION-release.tar.gz
cd sphinx-$VERSION-release
./configure --prefix=$HOME --with-pgsql
make
make install
To install Django-SphinxQL, use:
pip install git+https://github.com/jorgecarleitao/django-sphinxql.git
Django-SphinxQL requires a directory to store its database and must be registered as an installed app (it doesn't contain Django models):
- Add sphinxql to INSTALLED_APPS.
- Add INDEXES to your settings:
      INDEXES = {
          'path': os.path.join(BASE_DIR, '_index'),  # The directory is created automatically.
          'sphinx_path': BASE_DIR,
          'sphinx_bin_path': '/opt/sphinx-2.2.11/bin',
      }
  - path is the directory where the Sphinx database, i.e. the Sphinx indices, is created.
  - sphinx_path is the directory that will contain Sphinx-specific files such as sphinx.conf, searchd.pid and searchd.log.
  - sphinx_bin_path is the directory where the Sphinx binaries searchd and indexer reside. If it is not specified, the library assumes that they are available directly via the searchd and indexer commands, i.e. that they are on the PATH.
- Generate the configuration using python manage.py generate_sphinx_conf. The configuration file is then used by a local Sphinx instance, or you may copy parts of it into the configuration of your own Sphinx server.
- Run a Sphinx instance either using python manage.py start_sphinx or through your own service with the generated config. You may stop the instance using python manage.py stop_sphinx. To connect to another instance, use:
      INDEXES = {
          ...
          'connection_params': {'host': 'my.sphinx.host.com', 'port': 9306},
      }
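The sphinx_bin_path fallback described above can be pictured with a small sketch. This is not the package's own code: resolve_sphinx_binary is a hypothetical helper that only illustrates the rule "use sphinx_bin_path if given, otherwise rely on PATH".

```python
import os
import shutil


def resolve_sphinx_binary(name, indexes):
    """Return the path of a Sphinx binary such as 'searchd' or 'indexer'.

    Hypothetical helper: prefers INDEXES['sphinx_bin_path'] when it is set
    and otherwise looks the command up on the PATH.
    """
    bin_path = indexes.get('sphinx_bin_path')
    if bin_path:
        return os.path.join(bin_path, name)
    # shutil.which returns None when the command is not on the PATH.
    return shutil.which(name)


explicit = resolve_sphinx_binary(
    'searchd', {'sphinx_bin_path': '/opt/sphinx-2.2.11/bin'})
from_path = resolve_sphinx_binary('searchd', {})
```

If from_path is None, neither sphinx_bin_path nor the PATH points to a Sphinx installation, which is the situation to check first when start_sphinx or index_sphinx cannot find the binaries.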
Assume you have a model Document with a summary, a text and a number that you want to index. To index it, create a file indexes.py in your app with:
from sphinxql import indexes, fields
from myapp import models

class DocumentIndex(indexes.Index):
    my_summary = fields.Text(model_attr='summary')
    my_text = fields.Text(model_attr='text')
    my_number = fields.Integer(model_attr='number')

    class Meta:
        model = models.Document
model_attr can be either a string with lookups or an F expression. E.g. type_name = fields.Text(model_attr='type__name') will index the name of the ForeignKey type of your model, while
type_name = fields.Text(model_attr=Concat('type__name', Value(' '),
                                          'my_text',
                                          output_field=CharField()))
indexes the concatenation of two fields (see also the Django documentation). In principle, index fields accept any expression that Django's annotate accepts.
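To make the lookup notation concrete, here is a purely conceptual sketch of what a string such as 'type__name' refers to. Django resolves lookups in SQL through joins, not by walking Python attributes, so resolve_lookup below is a hypothetical stand-in and the document object is made-up data.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_lookup(obj, lookup):
    """Follow a Django-style lookup ('a__b__c') across plain attributes."""
    return reduce(getattr, lookup.split('__'), obj)


# Made-up object graph standing in for a Document with a ForeignKey `type`.
document = SimpleNamespace(
    text='toys for babies',
    type=SimpleNamespace(name='catalogue'),
)

type_name = resolve_lookup(document, 'type__name')
# What a concatenation of the two values with a space in between yields.
combined = '{} {}'.format(type_name, resolve_lookup(document, 'text'))
print(type_name)
print(combined)
```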
To build your indexes, run:
python manage.py index_sphinx
At this point you may notice that some files have been created in settings.INDEXES['path']: the Sphinx database is populated.
Then start the Sphinx daemon (it only has to be started once):
python manage.py start_sphinx
(To stop Sphinx again, use python manage.py stop_sphinx.)
Django-SphinxQL defines SearchQuerySet, a subclass of Django's QuerySet that provides the interface to all Sphinx-related operations. SearchQuerySet only adds functionality: if you don't use the Sphinx-related methods, it behaves as a regular QuerySet.
Sphinx has a dedicated syntax for text search that Django-SphinxQL also accepts:
>>> q = SearchQuerySet(DocumentIndex).search('@my_text toys for babies')
This particular query returns Documents restricted to the ones where "toys for babies" matches in the field my_text, ordered by the most relevant match.
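When the search text comes from user input, it can help to build the field-restricted query string in one place. The sketch below is an assumption-laden helper, not part of Django-SphinxQL: field_query and escape_terms are hypothetical names, and the set of escaped characters is assumed to be the operators of Sphinx's extended query syntax. Whether the package already escapes input itself is not stated here.

```python
import re

# Assumption: characters that Sphinx's extended query syntax treats as operators.
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=<])')


def escape_terms(text):
    """Prefix every operator character with a backslash."""
    return _SPECIAL.sub(r'\\\1', text)


def field_query(field, text):
    """Build '@field terms', restricting the match to one index field."""
    return '@{} {}'.format(field, escape_terms(text))


query = field_query('my_text', 'toys for babies')
escaped = escape_terms('rock-n-roll @home')
print(query)
print(escaped)
```

The resulting string would then be passed to .search(...) in place of the literal used above.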
When a SearchQuerySet like the one above is evaluated, it does the following:
1. hits the Sphinx database and converts the results to DocumentIndex instances;
2. hits the Django database to retrieve the respective Document instances;
3. annotates the Document instances with the respective DocumentIndex instances (in the attribute search_result);
4. returns the Document instances.
Step 2 is done using .filter(pk__in=[...]). The results are ordered by relevance because the query contains no explicit call to order_by: relevance ordering overrides the model's default ordering, but not an explicit ordering set on the Django query, in which case the Django ordering is used. See the docs for detailed information.
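The evaluation steps can be mimicked with plain Python objects. This is only a sketch of the flow described above, not the package's implementation: the two data stores are made-up in-memory structures, fetch_documents is a hypothetical stand-in for .filter(pk__in=[...]), and it assumes that the id of a Sphinx hit equals the primary key of the corresponding Document.

```python
from types import SimpleNamespace

# Made-up Sphinx results, already ordered by relevance (best match first).
SPHINX_HITS = [
    SimpleNamespace(id=3, my_text='toys for babies'),
    SimpleNamespace(id=1, my_text='wooden toys for small babies'),
]
# Made-up Django table, keyed by primary key.
DJANGO_ROWS = {
    1: SimpleNamespace(pk=1, summary='Wooden toys'),
    2: SimpleNamespace(pk=2, summary='Garden tools'),
    3: SimpleNamespace(pk=3, summary='Baby toys'),
}


def fetch_documents(pks):
    """Stand-in for .filter(pk__in=pks): rows come back in table order."""
    wanted = set(pks)
    return [row for pk, row in sorted(DJANGO_ROWS.items()) if pk in wanted]


def search():
    # Step 1: collect the Sphinx hits and remember their relevance ranking.
    hits = {hit.id: hit for hit in SPHINX_HITS}
    ranking = [hit.id for hit in SPHINX_HITS]
    # Step 2: retrieve the matching rows from the Django database.
    documents = fetch_documents(ranking)
    # Step 3: attach each hit to its row as `search_result`.
    for document in documents:
        document.search_result = hits[document.pk]
    # No explicit ordering was requested, so restore the relevance order.
    documents.sort(key=lambda document: ranking.index(document.pk))
    # Step 4: return the annotated rows.
    return documents


results = search()
print([document.pk for document in results])  # [3, 1]
```

The explicit sort is the point of the sketch: a pk__in filter does not preserve the order of the ids it is given, so relevance order has to be re-imposed after the rows are fetched.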
- Null values are considered empty strings or 0 (constraint on Sphinx engine)
- Only supports dates and times since 1970 (constraint on Sphinx engine)
- Most Sphinx functionality is not implemented, notably real time indexes.
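The first two limitations can be read as a value mapping applied at indexing time. The sketch below only illustrates that mapping under stated assumptions: the function names are hypothetical, and timestamps are assumed to be stored as non-negative seconds since 1970-01-01 UTC, which is why earlier dates cannot be represented.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_sphinx_text(value):
    """NULL text is indexed as an empty string."""
    return '' if value is None else str(value)


def to_sphinx_integer(value):
    """NULL numbers are indexed as 0."""
    return 0 if value is None else int(value)


def to_sphinx_timestamp(value):
    """Convert a timezone-aware datetime to seconds since the epoch."""
    if value is None:
        return 0
    if value < EPOCH:
        raise ValueError('dates before 1970 cannot be indexed')
    return int((value - EPOCH).total_seconds())


one_day = to_sphinx_timestamp(datetime(1970, 1, 2, tzinfo=timezone.utc))
print(one_day)  # 86400
```

One practical consequence: once indexed, a NULL value and an empty string (or a NULL and 0) can no longer be told apart in a Sphinx query.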
You should check if Django-Haystack suits your needs.
Django-SphinxQL is useful when you can index your data on a time-scale different from "real time". It should be much faster in indexing, and it should have lower memory requirements.