EULindexer is currently experimental code going through development and preliminary production deployment. It is not recommended to use this code base until our internal testing on this code has been completed and this message has been removed.
EULindexer manages and coordinates index updates content that is indexed by multiple sites and in multiple Solr instances, normally triggered by an object update; this is done by way of a very simple JSON service (format documented below) for passing configuration and indexing data. In the current implementation, EULfedora provides a webservice that can be used for indexing, and Solr is used for indexing, but EULindexer can be used with any site that provides JSON data in the required format.
:mod:`eulindexer` currently consists of one primary script, :mod:`~eulindexer.indexer.management.commands.indexer`, a helper script :mod:`~eulindexer.indexer.management.commands.reindex`, and a minimal web application that exposes Django's :mod:`django.contrib.admin` interface for reviewing errors logged in the database by either of the indexing scripts.
Any site configured in INDEXER_SITE_URLS in localsettings.py
will be expected to respond to two types of requests, as documented
below.
When the :mod:`~eulindexer.indexer.management.commands.indexer` or
:mod:`~eulindexer.indexer.management.commands.reindex` scripts start
up, they will query the sites configured as INDEXER_SITE_URLS at
the exact url configured in localsettings.py
. That url is
expected to respond with a JSON object that consists of the
SOLR_URL where this site wants content indexed and a list of
CONTENT_MODELS that will be used to identify the types of objects
this site indexes.
The CONTENT_MODELS portion of the response should be a list of lists, where each list is the combination of content models required to identify an object that the site indexes. For example, if a site indexes text content and audio-visual content (but not audio-only or visual-only), it might return something like this:
{ "SOLR_URL": "http://localhost:8983/solr/myapp", "CONTENT_MODELS": [ ["info:fedora/cmodel:text"], ["info:fedora/cmodel:visual", "info:fedora/cmodel:audio"] ] }
When :mod:`~eulindexer.indexer.management.commands.indexer` identifies objects that have the text content model or both the visual and audio content models (and possibly others), it will attempt to index them in this site. Note that the order of the content models is not significant; they simply act as a composite key to identify an object from Fedora.
When either of the indexing scripts identify an object to be indexed
by a particular configured site, they will query the site for the data
that should be sent to the Solr index. Based on the url set for a
particular site in INDEXER_SITE_URLS, the indexer will query for
/pid/
; e.g., if the url configured in localsettings.py
is:
http://localhost/myapp/indexdata/
For an object with pid test:123
, the :mod:`eulindexer` will
request index data at:
http://localhost/myapp/indexdata/test:123/
This url is expected to return a JSON object with index data as field: value, e.g.
{ "pid":"test:123", "title":"Emory University", "description": "A University located in the Southeast." }
Any JSON content in this format (including, e.g., lists for values) can be used, as long as it matches the field names in the configured Solr schema for this site. The JSON content is converted to an equivalent python object and passed to the :mod:`sunburnt` :meth:`sunburnt.sunburnt.SolrInterface.add` method.
Note
:mod:`eulindexer` does not send Solr a commit
message, as that
can be handled much more efficiently and directly by Solr.
If the indexer encounters an error on indexing an individual item, either when requesting the index data or sending that data to Solr, it will do as follows: if the error is potentially recoverable, it will attempt to reindex it (using the configured retry); if retries fail, or the error is not recoverable, it will remove the item from the index queue and log an :class:`~eulindexer.indexer.models.IndexError` in the database for review.
Current versions of EULfedora have support for the :mod:`eulindexer` index-data service, which can be enabled and extended in any project using :mod:`eulfedora` with :mod:`django`. The code and documentation for this functionality can be found in :mod:`eulfedora.indexdata`.
EULindexer currently depends on django, sunburnt, httplib2, stompest, pyPdf
EULindexer could be used without EULfedora, but a compatible web interface would need to be built for any replacement.
eulindexer was created by the Digital Programs and Systems Software Team of Emory University Libraries.
libsysdev-l@listserv.cc.emory.edu
eulindexer is distributed under the Apache 2.0 License.