An extension for the official Elasticsearch Python client that provides an indices_manager
to create and manage indices with read and write aliases and to perform no-downtime migrations.
pip install slingshot
from weakref import proxy
from elasticsearch.client import Elasticsearch
from slingshot.indices_manager import IndicesManagerClient

es = Elasticsearch()
es.indices_manager = IndicesManagerClient(proxy(es))
es.indices_manager.create('slingshot', body={"settings": {"number_of_shards": 1, "number_of_replicas": 1}})
This creates an index with read and write aliases:
- Creates the index "slingshot.{creation_timestamp}"
- Creates a read alias "slingshot"
- Creates a write alias "slingshot.write"
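The naming pattern above can be sketched as a small helper. This is only an illustration: `slingshot_names` is a hypothetical function, not part of slingshot, and the integer-seconds timestamp is an assumption about the format.

```python
import time


def slingshot_names(name, timestamp=None):
    """Return (index, read_alias, write_alias) following the pattern above.

    The integer-seconds timestamp is an assumption made for illustration.
    """
    if timestamp is None:
        timestamp = int(time.time())
    index = "{0}.{1}".format(name, timestamp)
    return index, name, name + ".write"


index, read_alias, write_alias = slingshot_names("slingshot", timestamp=1400000000)
print(index, read_alias, write_alias)
# slingshot.1400000000 slingshot slingshot.write
```

Application code would then search through the read alias and index through the write alias, so that neither needs to know the timestamped index name.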
Slingshot manages the read and write aliases for the indices it creates. However, you can also bring an index that was not created with slingshot under its management. In that case it simply creates a write alias so that the index can be migrated.
es.indices_manager.manage('existing_index')
es.indices_manager.migrate('slingshot', body={"settings": {"number_of_shards": 5, "number_of_replicas": 1}})
This lets you change the configuration of an index and migrate its documents to take advantage of new mappings:
- creates a new index "slingshot.{modification_timestamp}" with a new configuration (e.g. 5 shards instead of 1)
- swaps write alias to the new index
- scans and bulk imports all documents (optionally ignoring types or performing transformations)
- swaps read alias
- deletes original index (can be skipped)
Note that the index must have been created or upgraded with slingshot (by creating a write alias or using the manage method).
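The order of the migration steps can be sketched on plain dictionaries, without a cluster. This is a toy model and not slingshot's implementation: `migrate_in_memory`, the `indices` and `aliases` dictionaries, and the index names are all hypothetical.

```python
def migrate_in_memory(indices, aliases, name, new_index,
                      transform=None, keep_source=False):
    """Mimic the documented migration order on plain dicts.

    indices maps index names to lists of documents,
    aliases maps alias names to index names.
    """
    write_alias = name + ".write"
    source = aliases[write_alias]      # the index must already be managed
    indices[new_index] = []            # 1. create the new index
    aliases[write_alias] = new_index   # 2. swap the write alias
    for doc in indices[source]:        # 3. copy documents, optionally transformed
        if transform is not None:
            doc = transform(doc)
        if doc is not None:
            indices[new_index].append(doc)
    if name in aliases:                # 4. swap the read alias, if there is one
        aliases[name] = new_index
    if not keep_source:                # 5. delete the original index
        del indices[source]
    return new_index


indices = {"slingshot.1": [{"_id": "a"}, {"_id": "b"}]}
aliases = {"slingshot": "slingshot.1", "slingshot.write": "slingshot.1"}
migrate_in_memory(indices, aliases, "slingshot", "slingshot.2")
print(aliases)
# {'slingshot': 'slingshot.2', 'slingshot.write': 'slingshot.2'}
```

In this model the write alias moves before the copy and the read alias moves after it, so readers keep seeing the old index until the new one is fully populated.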
When migrating, it can be useful to transform documents to match a new mapping.
from datetime import datetime

def transform_my_docs(doc):
    # recompute some fields
    doc['_source']['discount'] = doc['_source']['price'] / doc['_source']['value'] * 100.0
    # drop some fields (no error if the field is already absent)
    doc['_source'].pop('useless', None)
    # drop documents based on some business rules
    # (assumes the field is first cast to a datetime)
    if doc['_source']['expires_at'] < datetime.now():
        return None
    # Don't forget to return the modified document
    return doc

es.indices_manager.migrate('slingshot', body=config_dict_or_string, transform=transform_my_docs)
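A transform can be exercised on a few hand-written documents before it is used in a real migration. The sketch below assumes, as the example above suggests, that documents for which the transform returns None are dropped. `apply_transform`, `drop_expired` and the sample documents are hypothetical.

```python
from datetime import datetime, timedelta


def apply_transform(docs, transform):
    """Apply transform to each document, skipping those mapped to None."""
    kept = []
    for doc in docs:
        result = transform(doc)
        if result is not None:
            kept.append(result)
    return kept


def drop_expired(doc):
    # Hypothetical rule: discard documents whose expiry date has passed.
    if doc["_source"]["expires_at"] < datetime.now():
        return None
    return doc


now = datetime.now()
sample = [
    {"_id": "1", "_source": {"expires_at": now + timedelta(days=1)}},
    {"_id": "2", "_source": {"expires_at": now - timedelta(days=1)}},
]
kept = apply_transform(sample, drop_expired)
print([doc["_id"] for doc in kept])
# ['1']
```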
It can also be useful to ignore some document types altogether.
es.indices_manager.migrate('slingshot', body=config_dict_or_string, ignore_types=["my_type_1", "my_type_2"])
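The effect of ignore_types can be pictured as a filter on the `_type` field of each scanned document. This is a sketch of the idea only, not slingshot's code, and `without_ignored_types` is a hypothetical name.

```python
def without_ignored_types(hits, ignore_types):
    """Yield only the hits whose _type is not listed in ignore_types."""
    ignored = set(ignore_types)
    for hit in hits:
        if hit.get("_type") not in ignored:
            yield hit


hits = [
    {"_id": "1", "_type": "my_type_1"},
    {"_id": "2", "_type": "kept_type"},
    {"_id": "3", "_type": "my_type_2"},
]
remaining = list(without_ignored_types(hits, ["my_type_1", "my_type_2"]))
print([hit["_id"] for hit in remaining])
# ['2']
```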
To keep the original index after the migration (e.g. to roll back if anything goes wrong), pass keep_source=True:
es.indices_manager.migrate('slingshot', body=config_dict_or_string, keep_source=True)
Slingshot is unable to predict what needs to be done with the settings, mappings, aliases, etc. of the new index.
Therefore, when migrating, body must contain all the relevant configuration to create an index from scratch. This can include settings, mappings, aliases, warmers or anything supported by the elasticsearch index API.
However, slingshot manages the migration of the write alias and the read alias (if it exists).
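As an illustration, a complete body might look like the following. The `product` type, its fields and the `products_archive` alias are hypothetical, and the typed mappings layout assumes an Elasticsearch version that still uses document types.

```python
import json

body = {
    "settings": {"number_of_shards": 5, "number_of_replicas": 1},
    "mappings": {
        "product": {  # hypothetical document type
            "properties": {
                "price": {"type": "float"},
                "expires_at": {"type": "date"},
            }
        }
    },
    # hypothetical extra alias, in addition to the ones slingshot manages
    "aliases": {"products_archive": {}},
}

# The examples above name the argument config_dict_or_string,
# so a JSON string of the same configuration should also be usable.
body_as_string = json.dumps(body)
```

Keeping this dictionary in one place (for example in a module or a JSON file) makes it easier to pass the same full configuration to both create and migrate.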
Get a copy of the repository:
git clone git@github.com:OohlaLabs/slingshot.git .
Install tox:
pip install tox
Run the tests:
tox
All contributions and comments are welcome. Simply create a pull request or report a bug.
- Reindex percolators after migrating data
- Allow passing create and copy kwargs to migrate
- Fix compatibility issues with latest versions of elasticsearch-py (<2.0.0)
- Add support for parallel_bulk when migrating/copying
- Reindex percolators when migrating/copying
- Fix six requirement to minimum version instead of exact version
- Initial