es-tool
An Elasticsearch tool written in Python.
es-tool.py utilises the elasticsearch-py client, which in turn interacts with Elasticsearch via its API. The elasticsearch-py project describes itself as follows:
Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable.
Reason
After moving over to the AWS Elasticsearch service, I realised the number of active shards was increasing by stupid amounts per day. After some reading I found that by default ES assigns 5 primary shards and 1 replica of each primary to every index, meaning each index was creating 10 shards (5 primaries plus 5 replicas).
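The shard count follows directly from those defaults: each primary shard exists once, plus one extra copy per replica. A minimal sketch of the arithmetic (the function names and the "three indices per day" figure are only for illustration):

```python
def total_shards(primaries, replicas):
    """Total shards for one index: each primary plus one copy per replica."""
    return primaries * (1 + replicas)


def shards_per_day(indices_per_day, primaries=5, replicas=1):
    """Shards created per day, using the ES 1.x defaults of 5 primaries and 1 replica."""
    return indices_per_day * total_shards(primaries, replicas)


print(total_shards(5, 1))  # 10
print(shards_per_day(3))   # 30, for a hypothetical three new indices per day
```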
The AWS ES service doesn't allow you to specify the number of shards via elasticsearch.yml or a PUT /_cluster/settings call; it can only be done via index templates. This means that, rather than setting the number of shards once for the whole cluster, it's done per index (or rather per index pattern matched by a template).
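An index template is a JSON document sent with PUT /_template/<name>; in Elasticsearch 1.x the index pattern goes in the "template" key. The sketch below only builds such a request with the standard library and does not send it. The endpoint URL, template name and "logstash-*" pattern are hypothetical, it uses Python 3's urllib.request (unlike the Python 2 tool itself), and it assumes the domain's access policy accepts unsigned requests. Note that a template only affects indices created after it is added.

```python
import json
import urllib.request


def shard_template(pattern, shards, replicas):
    """Index template body in the ES 1.x format ("template" holds the index pattern)."""
    return {
        "template": pattern,
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


def template_request(base_url, name, body):
    """Build (but do not send) a PUT /_template/<name> request."""
    return urllib.request.Request(
        "%s/_template/%s" % (base_url.rstrip("/"), name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = template_request("https://my-domain.example.com", "one-shard",
                       shard_template("logstash-*", 1, 1))
print(req.get_method(), req.full_url)
# To actually apply it: urllib.request.urlopen(req)
```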
Rather annoyingly, at the time of writing the AWS ES version is 1.5.2, which means I couldn't use the Reindex API (it was only added in Elasticsearch 2.3), or Logstash, as there is an amazon_es output plugin but no amazon_es input plugin.
I was initially messing around with the API, and thought it'd be nice to create a tool to make my life a little easier when doing a few administration tasks (like reindexing).
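Without the Reindex API, a reindex has to read every document out of the old index and write it back into the new one. One way to do the read half with elasticsearch-py is a scroll over a match_all query. This is a minimal sketch, assuming `client` is an `elasticsearch.Elasticsearch` instance; `scroll_all` is a hypothetical helper and not necessarily how es-tool.py does it:

```python
def scroll_all(client, index, size=500, scroll="5m"):
    """Yield every hit in `index` using the scroll API.

    Uses a plain scroll (not search_type=scan), so the first
    response already contains the first page of hits.
    """
    resp = client.search(index=index, scroll=scroll, size=size,
                         body={"query": {"match_all": {}}})
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        for hit in hits:
            yield hit
        # Fetch the next page, keeping the scroll context alive.
        resp = client.scroll(scroll_id=resp["_scroll_id"], scroll=scroll)
```

Against a reachable cluster this could be driven with something like `for hit in scroll_all(Elasticsearch(["localhost:9200"]), "name-of-index"): ...`, where each hit carries `_type`, `_id` and `_source`.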
Installation
To use the tool you'll need to install its dependencies:
pip install -t vendored/ -r requirements.txt
Currently I've only tested it with Python 2.x.
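Because pip's -t flag installs the packages into vendored/ instead of site-packages, Python only finds them if that directory is on sys.path (alternatively, PYTHONPATH=vendored can be set in the shell). A minimal sketch of doing it from a script; `add_vendored` is a hypothetical helper and this is an assumption about the setup, not necessarily what es-tool.py does:

```python
import os
import sys


def add_vendored(base_dir, name="vendored"):
    """Put <base_dir>/<name> at the front of sys.path so its packages import first."""
    path = os.path.join(os.path.abspath(base_dir), name)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# In a script, base_dir would typically be the script's own directory:
# add_vendored(os.path.dirname(os.path.abspath(__file__)))
# import elasticsearch
```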
Usage
usage: es-tool.py [-h] [-r REINDEX] [-n NEW_INDEX_NAME] [-d DELETE_INDEX]
[-S SOURCE] [-e ENDPOINT] [-D DESTINATION] [-ps PORT_SOURCE]
[-pd PORT_DESTINATION] [-ls SSL_SOURCE]
[-ld SSL_DESTINATION]
Elasticsearch management
optional arguments:
-h, --help show this help message and exit
-r REINDEX, --reindex REINDEX
Reindex all documents in specified index and append
with "-reindex", if --new_index_name options has not
been specified
-n NEW_INDEX_NAME, --new_index_name NEW_INDEX_NAME
Name for new index
-d DELETE_INDEX, --delete_index DELETE_INDEX
Specify which index to delete from source ES
-S SOURCE, --source SOURCE
Specify Elasticsearch host from which the data will be
downloaded
-e ENDPOINT, --endpoint ENDPOINT
Alias for --source for backward compatibility
-D DESTINATION, --destination DESTINATION
Specify Elasticsearch host in which the data will be
uploaded. If not specified, the --source connection
will be used as destination.
-ps PORT_SOURCE, --port_source PORT_SOURCE
Specify port for the source Elasticsearch (9200 by
default, for AWS ES it can be 80 or 443).
-pd PORT_DESTINATION, --port_destination PORT_DESTINATION
Specify port for the destination Elasticsearch (9200
by default, for AWS ES it can be 80 or 443).
-ls SSL_SOURCE, --ssl_source SSL_SOURCE
Use SSL (https) connection for the source
ElasticSearch.
-ld SSL_DESTINATION, --ssl_destination SSL_DESTINATION
Use SSL (https) connection.
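The defaults described in this help text (port 9200, --endpoint as an alias for --source, falling back to the source when no destination is given, and the "-reindex" suffix) could be wired up with argparse roughly as follows. This is a sketch reconstructed from the help output, not es-tool.py's actual code: `str_to_bool` and `parse` are hypothetical names, and treating the destination fallback as also copying the source port and SSL settings is an assumption.

```python
import argparse


def str_to_bool(value):
    """Hypothetical parser for values such as "true"/"false" passed to --ssl_*."""
    return str(value).lower() in ("true", "yes", "1")


def parse(argv):
    p = argparse.ArgumentParser(description="Elasticsearch management")
    p.add_argument("-r", "--reindex")
    p.add_argument("-n", "--new_index_name")
    p.add_argument("-S", "--source")
    p.add_argument("-e", "--endpoint")
    p.add_argument("-D", "--destination")
    p.add_argument("-ps", "--port_source", type=int, default=9200)
    p.add_argument("-pd", "--port_destination", type=int, default=9200)
    p.add_argument("-ls", "--ssl_source", type=str_to_bool, default=False)
    p.add_argument("-ld", "--ssl_destination", type=str_to_bool, default=False)
    args = p.parse_args(argv)

    # --endpoint is kept as an alias for --source.
    args.source = args.source or args.endpoint
    # No destination given: reuse the whole source connection.
    if args.destination is None:
        args.destination = args.source
        args.port_destination = args.port_source
        args.ssl_destination = args.ssl_source
    # No new name given: append "-reindex" to the source index name.
    if args.reindex and not args.new_index_name:
        args.new_index_name = args.reindex + "-reindex"
    return args
```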
Reindex inside one cluster
To reindex an index within the same cluster:
./es-tool.py \
--source elasticsearch-host \
--reindex name-of-index \
--new_index_name name-of-new-index \
--ssl_source true \
--port_source 443
The tool will reindex the index into name-of-new-index on the same cluster. If --new_index_name is omitted, it appends "-reindex" to the original name instead, e.g. "name-of-index-reindex".
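The write half of a manual reindex is a Bulk API request: for each document, one action line naming the target index, type and id, followed by one line with the document source, all newline-terminated. A minimal sketch that turns scroll hits into such a body; `bulk_body` is a hypothetical helper, the sample hit is made up, and the result is the kind of string that could be passed to elasticsearch-py's `client.bulk(body=...)`:

```python
import json


def bulk_body(hits, target_index):
    """Turn search hits into a newline-delimited Bulk API body for target_index.

    Each document becomes an action line plus a source line, keeping its
    original _type and _id (document types still exist in ES 1.x).
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


hits = [{"_index": "name-of-index", "_type": "log", "_id": "1",
         "_source": {"msg": "hello"}}]
print(bulk_body(hits, "name-of-index-reindex"))
```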
Reindex from one cluster to another one
./es-tool.py \
--source elasticsearch-host-1 \
--destination elasticsearch-host-2 \
--reindex name-of-index \
--new_index_name name-of-new-index \
--ssl_source true \
--port_source 443 \
--ssl_destination true \
--port_destination 443
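In this case each side of the copy has its own host, port and SSL setting, which together determine the URL the client talks to. A sketch of that mapping; `base_url` is a hypothetical helper (elasticsearch-py itself takes these as separate host, port and `use_ssl` options), and the host names are the placeholders from the command above:

```python
def base_url(host, port=9200, use_ssl=False):
    """Build the base URL for one side of the copy (source or destination)."""
    scheme = "https" if use_ssl else "http"
    return "%s://%s:%d" % (scheme, host, port)


source = base_url("elasticsearch-host-1", 443, use_ssl=True)
destination = base_url("elasticsearch-host-2", 443, use_ssl=True)
print(source)       # https://elasticsearch-host-1:443
print(destination)  # https://elasticsearch-host-2:443
```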
To do
- List indexes
- Dry-run
- Improve logging/messages
- Refactor into a module structure, like my hubot-scripts implementation