- Batch upload CSV (actually any *SV) files to Elasticsearch
- Batch upload JSON files / JSON lines to Elasticsearch
- Batch upload parquet files to Elasticsearch
- Pre-define custom mappings
- Delete index before upload
- Index documents with _id from the document itself
- Load data directly from a URL
- Supports ES 1.X, 2.X and 5.X
- And more
pip install elasticsearch-loader
To add parquet support, run pip install elasticsearch-loader[parquet]
(venv)/tmp $ elasticsearch_loader --help
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...
Options:
--bulk-size INTEGER How many docs to collect before writing to
ElasticSearch
--concurrency INTEGER How many worker threads to start
--es-host TEXT Elasticsearch cluster entry point. eg.
http://localhost:9200
--index TEXT Destination index name [required]
--delete Delete index before import?
--type TEXT Docs type [required]
--id-field TEXT Specify field name that will be used as
document id
--index-settings-file FILENAME Specify path to json file containing index
mapping and settings
--help Show this message and exit.
Commands:
csv
json FILES with the format of [{"a": "1"}, {"b":...
parquet
Load two CSV files:
elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv
Load all JSON files in the current directory:
elasticsearch_loader --index incidents --type incident json *.json
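Without --json-lines, each input file is expected to be a JSON array of objects (as hinted by the json command help above); a minimal sketch, with made-up field names:
[
  {"incident_id": "1", "description": "Server down"},
  {"incident_id": "2", "description": "Disk full"}
]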
Load all git commits from the current repository as JSON lines:
git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -
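With --json-lines, each input line is parsed as a standalone JSON object; the format string above produces lines roughly like this (values are illustrative):
{"sha":"1a2b3c4d5e6f7890abcdef1234567890abcdef12","author_name":"Jane Doe","author_email":"jane@example.com","date":"Mon Oct 3 12:00:00 2016 +0200","message":"Fix-parquet-import"}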
Load a parquet file:
elasticsearch_loader --index incidents --type incident parquet file1.parquet
Load JSON directly from a URL, using each document's country field as its id:
elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json
Read CSV from stdin:
generate_data | elasticsearch_loader --index data --type incident csv -
Use the incident_id field from each row as the document id (see the CSV sketch below):
elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv
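A sketch of what file1.csv might contain, assuming the first row supplies the field names; the incident_id column and the other columns are hypothetical:
incident_id,title,severity
1001,Server down,high
1002,Disk full,medium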
Write to Elasticsearch in bulks of 300 docs:
elasticsearch_loader --bulk-size 300 --index incidents --type incident csv file1.csv file2.csv
Use 20 worker threads:
elasticsearch_loader --concurrency 20 --index incidents --type incident csv file1.csv file2.csv
Create the index with custom mappings and settings from a file before loading:
elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv
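The settings file is presumably a standard Elasticsearch create-index body with settings and mappings; a minimal sketch, with illustrative shard counts and field names (on ES 1.x/2.x use string instead of keyword/text):
{
  "settings": {"number_of_shards": 1, "number_of_replicas": 0},
  "mappings": {
    "incident": {
      "properties": {
        "incident_id": {"type": "keyword"},
        "description": {"type": "text"}
      }
    }
  }
}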
Tests are located under the test directory and can be run with tox.
Sample input files for each supported format can be found under samples.
- parquet support
- progress bar
- DLQ-style output file for docs that failed to index
- Python 3 support
- pep8 test