Control your Elasticsearch document updates with extra speed.
Elasticsearch provides update APIs such as _update and _bulk for updating documents. However, Elasticsearch applies an update by merging the given [NEW] document into the [OLD] document, which leads to the following behavior:
- Fields of the old document that are not present in the new document are retained, so to delete a field you have to send an explicit update request with a script. Scripted updates are slow: for example, deleting a field from 13 million documents takes 50 minutes on a 5-node cluster.
Example: old document
{ "a": 23, "b": 40 }
New document sent to the _update API:
{ "a": 19, "c": 45 }
If I send this to Elasticsearch, the resulting document is:
{ "a": 19, "b": 40, "c": 45 }
The field "b" is retained, but my use case is to remove "b", since I am not sending it in the request.
- The Advance Update plugin does this for you: fields missing from the new document are removed.
Example: old document
{ "a": 23, "b": 40 }
New document sent to the _advanceupdate API:
{ "a": 19, "c": 45 }
If I send this to Elasticsearch, the resulting document is:
{ "a": 19, "c": 45 }
- Elasticsearch first applies an update on the primary shard and then on the replica shards, which increases the total time required to update a document.
- You can reduce the total time by setting the number of replicas to 0, but if you have many documents that take an hour to update and a node goes down in the meantime, you have no replicas to recover from, i.e. you lose your data.
- Advance Update keeps your old data safe while updating documents much faster than normal updates.
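To make the difference between the two behaviors concrete, here is a minimal Python model of them. It is not the plugin's or Elasticsearch's actual code, and both function names are hypothetical: `merge_update` mimics a partial `_update` (new fields overwrite old ones, nested objects are merged, untouched fields survive), while `replace_update` mimics the `_advanceupdate` result described above, where fields absent from the new document are dropped.

```python
import copy
from typing import Any, Dict

Doc = Dict[str, Any]


def merge_update(old: Doc, new: Doc) -> Doc:
    """Model of a partial _update: new values overwrite old ones,
    nested objects are merged recursively, untouched fields survive."""
    result = copy.deepcopy(old)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def replace_update(old: Doc, new: Doc) -> Doc:
    """Model of the _advanceupdate behavior: the old document is ignored
    and the new one replaces it, so missing fields disappear."""
    return copy.deepcopy(new)


old_doc = {"a": 23, "b": 40}
new_doc = {"a": 19, "c": 45}
print(merge_update(old_doc, new_doc))    # {'a': 19, 'b': 40, 'c': 45}
print(replace_update(old_doc, new_doc))  # {'a': 19, 'c': 45}
```

Running it prints the two result documents from the examples above.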
POST _advanceupdate
Usage:
/_advanceupdate { "a": 19, "c": 45 }
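A minimal sketch of building this request with Python's standard library, assuming the plugin accepts the complete new document as the JSON body of a POST to `/_advanceupdate` as shown. The helper name `advance_update_request` is hypothetical, and the route the plugin actually registers may also include an index, type and id in the path.

```python
import json
import urllib.request
from typing import Any, Dict


def advance_update_request(base_url: str, path: str, doc: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is the full new document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = advance_update_request("http://localhost:9200", "/_advanceupdate", {"a": 19, "c": 45})

# To send it to a running cluster with the plugin installed:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```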
POST/PUT _advancebulk
/_advancebulk
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"b": 12, "f": 14, "m": 15}, "doc_as_upsert" : true }
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"s": 15, "l": 14, "k": 12}, "doc_as_upsert" : true }
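The `_advancebulk` body above appears to follow the same newline-delimited layout as the standard `_bulk` API: an action line followed by a document line for each update, with a trailing newline at the end of the body. A hypothetical Python helper, `bulk_update_body`, that produces this body from (metadata, document) pairs might look like this:

```python
import json
from typing import Any, Dict, Iterable, Tuple


def bulk_update_body(updates: Iterable[Tuple[Dict[str, str], Dict[str, Any]]]) -> str:
    """Serialize (metadata, doc) pairs into a newline-delimited bulk body.

    Each update becomes two lines: the "update" action line and the
    partial document with doc_as_upsert enabled. The body ends with a newline.
    """
    lines = []
    for meta, doc in updates:
        lines.append(json.dumps({"update": meta}))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    return "\n".join(lines) + "\n"


meta = {"_id": "2", "_type": "type1", "_index": "test"}
body = bulk_update_body([
    (meta, {"b": 12, "f": 14, "m": 15}),
    (meta, {"s": 15, "l": 14, "k": 12}),
])
print(body, end="")
```

Building the body from data structures avoids hand-editing JSON lines and guarantees the trailing newline that the bulk format requires.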
- Elasticsearch 5.6.0
- Set thread_pool.bulk.queue_size in your elasticsearch.yml high enough that bulk requests are not rejected, or set it to -1 (an unbounded queue).
Download and install Elasticsearch 5.6.0 from
Download the latest Advance Update plugin from
Install the plugin:
<ES directory>/bin/elasticsearch-plugin install file:///Downloads/advance-update.zip
Test the installation by visiting http://localhost:9200/_cat/plugins
Works only with Elasticsearch 5.6.0.
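This check can also be scripted. The sketch below uses two hypothetical helpers and assumes that the plain-text `_cat/plugins` output has one line per plugin of the form `<node name> <component> <version>`, and that the plugin is listed under the component name `advance-update` (taken from the zip file name, not confirmed). Node names containing spaces would break the simple whitespace split.

```python
import urllib.request

# Assumption: the plugin's component name matches the zip file name.
PLUGIN_NAME = "advance-update"


def plugin_listed(cat_plugins_output: str, component: str = PLUGIN_NAME) -> bool:
    """Return True if `component` appears in the component column of _cat/plugins output."""
    for line in cat_plugins_output.splitlines():
        columns = line.split()  # assumes node names contain no spaces
        if len(columns) >= 2 and columns[1] == component:
            return True
    return False


def fetch_cat_plugins(base_url: str = "http://localhost:9200") -> str:
    """Fetch the plain-text _cat/plugins listing from a running node."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/_cat/plugins") as resp:
        return resp.read().decode("utf-8")


# Usage against a running node:
# print(plugin_listed(fetch_cat_plugins()))
```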
- Configuring how documents are copied from the primary shard to the replicas
- Updating a key in all documents
- Viraj Parab - Initial work - CrazyCompiler