swisscom/backman

Elasticsearch backup performance improvement

We are currently using backman v1.30.2 to back up data from our Elasticsearch service in the internal iAPC.

Backman takes almost 30 minutes to back up 23 MB, and it has now been running for almost 5 hours to back up 600 MB and is still not done.

That is not workable for us: we have Elasticsearch instances with 4-5 GB of data, still growing. How long will it take to back those up? Days?

I've created searchdump (Swisscom-internal only for now, open source soon), which is implemented in Go and basically solves the performance issue of elasticsearch-dump. Integrating it with backman should therefore be pretty easy. @JamesClonk and I have already chatted briefly about it :)

That's very good news!
Is there any plan for when you'll integrate it into backman, so we can track it and give it a try when it is ready?

I'll work on some backman features today, but sadly I haven't planned any time yet to make searchdump open source or integrate it with backman.

FYI: searchdump is now public :)

Hello @denysvitali, we are very interested in this feature, since we have big ES instances that we have been unable to back up with Backman for a couple of years. Any news regarding its integration into Backman? Thanks

Hello,

@JamesClonk, @denysvitali, we are also very interested.
I'm not sure if the integration of searchdump is still on the table, but if not, perhaps we could consider letting backman pass two additional parameters to elasticdump: 'limit' and 'searchBody'. This way, we could reasonably increase the limit and use the 'searchBody' parameter to filter the documents we want to back up.

For example, to back up the documents from the past month, we use the command:
elasticdump [...] --searchBody='{"query":{"range":{"@timestamp":{"gte":"now-1M/M","lte":"now/M"}}}}' --limit 200

I'll answer my own question, for anyone else who runs into a similar issue :).
I checked the backman code and noticed the backup_options parameter, which I had completely missed...
You can use it to pass additional parameters to elasticdump, such as the ones I mentioned above.
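
For anyone wanting to try this, here is a rough sketch of what such a service entry might look like in the backman configuration. It is an untested fragment based on assumptions: the service name my-elasticsearch is hypothetical, and I'm assuming backup_options is a list of strings that backman appends as-is to the elasticdump call (without going through a shell, so the single quotes from the command line are dropped and the JSON query is escaped instead). Please check the backman README for the exact format in your version.

```json
{
  "services": {
    "my-elasticsearch": {
      "backup_options": [
        "--limit",
        "200",
        "--searchBody={\"query\":{\"range\":{\"@timestamp\":{\"gte\":\"now-1M/M\",\"lte\":\"now/M\"}}}}"
      ]
    }
  }
}
```

The limit value and the range query are the same ones as in the elasticdump command above; adjust both to your data.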