mallocator/Elasticsearch-Exporter

Export to file -> data file size is 0

feaster83 opened this issue · 6 comments

Hi,

I'm trying to export a specific index to a file. The export files (data and meta file) are created as expected. The problem I have is that the size of the data file is not consistent.

After most executions the data file is empty (0 bytes). After some executions the size is 780 kB (which seems to be correct), and after others it is approx. 2300 kB.

The size of the meta file is always the same, which seems to be correct.

I used the command:
node "D:\Program Files\node_modules\elasticsearch-exporter\exporter.js" -a es-server1 -p 9242 -i .kibana -g kibana-test-settings

The output of the exporter script:
Elasticsearch Exporter - Version 1.4.0
Reading source statistics from ElasticSearch
Reading mapping from ElasticSearch
Reading mapping from ElasticSearch
Storing index mapping in meta file kibana-test-settings.meta
Waiting for mapping on target host to be ready, queue length 10
Waiting for mapping on target host to be ready, queue length 20
Mapping is now ready. Starting with 20 queued hits.
Storing index mapping in meta file kibana-test-settings.meta
Mapping is now ready. Starting with 0 queued hits.
Number of calls: 4
Fetched Entries: 24 documents
Processed Entries: 24 documents
Source DB Size: 24 documents

I'm using Node.js version v0.12.0.

Any idea?

Hi Jasper, thanks for using the exporter.

First of all, I haven't tested the exporter on 0.12 yet, so I don't know if that has any impact. I don't expect it to, since the changes don't seem to impact the parts of the API I'm using, but you never know.

Secondly, I'm currently doing a rewrite from scratch for version 2.0, which will give you a few more options on how to query data from ElasticSearch. Hopefully I will hunt down all bugs in that rewrite and this issue will be solved.

Now, with that out of the way: I'm not sure what exactly the problem is here. It could be either that ElasticSearch's scroll mechanism doesn't deliver all docs, or that the exporter shuts down before all data is written.

What does the output look like when there is data missing? What version of ElasticSearch are you using? How big is your ElasticSearch cluster? Can you try and run a scroll request manually to see if there is data missing?
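
For the manual scroll check, something along these lines should do it (a rough sketch against the host/port/index from your command, using the ES 1.x scan/scroll API and only Node's built-in http module):

```js
// Rough sketch of a manual scroll check, not part of the exporter.
// On ES 1.x the first scan request returns no hits, only a scroll id;
// subsequent /_search/scroll calls return the actual documents.
var http = require('http');

var HOST = 'es-server1', PORT = 9242, INDEX = '.kibana';

function post(path, body, callback) {
    var req = http.request({ host: HOST, port: PORT, path: path, method: 'POST' },
        function (res) {
            var data = '';
            res.on('data', function (chunk) { data += chunk; });
            res.on('end', function () { callback(JSON.parse(data)); });
        });
    req.end(body);
}

post('/' + INDEX + '/_search?search_type=scan&scroll=1m',
     JSON.stringify({ size: 10 }), function (first) {
    var total = 0;
    (function next(scrollId) {
        // ES 1.x accepts the scroll id as the raw request body
        post('/_search/scroll?scroll=1m', scrollId, function (page) {
            total += page.hits.hits.length;
            if (page.hits.hits.length > 0) {
                next(page._scroll_id);
            } else {
                console.log('scrolled ' + total + ' of ' + first.hits.total + ' docs');
            }
        });
    })(first._scroll_id);
});
```

If the count it prints matches hits.total (24 in your case), the scroll mechanism is fine and the problem is on the exporter's side.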

I have also tested on a 0.10 version of Node.js, and that gives me the same behaviour, so I don't expect it is related to the Node.js version.

The result of a single execution is (most of the time) a data file of 0 kB (empty) and a meta(data) file of approx. 2500 kB (the content of the metadata file is correct).

We are running a standalone Elasticsearch 1.4.3 that is almost empty and not heavily used (development environment). We only want to export a single index that contains 24 documents (that is also what the output of the exporter plugin says).

The server the exporter script is running on is slow. On my faster development machine the results are better. Closing the exporter app too early (or something like that) could be a reason why the file is empty.

Is there a way to get more debug information while running the script (without adding it myself)?

Nope, that's all the output you get, but more detailed logging is one of the new features in 2.0 ;)

My guess is that there is a problem because storing the meta data finishes after all documents have already been fetched and the actual export is never triggered. Although you said that there have been cases where only a partial file has been exported?
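
To illustrate, the suspected race would have roughly this shape (purely illustrative, not the actual exporter code):

```js
// Documents that arrive before the target is ready are queued; if the
// "ready" callback never drains the queue (or the process exits before
// it fires), the queued documents are never written.
var queue = [];
var targetReady = false;

function writeDoc(doc) {                 // stand-in for the actual file write
    console.log('writing', JSON.stringify(doc));
}

function onHit(doc) {
    if (targetReady) writeDoc(doc);
    else queue.push(doc);                // hits queued while the meta file is stored
}

function onMetaStored() {
    targetReady = true;
    var pending = queue;
    queue = [];
    pending.forEach(writeDoc);           // if this never runs, queued docs are lost
}

// Simulate the race: all hits arrive before the async meta store completes
onHit({ _id: 1 });
onHit({ _id: 2 });
setTimeout(onMetaStored, 100);
```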

Sounds logical. Could be the issue.

I see 3 possible results after the script finishes (tested on 2 different machines):

Meta file -> always 2500 kB
Option 1: data file size 0 kB (machine 1: ~80% of runs, machine 2: ~20% of runs)
Option 2: data file size 700 kB (machine 1: ~15% of runs, machine 2: ~75% of runs)
Option 3: data file size 2500 kB (machine 1: ~5% of runs, machine 2: ~5% of runs)

I think the 700 kB data file is the correct one, because after making a small change to the index (adding a little bit of data) the size increased to 780 kB. But with that in mind I can't explain why there is sometimes a larger file.

Maybe I can put a sleep in the script to wait until the IO is finished and check the results?

No, I'm sorry, but after reviewing the code I just can't see where a timeout would help. The next step would be for me to try to replicate the behavior and start debugging, but since this code will soon be superseded I'd rather not spend too much time fixing the old version (my time is unfortunately pretty limited as it is).
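
That said, if the cause really is an early shutdown, a fixed sleep would still be fragile; the reliable pattern in Node is to wait for the write stream's 'finish' event before exiting. A minimal sketch (the gzip stream and file name are just an example, not necessarily how the exporter writes its output):

```js
var fs = require('fs');
var zlib = require('zlib');

var gzip = zlib.createGzip();
var out = fs.createWriteStream('kibana-test-settings.data'); // example name
gzip.pipe(out);

// ... write every document as one JSON line ...
gzip.write(JSON.stringify({ _id: 'example' }) + '\n');

gzip.end();                        // signal that no more data is coming
out.on('finish', function () {     // fires only after everything is flushed to disk
    console.log('data file fully written, safe to exit');
});
```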

The data file is nothing more than a zipped JSON file where the documents are separated by newlines (just like the bulk insert format), so you can take a look at which of the files should be the correct one. If 2500 kB is indeed more than what's in the database, I'd be curious to know what the rest of that data is.
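
A quick way to check is to unzip each file and count the JSON lines; for your index the correct file should contain 24 documents. Something like this (assuming gzip compression and an example file name, adjust to what's actually on disk):

```js
var fs = require('fs');
var zlib = require('zlib');

// Unzip the data file and count the non-empty JSON lines
zlib.gunzip(fs.readFileSync('kibana-test-settings.data'), function (err, buf) {
    if (err) throw err;
    var docs = buf.toString().split('\n').filter(function (line) {
        return line.trim().length > 0;
    });
    console.log(docs.length + ' documents in the data file');
});
```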

I will check the file contents after the weekend.