codingchili/excelastic

Out of heap space - parse one bucket at a time

codingchili opened this issue · 2 comments

The current implementation parses all bulk insert buckets into one massive JSON object, which is stored on the heap. The proposed solution prepares one bucket at a time, preferably while Elasticsearch is busy indexing the previous one.
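A minimal sketch of the proposed approach, not the project's actual code: `BucketPipeline`, `nextBucket` and `importAll` are hypothetical names, rows are plain strings, and the `indexer` callback stands in for the Elasticsearch bulk request. It submits one bucket and builds the next while that request is in flight, so at most two buckets are on the heap at any time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class BucketPipeline {

    /** Builds the next bucket of at most bucketSize rows; empty when the rows are exhausted. */
    public static List<String> nextBucket(Iterator<String> rows, int bucketSize) {
        List<String> bucket = new ArrayList<>(bucketSize);
        while (rows.hasNext() && bucket.size() < bucketSize) {
            bucket.add(rows.next());
        }
        return bucket;
    }

    /**
     * Sends buckets one at a time through the given indexer (a stand-in for the
     * bulk request). The next bucket is prepared while the current one is being
     * indexed. Returns the number of buckets sent.
     */
    public static int importAll(Iterator<String> rows, int bucketSize,
                                Consumer<List<String>> indexer) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            int sent = 0;
            CompletableFuture<Void> inFlight = CompletableFuture.completedFuture(null);
            List<String> bucket = nextBucket(rows, bucketSize);
            while (!bucket.isEmpty()) {
                inFlight.join(); // wait until the previous bucket has been indexed
                final List<String> current = bucket;
                inFlight = CompletableFuture.runAsync(() -> indexer.accept(current), executor);
                sent++;
                bucket = nextBucket(rows, bucketSize); // prepared while 'current' is indexing
            }
            inFlight.join(); // wait for the last bucket
            return sent;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("{\"row\":" + i + "}");
        }
        int sent = importAll(rows.iterator(), 4,
                bucket -> System.out.println("indexing " + bucket.size() + " rows"));
        System.out.println(sent + " buckets sent");
    }
}
```

Because the rows are consumed through an `Iterator`, the sketch only bounds memory if the row source itself is lazy; a source that has already loaded the whole spreadsheet keeps that data on the heap regardless of bucket size.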

To work around this issue, run with the -Xmx1g JVM parameter, or with a larger value if required.

Parsing the Excel files up front makes it possible to fail before the import has started.

Done. We still cannot support Excel files of arbitrary size, since Apache POI consumes a LOT of memory.