kermitt2/delft

LMDB embeddings creation is very slow on spinning drive

Closed this issue · 2 comments

Creating the LMDB embeddings is very slow on a spinning drive (40 it/s) compared to an SSD (4000 it/s). The slowdown is caused by frequent commits, combined with LMDB not being write-optimized.
So far I have not found any evidence on whether or not LMDB will exclusively use RAM when all the data is written in a single transaction. It does not seem likely, though. There are kernel parameters that control how dirty pages of an mmap are flushed to disk (http://jmoiron.net/blog/mmap2/). As far as I understand, those parameters will have an impact on the actual RAM consumption.
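
For anyone who wants to check the dirty-page settings on their own machine, here is a minimal sketch that reads the usual Linux writeback parameters from `/proc/sys/vm`. The helper name `read_dirty_page_settings` and the choice of these four parameters are my own assumptions for illustration. Whether they actually change LMDB's memory use is exactly the open question above.

```python
from pathlib import Path

# Linux writeback tunables exposed under /proc/sys/vm.
DIRTY_PARAMS = (
    "dirty_background_ratio",     # % of memory dirty before background flushing starts
    "dirty_ratio",                # % of memory dirty before writers are throttled
    "dirty_expire_centisecs",     # age after which dirty pages become eligible for flushing
    "dirty_writeback_centisecs",  # wake-up interval of the flusher threads
)


def read_dirty_page_settings(base="/proc/sys/vm"):
    """Return the dirty-page settings found under `base` as a dict of ints.

    Parameters that are missing or unreadable (for example on a non-Linux
    system) are skipped instead of raising.
    """
    settings = {}
    for name in DIRTY_PARAMS:
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            continue
    return settings
```

Calling `read_dirty_page_settings()` on a Linux box returns the current values. On other systems it simply returns an empty dict.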

Thank you @bfreuden !
Indeed, without the frequent commits it is considerably faster: even on an SSD, it goes from 2600 it/s up to 9000 it/s for me. As expected, more RAM is used without the intermediate commits (around 1 GB).
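
To make the trade-off concrete, here is a rough sketch of the two write strategies discussed above. It is not the code from the PR: `write_embeddings` and `commit_every` are hypothetical names, and `env` is assumed to be an environment opened with the `lmdb` Python package (only `env.begin(write=True)`, `txn.put()` and `txn.commit()` are used).

```python
def write_embeddings(env, items, commit_every=None):
    """Write (word, vector_bytes) pairs into an LMDB environment.

    commit_every=None keeps everything in a single write transaction
    (fast, but more RAM is held until the final commit).
    A positive integer commits after that many puts (slower on a
    spinning drive, since every commit has to reach the disk).
    Returns the number of commits performed.
    """
    commits = 0
    pending = 0
    txn = env.begin(write=True)
    for word, vector_bytes in items:
        txn.put(word.encode("utf-8"), vector_bytes)
        pending += 1
        if commit_every is not None and pending >= commit_every:
            txn.commit()
            commits += 1
            pending = 0
            # Start a fresh write transaction for the next batch.
            txn = env.begin(write=True)
    # Final commit for whatever is left (possibly nothing).
    txn.commit()
    return commits + 1
```

With a real database this would be called as `write_embeddings(lmdb.open(path, map_size=...), items)`, where `path` and `map_size` depend on the embeddings being loaded. Passing a large `commit_every` instead of `None` is a possible middle ground if RAM usage becomes a concern.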

PR #55 has been merged