Parallelize store action
krichter722 opened this issue · 3 comments
Judging from the system monitor, `metastore -s` only uses one thread. I'm naively assuming that at some point it has to walk a file and directory tree and visit its nodes recursively or iteratively. I propose putting the file paths of a directory into queues in groups of <= 100, from which n threads can poll and create the file output, which can then be written into a large buffer (in order to avoid an I/O bottleneck). If the output needs to be ordered, all threads need a sequence number, and the others must not proceed until the one with the lowest number has finished. The threads have nothing to do but `stat` calls, which should cause fairly equal load on each thread.
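A minimal sketch of how the batching could look, assuming POSIX threads. None of these names (`stat_job`, `stat_worker`, `parallel_lstat`, `BATCH_SIZE`, `MAX_THREADS`) exist in metastore; they are hypothetical and only illustrate the idea. Instead of sequence numbers, this variant writes each result into the slot that matches its path's index, so the output keeps the input order without threads waiting on each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

#define BATCH_SIZE 100  /* group size from the proposal */
#define MAX_THREADS 64  /* arbitrary cap, chosen for this sketch */

/* Hypothetical job description; not part of metastore. */
struct stat_job {
    const char *const *paths; /* input paths, already in output order */
    struct stat *results;     /* results[i] belongs to paths[i] */
    int *ok;                  /* ok[i] is 1 if lstat(paths[i]) succeeded */
    size_t n_paths;
    size_t next;              /* index of the first unclaimed path */
    pthread_mutex_t lock;     /* protects next */
};

static void *stat_worker(void *arg)
{
    struct stat_job *job = arg;

    for (;;) {
        /* Claim the next batch of up to BATCH_SIZE paths. */
        pthread_mutex_lock(&job->lock);
        size_t begin = job->next;
        size_t end = begin + BATCH_SIZE;
        if (end > job->n_paths)
            end = job->n_paths;
        job->next = end;
        pthread_mutex_unlock(&job->lock);

        if (begin >= end)
            return NULL; /* nothing left to claim */

        /* Each slot is written by exactly one thread, so no lock is needed. */
        for (size_t i = begin; i < end; i++)
            job->ok[i] = (lstat(job->paths[i], &job->results[i]) == 0);
    }
}

/* Returns 0 on success, -1 if setup failed or no worker could be started. */
int parallel_lstat(const char *const *paths, size_t n_paths,
                   struct stat *results, int *ok, unsigned n_threads)
{
    assert(n_paths == 0 || (paths && results && ok));

    struct stat_job job = {
        .paths = paths, .results = results, .ok = ok,
        .n_paths = n_paths, .next = 0,
    };
    pthread_t threads[MAX_THREADS];
    unsigned started = 0;

    if (n_threads == 0)
        n_threads = 1;
    if (n_threads > MAX_THREADS)
        n_threads = MAX_THREADS;
    if (pthread_mutex_init(&job.lock, NULL) != 0)
        return -1;

    for (unsigned t = 0; t < n_threads; t++) {
        if (pthread_create(&threads[started], NULL, stat_worker, &job) != 0)
            break; /* continue with the workers that did start */
        started++;
    }
    for (unsigned t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    pthread_mutex_destroy(&job.lock);
    return started > 0 ? 0 : -1;
}
```

After `parallel_lstat` returns, a single thread could walk `results` in index order and write the output, which keeps the file layout deterministic. Build with `-pthread`. Whether this actually beats the single-threaded walk would need measuring, since `lstat` on a cold cache may be bound by disk latency rather than CPU.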
Good idea. It's not something for the upcoming v1.1, though, which is mostly meant to reflect what metastore has been so far: no behavioral or bigger changes, only simple fixes.
Thanks for the feedback. Can you document the nested `for` loops in `mentries_tofile`, please?
Yes, but only after releasing v1.1.0. There won't be any documentation improvements earlier, unless someone provides a decent pull request with them.