Parallelize file parsing
YuriRomanowski opened this issue · 2 comments
YuriRomanowski commented
Clarification and motivation
This topic is a part of #221.
After we read the file contents, we should parse the files, which is (in theory) a pure operation and can be parallelized.
However, we use a C library under the hood, so parallelization may be tricky. Here we can try several approaches and discuss the results.
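A minimal sketch of the "pure parsing" idea, using only sparks from base's `GHC.Conc`. Here `parseLine` is a hypothetical pure stand-in for the real cmark-backed parser; the actual FFI call may behave differently under sparks, which is exactly what this issue is about:

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical pure "parser": counts words in a line.
-- Stands in for the real (FFI-backed) markdown parser.
parseLine :: String -> Int
parseLine = length . words

-- Spark one parse per item; with -threaded and +RTS -N,
-- each spark may be picked up by a free core.
parseAll :: [String] -> [Int]
parseAll []     = []
parseAll (x:xs) =
  let r  = parseLine x
      rs = parseAll xs
  in r `par` (rs `pseq` (r : rs))

main :: IO ()
main = print (sum (parseAll ["one two", "three four five", "six"]))
```

Compile with `-threaded` and run with `+RTS -N` to let the sparks actually migrate to other capabilities; without that, they all run on one core.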
Acceptance criteria
- Several parallelization approaches are tried
- We have decided how to handle foreign calls during parsing
- A speedup is obtained and demonstrated with measurements
YuriRomanowski commented
I uploaded some commits where different variations of xrefcheck can be load-tested (in branch `YuriRomanowski/#247-parallelize-file-parsing-scaffolding`):
- Original version (from `master`) with lazy `readFile`: 2a959d0
- Replace lazy `readFile` with a strict one: b3368c3
- Force reading files, then process them in parallel using the `Eval` monad: a1d5f56
- Force reading files, then process them using `mapConcurrently`: 8f65374
The latter two produce similar results.
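The `mapConcurrently` approach can be sketched with a base-only miniature of what `async`'s `mapConcurrently` does (the real branch presumably uses the `async` package; this toy version has no exception handling and is only meant to show the shape — force the file contents, then hand each file to its own thread):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Base-only stand-in for async's mapConcurrently:
-- fork one thread per input, collect results via MVars.
mapConcurrently' :: (a -> IO b) -> [a] -> IO [b]
mapConcurrently' f xs = do
  vars <- mapM (\x -> do
                  v <- newEmptyMVar
                  _ <- forkIO (f x >>= putMVar v)
                  pure v)
               xs
  mapM takeMVar vars

-- Hypothetical stand-in for "parse a file's contents":
-- evaluate forces the pure work inside the worker thread,
-- so it is not deferred to the thread that reads the result.
parseContents :: String -> IO Int
parseContents s = evaluate (length (words s))

main :: IO ()
main = do
  ns <- mapConcurrently' parseContents ["a b", "c d e", "f"]
  print (sum ns)
```

As with sparks, this only runs in parallel when built with `-threaded` and run with `+RTS -N`. The practical difference from the `Eval` approach: OS-level threads can keep making progress even if one of them blocks in a safe foreign call, which is relevant given the C parser underneath.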
Martoon-00 commented
Thanks for this investigation!
I tried, and from what I can see:
- Repo scanning time does not differ much across these scenarios (0.9 s / 0.7 s / 0.5 s / 0.5 s)
- My impression is that in the given load test there was simply no room for parallelization (this is what we saw on this picture). The Sparks tab shows that a few sparks were bound to different cores, but most of them went to one core, probably simply because parsing was fast enough for a single core to handle it all.
- I tried creating 4 dummy markdown files, 50 KB each, and the sparks solution showed 4 cores being used.
(the selected area corresponds to repo scanning time)
Although I'm not exactly sure why the "Activity" graph at the top shows so little CPU usage.
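One plausible explanation for sparks piling up on a single core (an assumption, not verified against the branch): per-file sparks are too small, so the creating core finishes them before another core bothers to steal them. Chunking several files into one spark gives each spark enough work to be worth migrating, a sketch with base-only `par`/`pseq`:

```haskell
import GHC.Conc (par, pseq)

-- Split a list into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- One spark per chunk instead of one per file, so each spark
-- carries enough work to justify running on another core.
parChunked :: Int -> (a -> Int) -> [a] -> Int
parChunked n f xs = go (map (sum . map f) (chunksOf n xs))
  where
    go []     = 0
    go (c:cs) =
      let rest = go cs
      in c `par` (rest `pseq` (c + rest))

main :: IO ()
main = print (parChunked 2 length ["ab", "cde", "f", "ghij"])
```

ThreadScope's spark counters (created / converted / fizzled) would show whether this actually changes how much work migrates off the main capability.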