src/long-to-linked-pe v1.0: Using more than 6 threads does not scale, reverting to 6.
mmokrejs opened this issue · 5 comments
The README.md claims one can run tigmint with any number of threads.
It turns out that the maximum, at least in some steps, is just 6:
long-to-linked-pe v1.0: Using more than 6 threads does not scale, reverting to 6.
Please update the "t=8: Number of threads" line in README.md accordingly, explaining which steps will not use multiple CPUs. I guess one still wants to use them for the bwa step, but who knows.
Hi @mmokrejs,
As Tigmint is a pipeline including multiple steps, it is true that some but not all steps use threads, and some to varying degrees. The most costly part of Tigmint is the alignment/mapping step, so I prefer to keep the default threads as is. It makes sense to use more threads to have the bwa/minimap2 steps as fast as possible.
If it's helpful, I can add a note that long-to-linked-pe uses up to 6 threads only, but I'm hesitant to overcomplicate things in the README by talking about thread usage in every individual step in the pipeline, since our recommendation for thread usage would stay the same. I don't think this is uncommon in bioinformatics tools (only the more costly steps being multi-threaded). For example, not all steps in ABySS are multi-threaded.
Thanks,
Lauren
Hi @lcoombe,
Analyses get especially expensive when a user allocates a hundred CPUs but the pipeline spends an insane amount of time running single-threaded. That is why I proposed improving the documentation: it is not clear how to run the pipeline efficiently. For example: run step 1 using X threads, then the alignment step with t=8 or whatever, then a follow-up long-to-linked-pe with t=6, as it does not scale.
I was not asking you to rewrite your code, just to explain to the user which parts of the pipeline can take advantage of multiple CPUs, which cannot, and how to run them as separate jobs so as not to waste CPU cycles.
I do not know about you, but our cluster supervision jobs complain if a job uses only 1% of the allocated CPUs. I cannot justify that either. It then upsets me when the README suggests I can pass t=288 to the pipeline, only to learn later
long-to-linked-pe v1.0: Using more than 6 threads does not scale, reverting to 6.
which I should have been told in the README right away. Please do something about that. We need it for planning the resources of a computational project, and bgsc can write well-scaling software.
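To make the request concrete, here is a minimal Python sketch of the kind of per-step thread plan being asked for. The caps come only from figures quoted in this thread (long-to-linked-pe reverts to 6 threads; tigmint-molecule and tigmint-cut are treated as single-threaded); `STEP_CAPS` and `plan_threads` are hypothetical names for illustration and are not part of Tigmint.

```python
# Hypothetical per-step thread caps, based only on figures quoted in this thread.
# None means "no cap known here": the step receives every requested thread.
STEP_CAPS = {
    "alignment (bwa mem / minimap2)": None,
    "long-to-linked-pe": 6,
    "tigmint-molecule": 1,
    "tigmint-cut": 1,
}


def plan_threads(requested, caps=STEP_CAPS):
    """Return the number of threads each step would actually use."""
    if requested < 1:
        raise ValueError("requested thread count must be at least 1")
    return {
        step: requested if cap is None else min(requested, cap)
        for step, cap in caps.items()
    }


if __name__ == "__main__":
    # Show how a request of t=288 would be spread across the steps.
    for step, threads in plan_threads(288).items():
        print(f"{step}: {threads} of 288 requested threads")
```

A table like this in the README would let a user see at a glance which steps justify a large allocation.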
Just to better understand your concerns, would you want to run individual steps of tigmint (separately or in the Makefile), specifying different threads for each? I.e., launching independent, consecutive jobs on your cluster, requesting different amounts of resources?
In our experience, the more common case is for a user to just want to run Tigmint start to finish, without having to run it piecemeal, but I understand circumstances are different for different users.
If you do want to vary the threads per job, I think the easiest thing would be to see what commands will be run by Tigmint (specifying -n to your job); then you can do whatever you want with those commands. E.g., for tigmint, you could specify 12 threads for bwa mem, but then tell your scheduler that you only require 1 thread for tigmint-molecule and tigmint-cut.
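As a rough illustration of the dry-run suggestion, the Python sketch below only builds and prints the command line; it does not run anything. The `tigmint-make` entry point and the `draft=` and `reads=` parameters are assumptions about the command-line interface (check the README for the exact form), the file names are placeholders, and `dry_run_command` is a hypothetical helper. Only `-n` and `t=` are taken from this thread.

```python
import shlex


def dry_run_command(target, draft, reads, threads):
    """Build (but do not execute) a dry-run invocation as an argument list.

    Assumed, not confirmed in this thread: the `tigmint-make` entry point
    and the `draft=` / `reads=` parameter names.
    """
    return [
        "tigmint-make",
        "-n",  # dry run: list the commands instead of running them
        target,
        f"draft={draft}",
        f"reads={reads}",
        f"t={threads}",
    ]


if __name__ == "__main__":
    cmd = dry_run_command("tigmint", "myassembly", "myreads", 12)
    # Print a copy-pasteable shell line.
    print(shlex.join(cmd))
```

The printed commands can then be split into separate scheduler jobs, each with its own CPU request.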
I have added a note about the thread usage for long-to-linked-pe in the README (01a87de).
> Just to better understand your concerns, would you want to run individual steps of tigmint (separate or in the Makefile), specifying different threads for each? Ie. launching independent, consecutive jobs to your cluster, requesting different amounts of resources?
Yes, exactly.
> If you do want to vary the threads per job, I think the easiest thing would be to see what commands will be run by Tigmint (specifying -n to your job), then you can do whatever you want with those commands. Ie. For tigmint, you could specify 12 threads for bwa mem, but then tell your scheduler that you only require 1 thread for tigmint-molecule and tigmint-cut.
Well, one could then rewrite every Makefile. I am after make targets like these (roughly speaking):
tigmint index t=1
tigmint align t=288
tigmint long-link t=6
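Until targets like these exist, a similar effect could be approximated by chaining scheduler jobs. The Python sketch below prints shell lines that submit each step with its own CPU request and start it only after the previous step succeeds. It assumes a Slurm cluster (the scheduler is not named in this thread), and the three commands are the wished-for, hypothetical targets from above, not existing Tigmint targets; `STEPS` and `slurm_chain` are illustrative names.

```python
import shlex

# Hypothetical steps copied from the wished-for targets above:
# (job name, CPUs to request, command to run).
STEPS = [
    ("index", 1, "tigmint index t=1"),
    ("align", 288, "tigmint align t=288"),
    ("long-link", 6, "tigmint long-link t=6"),
]


def slurm_chain(steps):
    """Return shell lines that submit each step after the previous one succeeds."""
    lines = []
    for i, (name, cpus, command) in enumerate(steps, start=1):
        args = [
            "sbatch",
            "--parsable",  # print only the job ID so it can be captured
            f"--job-name={name}",
            f"--cpus-per-task={cpus}",
        ]
        # Every job except the first waits for its predecessor to finish OK.
        dep = f' --dependency=afterok:"$jid{i - 1}"' if i > 1 else ""
        wrapped = shlex.quote(command)
        lines.append(f"jid{i}=$({shlex.join(args)}{dep} --wrap={wrapped})")
    return lines


if __name__ == "__main__":
    print("\n".join(slurm_chain(STEPS)))
```

Each job then holds only the CPUs its step can use, so the 288-CPU allocation is occupied only during alignment.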
> I have added a note about the thread usage for long-to-linked-pe in the README (01a87de)
Excellent.