DerrickWood/kraken2

Why is `k2mask` set to use only half the number of available cores?

hermidalc opened this issue · 5 comments

kraken2/scripts/k2

Lines 485 to 487 in 4cbdc5f

argv = masking_binary + " -outfmt fasta -threads {} -r x".format(
    multiprocessing.cpu_count() // 2
)

For kraken2 I observed that performance is best at 1/2 to 3/4 of available cores. After that performance decreases.

I'm not sure whether the same holds for k2mask. I need to run it and verify.

https://avilpage.com/2024/07/mastering-kraken2-performance-optimisation.html

The downside of hardcoding it like that shows up when you run Kraken2 in a cluster environment. There, you don't want the script looking at all the cores available on a node: you typically request a certain number of cores for the job, and the job could be assigned to a node with many more cores than you requested (cores that other jobs are using).
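To make the cluster concern concrete, here is a minimal sketch of how a script could estimate the cores it is actually allowed to use instead of every core on the node. The helper name `usable_cpu_count` is hypothetical and is not part of k2. It assumes a Slurm job that exports `SLURM_CPUS_PER_TASK`, and otherwise falls back to the Linux CPU affinity mask.

```python
import multiprocessing
import os


def usable_cpu_count(environ=None):
    """Best-effort count of the CPUs this process may actually use.

    Hypothetical helper for illustration only; not part of k2.
    """
    environ = os.environ if environ is None else environ

    # Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task was requested.
    requested = environ.get("SLURM_CPUS_PER_TASK", "")
    if requested.isdigit() and int(requested) > 0:
        return int(requested)

    # On Linux, the affinity mask reflects cpuset/taskset restrictions,
    # which many schedulers use to pin a job to its allocated cores.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))

    # Last resort: every core on the machine (the current behaviour's basis).
    return multiprocessing.cpu_count()


print(usable_cpu_count())
```

Other schedulers expose the allocation differently, so this is only a starting point and not a complete answer.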

I agree that it shouldn't be hardcoded. I am trying to understand why it was hardcoded to multiprocessing.cpu_count() // 2 in the first place.

I agree that this change should be configurable via the command line. I will address this in my next commit to k2.
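For illustration, one way such a command-line setting could look is sketched below. The option name `--masker-threads` and the helper `build_masker_command` are assumptions made for this example and do not describe the actual change to k2. The masker arguments are copied from the snippet quoted at the top of the issue, and the default keeps the current half-of-the-cores behaviour.

```python
import argparse
import multiprocessing


def build_masker_command(masking_binary, threads=None):
    """Build the masker command line (hypothetical helper, not k2's code)."""
    if threads is None:
        # Keep the existing default, but never drop to 0 on a 1-core machine.
        threads = max(1, multiprocessing.cpu_count() // 2)
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return "{} -outfmt fasta -threads {} -r x".format(masking_binary, threads)


parser = argparse.ArgumentParser()
# Hypothetical option name, chosen only for this sketch.
parser.add_argument(
    "--masker-threads",
    type=int,
    default=None,
    help="threads for the masker (default: half of the detected cores)",
)

args = parser.parse_args(["--masker-threads", "8"])
print(build_masker_command("k2mask", args.masker_threads))
# prints: k2mask -outfmt fasta -threads 8 -r x
```

Leaving the default at `None` lets the script tell "not specified" apart from an explicit value, so cluster users can pass exactly the number of cores they requested.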

Raised a PR for that: #866