smaegol/PlasFlow

batch_size advice

dpellow opened this issue · 3 comments

Regarding the batch_size option: I found the opposite of the instructions to be true. The larger the batch size, the more likely it is to work for a large dataset.

Interesting, I would not expect this. Can you comment further (what is the size of your dataset, and how did you test it)?

It is a simulated dataset of 50M reads, with around 400k contigs in the assembly (I don't do length filtering). Using a larger batch size creates fewer batches, which seems to be more memory efficient than having more, smaller batches.

OK, but I think you are now describing something different than before. Of course a larger batch size will be more memory efficient (up to the point where it crashes due to the Biostrings problem). But a smaller batch size should not cause the program to fail, which is what your first comment seemed to imply.
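
For reference, the relationship between batch size and number of batches discussed above is simple ceiling division. The sketch below is only an illustration: `batch_count` is a hypothetical helper, not part of PlasFlow, the batch sizes are made-up examples, and 400,000 is the approximate contig count quoted in this thread. It says nothing about actual memory use, which would need to be measured.

```python
def batch_count(n_sequences: int, batch_size: int) -> int:
    """Return how many batches are needed to cover n_sequences.

    Hypothetical helper for illustration; the last batch may be partial.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: round up so leftover sequences get a batch.
    return (n_sequences + batch_size - 1) // batch_size


n_contigs = 400_000  # approximate assembly size mentioned in this thread

# Example batch sizes (assumed values, chosen only to show the trend).
for size in (10_000, 25_000, 100_000):
    print(f"batch_size={size}: {batch_count(n_contigs, size)} batches")
```

Larger batch sizes give fewer batches, which matches the observation above; whether that translates into lower peak memory depends on how each batch is processed.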