mbhall88/rasusa

Multi-threading approach implementation

Hi @mbhall88,

Thanks for working on {rasusa}; it's been very helpful. I was using it to generate multiple subsamples from one fq file and wondered whether there is a way to implement a multi-threading approach for working locally or on a cluster. Maybe using {parallel}?

Cheers,
Camilo.

Hi Camilo,

Glad you find rasusa useful.

So, to clarify, are you saying you would like to be able to specify something like a list of coverages and a single fastq and have rasusa produce as many fastq files as there are coverages in the list?

If that is correct, then you could quite easily run these in parallel in bash by sending each run to its own forked process. See here for a nice explanation. Stealing an example from there, you could do something like the following, using 4 processes (if you're on a big cluster you could use more):

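processes=4  # maximum number of rasusa runs at any one time
declare -a coverages=(10 30 50 60 100 150)
(
for covg in "${coverages[@]}"
do
   # after every batch of $processes jobs, wait for that batch to finish
   ((i=i%processes)); ((i++==0)) && wait
   rasusa -g 4mb -c "$covg" -i in.fq -o "out.${covg}x.fq" 2> "${covg}x.log" &
done
wait  # wait for the final batch before leaving the subshell
)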
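
The loop above works in batches: it starts up to 4 runs and then waits for the whole batch before starting the next one. If the runs take very different amounts of time, a possible variant is to start a new run as soon as any single one finishes. The following is only a sketch, assuming bash 4.3 or newer (for `wait -n`); `run_pool` and `subsample` are hypothetical helper names, not part of rasusa, and the rasusa flags are simply copied from the example above.

```shell
#!/usr/bin/env bash
# Sketch only: run_pool is a hypothetical helper, not part of rasusa.
# Usage: run_pool MAX_JOBS COMMAND ARG...
# Runs COMMAND once per ARG, with at most MAX_JOBS running at a time.
run_pool() {
    local max_jobs=$1 cmd=$2 running=0 arg
    shift 2
    for arg in "$@"; do
        if [ "$running" -ge "$max_jobs" ]; then
            # Pool is full: block until any one background job exits.
            wait -n
            running=$((running - 1))
        fi
        "$cmd" "$arg" &
        running=$((running + 1))
    done
    # Let the remaining jobs finish before returning.
    wait
}

# Hypothetical wrapper around the rasusa call from the example above.
subsample() {
    rasusa -g 4mb -c "$1" -i in.fq -o "out.${1}x.fq" 2> "${1}x.log"
}

# Example call (not executed here):
#   run_pool 4 subsample 10 30 50 60 100 150
```

The counter is deliberately conservative: it is only decremented after `wait -n` returns, so the number of concurrent runs should never exceed `MAX_JOBS`.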

Apologies if this isn't what you're asking and I have just gone off on a tangent.

Hi Michael,

That was a pretty elegant solution and I think it solves my problem, thank you. However, to clarify my point above, I was wondering whether it would be possible, and worthwhile, to add an option to rasusa itself, like -t for threads.

Anyway, your solution is enough.
Camilo.

Hi @camilogarciabotero,

Given there are simple, elegant native solutions available in the shell to do this kind of thing, I don't feel there is a need to add a multi-threaded option. However, if there is enough support from users that this is an option that would make everyone's life easier, then I am happy to reconsider.
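
As one illustration of such a shell-native approach, `xargs -P` can cap the number of simultaneous runs without any loop bookkeeping. This is a minimal sketch, not a recommendation from the rasusa documentation: `subsample_all` and the `SUBSAMPLE_CMD` variable are hypothetical names, the rasusa flags are those used earlier in this thread, and `-P` is an extension available in GNU and BSD xargs rather than part of POSIX.

```shell
#!/usr/bin/env bash
# Sketch only: subsample_all is a hypothetical helper, not part of rasusa.
# Usage: subsample_all MAX_JOBS COVERAGE...
# SUBSAMPLE_CMD (hypothetical) lets you swap in another executable; default is rasusa.
subsample_all() {
    local processes=$1
    shift
    # One coverage per line; xargs starts one sh per line, at most $processes at once.
    # Inside the sh -c script, $0 is the command to run and $1 is the coverage.
    printf '%s\n' "$@" |
        xargs -P "$processes" -I{} sh -c \
            '"$0" -g 4mb -c "$1" -i in.fq -o "out.${1}x.fq" 2> "${1}x.log"' \
            "${SUBSAMPLE_CMD:-rasusa}" {}
}

# Example call (not executed here):
#   subsample_all 4 10 30 50 60 100 150
```

Unlike the batch loop, xargs starts the next run as soon as a slot frees up, and it exits with a non-zero status if any of the runs failed.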

Closing this for now. Please reopen if that solution doesn't work, or if you think this warrants further discussion.