Multi-threading approach

Question

Teklu67 opened this issue 3 years ago · 8 comments

Hi,
This is a very useful program, but it is taking a long time to subsample a large fastq file. I am running it on a server and would like to run it with multi-threading, but I am a novice at programming and not sure how to do that. Any help, please?
Thanks,

Answer 1 · 2022-03-29T01:01:27.000Z

Hi @Teklu67. When you say "a long time", how long are we talking? And how large is your file?

Answer 2 · 2022-03-29T02:43:53.000Z

Thanks so much for the quick response. It finished subsampling to 30x from a 690 Gb fastq file (60x coverage) in 2 days. Because I have the resources to run several threads, I thought it would finish much faster if there were an option for multi-threading. Thanks!

Answer 3 · 2022-03-29T23:47:43.000Z

Wow, that's a very big fastq file! Is it compressed (e.g., gzip)?

How did you install rasusa?

Answer 4 · 2022-04-01T05:25:02.000Z

Yes, it is for tetraploid wheat and is compressed in .gz format. I installed it through conda.

Answer 5 · 2022-04-02T00:05:25.000Z

Is your data Illumina?

Sorry, there's not really much I can offer in the way of speeding rasusa up.

At some point I will look into whether multi-threading the IO is possible (i.e. batching reads).

I'll leave this open and add it to my list of things to investigate in the coming months. Sorry I can't do it faster, but I have a lot of other research projects I am trying to juggle.

However, if you (or anyone else) would like to have a go at it, I would be very happy to receive a pull request.

Answer 6 · 2022-04-05T15:48:28.000Z

It is ONT data. That is OK, thank you for your time.

Answer 7 · 2022-04-06T00:01:43.000Z

In the meantime, I would suggest trying to split the file up into subsets and then randomly subsampling each subset.
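
A minimal sketch of the splitting step, assuming a standard four-lines-per-record fastq (optionally gzipped). The function name `split_fastq_round_robin` and the `chunk_*.fastq` file names are made up for illustration and are not part of rasusa. Dealing reads out round-robin keeps each subset roughly representative of the whole file, so the subsets can be subsampled separately (one process per subset) and the outputs concatenated afterwards.

```python
import gzip
import itertools
from pathlib import Path


def split_fastq_round_robin(src, out_dir, n_chunks):
    """Deal 4-line FASTQ records from `src` into `n_chunks` files, round-robin.

    Hypothetical helper: assumes every record is exactly four lines.
    Returns the list of chunk paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i}.fastq" for i in range(n_chunks)]

    # Read gzip transparently when the input name ends in .gz
    opener = gzip.open if str(src).endswith(".gz") else open
    handles = [open(p, "wt") for p in paths]
    try:
        with opener(src, "rt") as fh:
            for i in itertools.count():
                record = list(itertools.islice(fh, 4))
                if not record:
                    break  # end of input
                if len(record) != 4:
                    raise ValueError("truncated FASTQ record at end of file")
                # Record i goes to chunk i mod n_chunks
                handles[i % n_chunks].writelines(record)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

With N subsets, each one holds roughly 1/N of the original bases, so the per-subset coverage target is roughly the overall target divided by N (for example, 30x over 4 subsets is about 7.5x each). The combined output is only approximately equivalent to a single subsample of the whole file. Note also that the split is itself one single-threaded pass over the compressed input, so how much time this saves overall is not guaranteed.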

Answer 8 · 2024-11-26T11:49:31.000Z

Another suggestion: I suspect most of the runtime is spent (de)compressing the data. Switching from gzip to zstd should drastically reduce the time spent on decompression.
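
One rough way to check that suspicion before recompressing anything is to time how long it takes just to read the gzipped file to the end. The sketch below uses only Python's standard library. `time_gzip_read` is a hypothetical helper name, and Python's gzip reader is not the decompressor rasusa itself uses, so the number is only a ballpark for how much of the runtime decompression could account for.

```python
import gzip
import time


def time_gzip_read(path, block_size=1 << 20):
    """Return (uncompressed_bytes, seconds) for reading `path` to the end.

    Hypothetical helper: decompresses in 1 MiB blocks and discards the data,
    so the elapsed time reflects decompression plus disk reads only.
    """
    total = 0
    start = time.perf_counter()
    with gzip.open(path, "rb") as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            total += len(block)
    return total, time.perf_counter() - start
```

Running this on a smaller sample of the data (rather than the full 690 Gb file) and comparing the throughput against the overall rasusa runtime gives an indication of whether a different compression format is likely to help.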