[Question] Split => correct => re-join

Question

tuannguyen8390 opened this issue 3 years ago · 2 comments

Hi there,

I was wondering whether it is feasible to split the long-read file into smaller chunks, correct each chunk in parallel against the same index, and then merge the results back together to speed up fmlrc2?

Many thanks,

Tuan

Answer 1 · 2022-03-29T15:18:42.000Z

Yes, we've actually done this before when we had a very large number of long reads (so many that even multi-processing on a single machine was too slow). This works because each long read is corrected independently of all other long reads. You can follow this general process:

1. Create the BWT from the short reads (there is no way to further parallelize this step)
2. Split the long-read FASTX file into multiple smaller FASTX files
3. Run correction on each smaller FASTX file
4. Merge the corrected smaller FASTX files back into a single result
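
The split and merge steps above can be sketched in Python roughly as follows. This is only a minimal illustration, assuming plain FASTA input (FASTQ is not handled). The function names, the `chunk_00000.fa` naming scheme, and the `reads_per_chunk` parameter are all made up for this example and are not part of the tool. The correction step itself is left out, since the exact command line is not given in this thread.

```python
import itertools
import shutil
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) records from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, reads_per_chunk):
    """Write consecutive groups of reads to numbered chunk files.

    Returns the chunk paths in order. File naming is hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    chunks = []
    for i in itertools.count():
        batch = list(itertools.islice(records, reads_per_chunk))
        if not batch:
            break
        chunk = out_dir / f"chunk_{i:05d}.fa"
        with open(chunk, "w") as out:
            for header, seq in batch:
                # Sequences are written on a single line (unwrapped).
                out.write(f"{header}\n{seq}\n")
        chunks.append(chunk)
    return chunks


def merge_fasta(chunk_paths, merged_path):
    """Concatenate chunk files, in the given order, into one file."""
    with open(merged_path, "wb") as out:
        for p in chunk_paths:
            with open(p, "rb") as fh:
                shutil.copyfileobj(fh, out)
```

Each chunk written by `split_fasta` would then be corrected as a separate job against the same BWT, and `merge_fasta` would be called on the corrected outputs in chunk order so that the read order matches the original file.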

Answer 2 · 2022-03-31T12:02:28.000Z

Thanks! It's good to have confirmation that this would work in a real scenario.