HudsonAlpha/fmlrc2

[Question] Split => correct => re-join

tuannguyen8390 opened this issue · 2 comments

Hi there,

I was wondering whether it is feasible to split the long-read file into smaller chunks, correct each chunk in parallel against the same index, and then merge the results back together to speed up fmlrc2?

Many thanks,

Tuan

Yes, we've actually done this before when we had a very, very large number of long reads (i.e. even multi-processing on a single machine was too slow). You can do this because each long read is corrected independently from all other long reads. You can follow this general process:

  1. Create BWT from short reads (no way to further parallelize this step)
  2. Split long read FASTX file into multiple smaller FASTX files
  3. Run correction on each smaller FASTX file
  4. Merge the small FASTX files back into a single result
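
For steps 2 and 4, here is a minimal Python sketch, assuming plain-text (uncompressed) FASTA input. The names `split_fasta`, `merge_files`, and the `chunk_0000.fa` naming scheme are made up for illustration and are not part of fmlrc2. FASTQ input would need a four-line record reader instead.

```python
import shutil
from pathlib import Path


def split_fasta(src, out_dir, reads_per_chunk):
    """Write consecutive groups of FASTA records to numbered chunk files.

    Hypothetical helper: returns the list of chunk paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    n_in_chunk = 0
    with open(src) as fh:
        for line in fh:
            if line.startswith(">"):
                # A header line starts a new record; roll over to a new
                # chunk file once the current one is full.
                if out is None or n_in_chunk == reads_per_chunk:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunks):04d}.fa"
                    chunks.append(path)
                    out = open(path, "w")
                    n_in_chunk = 0
                n_in_chunk += 1
            if out is not None:
                # Sequence lines (possibly wrapped) follow their header.
                out.write(line)
    if out is not None:
        out.close()
    return chunks


def merge_files(parts, dest):
    """Concatenate the (corrected) chunk files, in order, into one file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as fh:
                shutil.copyfileobj(fh, out)
```

Between the two calls, each chunk would be corrected by its own fmlrc2 run against the same BWT from step 1 (see the README for the exact command line), for example as separate jobs on a cluster. Because reads are corrected independently, merging is plain concatenation of the corrected outputs.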

Thanks! It's good to have confirmation that this works in a real scenario.