[Question] Split => correct => re-join
tuannguyen8390 opened this issue · 2 comments
tuannguyen8390 commented
Hi there,
I was wondering whether it is feasible to split the long-read file into smaller chunks, correct each chunk in parallel against the same index, and then merge the results back together to speed up fmlrc2?
Many thanks,
Tuan
holtjma commented
Yes, we've actually done this before when we had a very, very large number of long reads (i.e. even multi-processing on a single machine was too slow). You can do this because each long read is corrected independently from all other long reads. You can follow this general process:
- Create BWT from short reads (no way to further parallelize this step)
- Split long read FASTX file into multiple smaller FASTX files
- Run correction on each smaller FASTX file
- Merge the small FASTX files back into a single result
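The split, correct, and merge steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the long reads are in uncompressed FASTA (FASTQ would need 4-line record grouping instead), records are distributed round-robin so the merged output will not preserve the original read order, and the function names `split_fasta`, `correct_chunks`, and `merge_files` are hypothetical helpers, not part of the tool. The actual correction command is deliberately left to the caller via `make_cmd`, so no command-line flags are assumed here.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from pathlib import Path


def split_fasta(src, out_dir, n_chunks):
    """Distribute FASTA records round-robin across n_chunks files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i:04d}.fa" for i in range(n_chunks)]
    handles = [p.open("w") for p in paths]
    try:
        targets = cycle(handles)
        current = None
        with open(src) as fh:
            for line in fh:
                if line.startswith(">"):
                    # A header line starts a new record: move to the next chunk.
                    current = next(targets)
                if current is None:
                    continue  # skip anything before the first header
                if not line.endswith("\n"):
                    line += "\n"  # keep records intact when chunks are concatenated
                current.write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths


def correct_chunks(chunks, make_cmd, workers=4):
    """Run a caller-supplied correction command on each chunk in parallel.

    make_cmd(chunk_path, output_path) must return the argument list to run;
    it is an assumption of this sketch, not a real API.
    """
    def run(chunk):
        out = chunk.with_name(chunk.stem + ".corrected.fa")
        subprocess.run(make_cmd(chunk, out), check=True)
        return out

    # Threads are enough here because the work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, chunks))


def merge_files(parts, dest):
    """Concatenate the corrected chunk files into a single result file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as fh:
                shutil.copyfileobj(fh, out)
```

On a cluster, `correct_chunks` would more likely be replaced by one scheduler job per chunk, all pointing at the same BWT; the split and merge steps stay the same.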
tuannguyen8390 commented
Thanks! It's good to have some confirmation that this would work in a real scenario.