bioinfologics/w2rap-contigger

Speeding up loading reads into memory

sanjitsbatra opened this issue · 2 comments

Hey! I have a dataset with about 1T of reads in FASTQ format. Loading it into memory in the first step takes about a week. Is there any way to speed this up?

We are working on a faster version of step 1, but 1T of reads is probably going to kill other parts of the software anyway. Can you tell us the genome size and coverage? Either this is a super challenging project that I would love to hear about, or you can probably just downsample a lot...

Is there any way to parallelize this process? It seems one could seek to different blocks of the file in parallel and load them into memory, right?
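
To make the block-seeking idea concrete, here is a rough sketch in Python. It is not w2rap-contigger code, and the function names (`find_record_start`, `load_range`, `load_fastq_parallel`) are made up for illustration. It assumes an uncompressed FASTQ file with strict 4-line records. Each worker seeks to its byte offset, skips forward to the next record boundary, and parses records until it crosses the end of its block. Because a quality line may itself start with `@`, a line is only accepted as a header if the line two below it starts with `+`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_record_start(f, offset):
    """Return the offset of the first FASTQ record starting at or after `offset`."""
    if offset == 0:
        return 0
    # Step back one byte and finish that line, so we land on a line start >= offset.
    f.seek(offset - 1)
    f.readline()
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            return pos  # reached end of file
        if line.startswith(b"@"):
            after_candidate = f.tell()
            f.readline()  # would be the sequence line
            if f.readline().startswith(b"+"):
                return pos  # header, sequence, '+' separator: a real record
            # It was a quality line starting with '@'; retry from the next line.
            f.seek(after_candidate)


def load_range(path, start, end):
    """Parse every record whose first byte lies in [start, end)."""
    reads = []
    with open(path, "rb") as f:
        pos = find_record_start(f, start)
        f.seek(pos)
        while pos < end:
            header = f.readline()
            if not header:
                break
            seq = f.readline().rstrip(b"\r\n")
            f.readline()  # '+' separator line
            qual = f.readline().rstrip(b"\r\n")
            reads.append((header[1:].rstrip(b"\r\n"), seq, qual))
            pos = f.tell()
    return reads


def load_fastq_parallel(path, workers=4):
    """Split the file into `workers` byte ranges and parse them concurrently."""
    size = os.path.getsize(path)
    bounds = [size * i // workers for i in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(load_range, [path] * workers, bounds[:-1], bounds[1:])
        # pool.map yields results in block order, so read order is preserved.
        return [read for part in parts for read in part]
```

A record belongs to the block that contains its first byte, so no read is loaded twice or skipped. Threads are used here only to keep the sketch short; in Python the parsing itself is largely serialized by the interpreter lock, so a real speedup would need processes or a native implementation. The approach also does not apply directly to gzip-compressed input, which cannot be entered at an arbitrary byte offset.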