yechengxi/DBG2OLC

multithreading

janvanoeveren opened this issue · 9 comments

Hi, I was just wondering if there's a multithreading option for DBG2OLC, because it seems to be quite slow and uses only 5% CPU on a multicore server.
What am I missing?

DBG2OLC was implemented as a single-threaded program. However, if you have a lot of computational resources, the way to make it faster is to split your 3GS reads into different batches and run DBG2OLC on each of them.
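As a rough illustration of the splitting step, here is a minimal Python sketch that distributes FASTQ records round-robin into per-batch folders. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names and the `batch_<i>/reads.fastq` layout are hypothetical choices for this example, not something DBG2OLC requires.

```python
import os


def assign_batches(lines, n_batches):
    """Yield (batch_index, line) pairs, keeping each 4-line record together."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    for line_no, line in enumerate(lines):
        # Lines 0-3 form record 0, lines 4-7 form record 1, and so on.
        yield (line_no // 4) % n_batches, line


def split_fastq(path, n_batches, out_dir):
    """Stream `path` into out_dir/batch_<i>/reads.fastq; return the new paths.

    Assumes 4-line FASTQ records. The folder layout is a hypothetical choice.
    """
    out_paths = []
    for i in range(n_batches):
        batch_dir = os.path.join(out_dir, f"batch_{i}")
        os.makedirs(batch_dir, exist_ok=True)
        out_paths.append(os.path.join(batch_dir, "reads.fastq"))
    outs = [open(p, "w") for p in out_paths]
    try:
        with open(path) as handle:
            for batch_index, line in assign_batches(handle, n_batches):
                outs[batch_index].write(line)
    finally:
        for out in outs:
            out.close()
    return out_paths
```

For example, `split_fastq("sra_data.fastq", 8, "batches")` would write eight read files, one per folder, so that a separate DBG2OLC process can be started on each.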

For each batch, DBG2OLC will compute the compressed reads and then try to assemble. You may stop the program at any time once the compression is complete.

In a final call of the program, include all the files and use LD 1 to load all the precomputed compressed reads to assemble.

As long as you use the same parameters and the same contigs, the k-mer analysis will produce the same result.
The most time consuming step is to calculate the compressed reads.
When you use 'LD 1' and include the full set of reads, DBG2OLC will load all the precomputed compressed reads and recompute the overlaps. The previous overlaps computed with each subset are discarded.

When we assembled the human genome, we split the PacBio reads into a few batches and put them in separate folders.
The same set of parameters and the same Illumina contigs were used to generate the k-mer index and compressed reads for every batch.
Then we moved all the compressed reads into one folder and called DBG2OLC again with LD 1, feeding all the PacBio reads in the command.
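The two kinds of invocation in this procedure can be sketched as command lines built in Python. This is only an illustration: the parameter values are copied from the command quoted later in this thread, the file names are placeholders, and passing each read file with its own `f` keyword is an assumption about how "all the PacBio reads" are fed in. The step of moving the compressed reads into one folder is left out because the thread does not name those files.

```python
# Parameters must be identical for every batch and for the final run.
# Values here are taken from the command quoted in this thread.
PARAMS = ["AdaptiveTh", "0.001", "KmerCovTh", "2", "MinOverlap", "10"]


def batch_command(binary, contigs, reads, params=PARAMS):
    """argv for one per-batch run that computes compressed reads."""
    return [binary, *params, "Contigs", contigs, "f", reads]


def final_command(binary, contigs, all_reads, params=PARAMS):
    """argv for the final run that loads precomputed compressed reads (LD 1).

    Assumption: every read file is passed with its own `f` keyword.
    """
    argv = [binary, *params, "LD", "1", "Contigs", contigs]
    for reads in all_reads:
        argv += ["f", reads]
    return argv
```

Each list can be handed to `subprocess.run(argv, cwd=some_folder, check=True)`, with the per-batch runs started in parallel in their own folders and the final run started in the folder that holds all the compressed reads.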

I have summarized the procedure on the project page. This is a very good question.

Actually, the k-mer analysis of the contigs takes a long time for my data set (~2 days), so it is really a pity to repeat it for every PacBio subset. Maybe you could change this to optionally take the ContigKmerIndex_HT as input?

That's also possible. There is another, undocumented option for this:
you can use 'LD0 1' to load the precomputed contig k-mer index.

Thanks - I guess I then have to copy the "ContigKmerIndex_HT_content" and "ContigKmerIndex_HT_idx.txt" files to the working directory? And not specify the Contigs parameter?

[2]- Segmentation fault (core dumped) /opt/kgapps/DBG2OLC-20170411/DBG2OLC AdaptiveTh 0.001 KmerCovTh 2 MinOverlap 10 LD0 1 f ../input_PacBio/sra_data.fastq > DBG2OLC_test_LD0.log

... any clue?

You will need to feed the contigs file as usual.
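A small sketch of what the corrected call might look like, again as a Python-built command line. It adds the `Contigs` argument to the failing command above; the contigs file name is a placeholder. The check that the two index files sit in the working directory reflects the guess made in the question, which the reply does not confirm, so treat it as an assumption.

```python
import os

# File names as quoted in the question above.
INDEX_FILES = ["ContigKmerIndex_HT_content", "ContigKmerIndex_HT_idx.txt"]


def missing_index_files(workdir):
    """Return the index files not found in workdir.

    Assumption: DBG2OLC looks for them in the working directory.
    """
    return [name for name in INDEX_FILES
            if not os.path.exists(os.path.join(workdir, name))]


def ld0_command(binary, contigs, reads, params):
    """argv that loads the contig k-mer index (LD0 1) but still passes Contigs."""
    return [binary, *params, "LD0", "1", "Contigs", contigs, "f", reads]
```

Calling `missing_index_files(".")` before launching the run gives a quick way to notice a missing index file before the program is started.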