DessimozLab/read2tree

[Question] Are very short insert sizes an issue?

mptrsen opened this issue · 5 comments

I'm going to process a bunch of sequencing datasets from museum samples, some stored under horrendous conditions and for decades. As a result, the DNA fragments are very short, often shorter than a read length. So we will have many read pairs that overlap partly or fully.
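To make the overlap concrete, here is a small sketch of the arithmetic. The function names `pair_overlap` and `overlap_class` are hypothetical, and it assumes both mates have the same read length and that adapter sequence has already been trimmed, so a mate never reads past the end of its fragment.

```python
def pair_overlap(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by both mates of a read pair."""
    # After adapter trimming, a mate covers at most the whole fragment.
    covered = min(read_len, fragment_len)
    # The mates start at opposite ends of the fragment, so the shared
    # region is whatever their combined length exceeds the fragment by.
    return max(0, 2 * covered - fragment_len)


def overlap_class(fragment_len: int, read_len: int) -> str:
    """Classify a pair as 'none', 'partial', or 'full' overlap."""
    if fragment_len <= read_len:
        return "full"      # each mate spans the entire fragment
    if pair_overlap(fragment_len, read_len) > 0:
        return "partial"   # mates meet somewhere in the middle
    return "none"          # an unsequenced gap separates the mates


for frag in (500, 200, 100):
    print(frag, pair_overlap(frag, 150), overlap_class(frag, 150))
```

With 150 bp reads, any fragment shorter than 300 bp gives overlapping mates, and any fragment of 150 bp or less gives fully overlapping mates.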

FL_plots

Is this an issue for the local assembly/will read2tree work with such samples?

alpae commented

Dear @mptrsen

we don't have experience with such degraded samples, but I suggest just giving it a try. In our experience, the coverage needed is very low, so I would hope that read2tree can deal with this.

Yes, coverage won't be an issue, just the fragmentation. I understand from the read2tree paper that the "local assembly" step consists of taking the consensus sequence from the mapped reads. So the insert size should be irrelevant to the method, because there is no "traditional" assembler in the pipeline that would use the insert size metric. Is that correct?
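To illustrate what I mean by a consensus that ignores insert size, here is a minimal sketch. It is not read2tree's actual code: the function `consensus`, its `min_depth` parameter, and the ungapped `(start, sequence)` read representation are assumptions made for the example. Each mapped read contributes its bases independently, so pairing information never enters the calculation.

```python
from collections import Counter


def consensus(ref_len, aligned_reads, min_depth=1):
    """Majority-vote consensus over ungapped read alignments.

    aligned_reads is an iterable of (start, sequence) tuples, where start
    is the 0-based reference position of the first base of the read.
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    # One base counter per reference column.
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1

    called = []
    for col in columns:
        depth = sum(col.values())
        if depth < min_depth:
            called.append("N")  # too little coverage to call a base
        else:
            called.append(col.most_common(1)[0][0])
    return "".join(called)


# Three short, overlapping reads; mates would be handled like any other read.
reads = [(0, "ACGT"), (2, "GTTA"), (3, "TTAC")]
print(consensus(10, reads))
```

In this toy version, overlapping mates only add depth at the shared positions. The insert size would matter only if the mapper rejects or penalizes such pairs before the consensus step.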

Thanks. We also have a step that maps the sequencing reads onto the orthologous groups using NextGenMap (NGM), a mapper for short reads. You can check its wiki. If needed, we could change parts of our code so that you can pass additional NGM arguments.

@sinamajidian Thanks for the info. Since read2tree uses neither the -I/--min-insert-size nor the -X/--max-insert-size option for NGM, I assume that it runs with the defaults, i.e. insert size between 0 and 1000 bp.

Out of interest: Why use nextgenmap instead of a different mapper such as BWA, Bowtie, or bbmap? (There was little detail about the methods in the paper.)

You're welcome.

The project started a few years ago, and NGM was among the best mappers at the time. Besides, Fritz in the read2tree team was among NGM's developers.

Best regards,
Sina