marbl/canu

Assembly of small DNA fragments (amplicons)

lborcard opened this issue · 5 comments

Dear Canu team,

We are working with small DNA fragments (between 400 bp and 5 kb) coming from amplicons, and we do not always have a reference to align them to and call a consensus. In certain cases we need to assemble them de novo. We have tried to assemble these small fragments with Canu, but it seems that we cannot find the right parameters to obtain a correct contig. Do you know if your software can assemble small DNA fragments? If so, could you recommend parameters that we should tweak to make it work?

Best regards,

Loïc

Hi @lborcard
I ended up using these parameters for my low-coverage reads, as published here: https://www.frontiersin.org/articles/10.3389/fmicb.2023.1248323/full

"Canu 2.2 program parameters for low coverage reads were genomeSize = 2m maxinputCoverage = 10000 corOutCoverage = 10000 corMhapSensitivity=high corMinCoverage=0 redMemory=32 oeaMemory=32 batMemory=64 minInputCoverage=0 stopOnLowCoverage=0, and genomeSize=2m maxInputCoverage=100 --nanopore fecal samplesDNA.fastq for high coverage data, respectively."
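For reference, the low-coverage settings quoted above could be put together into a single command line roughly as follows. This is only a sketch: the `-p`/`-d` values and the read file name are placeholders, `maxinputCoverage` from the quote is written as `maxInputCoverage` (the spelling used in the second half of the quote), and the read-type flag is written `-nanopore` as in the Canu documentation. The script only prints the command; it does not run Canu.

```python
import shlex

# Settings copied from the quoted low-coverage run (kept as strings, as on the CLI).
LOW_COVERAGE_PARAMS = {
    "genomeSize": "2m",
    "maxInputCoverage": "10000",
    "corOutCoverage": "10000",
    "corMhapSensitivity": "high",
    "corMinCoverage": "0",
    "redMemory": "32",
    "oeaMemory": "32",
    "batMemory": "64",
    "minInputCoverage": "0",
    "stopOnLowCoverage": "0",
}


def build_canu_command(prefix, out_dir, reads, params):
    """Return the Canu invocation as an argument list (hypothetical helper)."""
    cmd = ["canu", "-p", prefix, "-d", out_dir]
    # Canu takes its settings as key=value tokens on the command line.
    cmd += [f"{key}={value}" for key, value in params.items()]
    cmd += ["-nanopore", reads]
    return cmd


# Placeholder names: "amplicon", "amplicon-asm" and "reads.fastq" are not from the thread.
command = build_canu_command("amplicon", "amplicon-asm", "reads.fastq", LOW_COVERAGE_PARAMS)
print(shlex.join(command))
```

For amplicon data the values would likely need adjusting (for example `genomeSize`), so treat the dictionary as a starting point rather than a recommendation.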

skoren commented

Generally, yes, amplicons can be assembled. I'm not sure what you mean by "the parameters to obtain a correct contig"? Did you get an assembly but it didn't match your expectation or did you not get an assembled single sequence? The FAQ has some suggestions on tweaking parameters, https://canu.readthedocs.io/en/latest/faq.html#can-i-assemble-amplicon-sequence-data. You can also try the above.

Thank you for your answer, @lucyintheskyzzz. You said you used this for low-coverage reads, but I have the opposite situation since I am working with amplicons.
@skoren Sorry for being unclear. I was benchmarking my assembly using a reference-based approach, and the assembly was never really of the expected size. I also obtained several contigs. I was wondering if I could also use a genomeSize below 1000 bp?
Thank you for the link. I read the parameter documentation extensively, as well as your publication, but I did not think of looking at the FAQ.

Thank you so much.

skoren commented

The genome size is primarily used to downsample the data and to control how much sequence is corrected, so even if your target is <1 kb, leaving genomeSize at 1 kb shouldn't prevent it from being assembled.
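To make the arithmetic concrete: coverage is estimated as total read bases divided by genomeSize, and reads are downsampled to roughly maxInputCoverage × genomeSize bases. A small sketch of that ratio follows. The read counts, the 800 bp target, and the helper names are made up for illustration, and the cap of 200 is assumed to be the default maxInputCoverage. This is not Canu's own code.

```python
def estimated_coverage(total_bases, genome_size):
    """Coverage as seen by the assembler: total read bases / genomeSize."""
    return total_bases / genome_size


def bases_kept(total_bases, genome_size, max_input_coverage):
    """Bases remaining after downsampling to max_input_coverage x genomeSize."""
    return min(total_bases, genome_size * max_input_coverage)


# Hypothetical amplicon run: 5,000 reads of about 800 bp each.
total = 5000 * 800

# Compare the true target size (800 bp) with genomeSize left at 1 kb.
for genome_size in (800, 1000):
    cov = estimated_coverage(total, genome_size)
    kept = bases_kept(total, genome_size, max_input_coverage=200)
    print(f"genomeSize={genome_size}: coverage ~{cov:.0f}x, bases kept {kept}")
```

With deep amplicon data, either setting leaves far more reads than the cap allows, which is consistent with the point that a genomeSize slightly above the true target should not stop the assembly.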

skoren commented

Idle