ruanjue/wtdbg2

Parameter choice for big, repetitive genomes

sighe opened this issue · 1 comment

sighe commented

Thank you for developing a fast, high-quality assembler and for following up on the issues here. We are currently working on animal genomes larger than 4 Gb with a high abundance (>60%) of repetitive elements.

Here is our experience with our latest species, which has a 4.7 Gb genome sequenced with PacBio CLR reads. Following issue #218, we tried the parameter sets '-l 6000 -m 200' and '-l 6000 -m 600' in addition to the defaults. The results did not differ much, but both runs with '-l 6000' produced a total assembly size about 0.2 Gb larger than with the defaults, probably as expected.

Do you have any recommendations for parameter settings for such large, repetitive genomes? Should we use '-R -s' and '--aln-dovetail -1', as you suggested in #218? In particular, is there a recommended value for '-s'?

Add -R, and try --aln-dovetail -1 or --aln-dovetail 1024, along with -l 6000. You can quickly reload the existing alignments with --load-alignments instead of redoing the alignment step, then increase -s and -l to build the assembly graph.
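
For readers who want to try this, the two-pass workflow suggested above might look roughly like the sketch below. Several things here are assumptions rather than anything stated in the thread: the input file name, output prefix, and thread count are placeholders; `-x sq` assumes the CLR reads came from a Sequel instrument; the first pass is assumed to leave its alignments in `<prefix>.alignments.gz`; and the second-pass values `-l 8000 -s 0.1` are arbitrary illustrations, not recommended settings. The script only prints the commands unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch only: paths, thread count, and the second-pass -l/-s values are
# placeholders chosen for illustration, not recommendations from the thread.
set -euo pipefail

READS="reads.fa.gz"   # hypothetical input file
PREFIX="asm"          # hypothetical output prefix
THREADS=32            # hypothetical thread count

# Pass 1: full alignment with the options suggested in the reply.
# -x sq is an assumption (Sequel CLR data); -g is the 4.7 Gb genome size.
pass1=(wtdbg2 -x sq -g 4.7g -t "$THREADS" -i "$READS" -fo "${PREFIX}_p1"
       -R --aln-dovetail -1 -l 6000)

# Pass 2: reload the pass-1 alignments and raise -l and -s.
# The alignments file name is assumed to be <prefix>.alignments.gz.
# -R and --aln-dovetail are left out on the assumption that they only
# matter during the alignment step; the thread does not say either way.
pass2=(wtdbg2 -x sq -g 4.7g -t "$THREADS" -i "$READS" -fo "${PREFIX}_p2"
       --load-alignments "${PREFIX}_p1.alignments.gz" -l 8000 -s 0.1)

# Dry run by default: print both commands for inspection.
printf '%s ' "${pass1[@]}"; echo
printf '%s ' "${pass2[@]}"; echo

# Execute only when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  "${pass1[@]}"
  "${pass2[@]}"
fi
```

Keeping the passes under separate prefixes means several `-s`/`-l` combinations can be compared against the same pass-1 alignments without overwriting earlier output.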