Parameter choice for big, repetitive genomes
sighe opened this issue · 1 comments
I appreciate your work in developing a fast, good assembler and in following up on the issues here. We are currently working on animal genomes of >4Gb with a high abundance (>60%) of repetitive elements.
Here is our experience with the latest species, which has a 4.7Gb genome sequenced with PacBio CLR reads. Following issue #218, we tried the parameter sets '-l 6000 -m 200' and '-l 6000 -m 600' in addition to the default. The results did not differ much, but both runs with '-l 6000' resulted in a total assembly size larger by 0.2Gb, probably as expected.
Do you have any recommendations on parameter settings for such large, repetitive genomes? For example, '-R -s' and '--aln-dovetail -1', as you suggested in #218? Is there a recommended value for '-s' in particular?
Add '-R', try '--aln-dovetail -1' or '--aln-dovetail 1024', and also '-l 6000'. You can quickly load the alignments with '--load-alignments', then increase '-s' and '-l' to build the assembly graph.
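The two-stage approach in this reply (align once, then reload the alignments and retune '-s' and '-l') could be scripted roughly as below. This is only a sketch: the flag names come from the reply, but the alignment file name, the specific '-s' and '-l' values, and the assumption that '--load-alignments' takes a file path are illustrative and not confirmed in this thread. The helper `second_pass_grid` is hypothetical, and whether other flags must be repeated in the second pass is not stated here.

```python
from itertools import product

# First pass: flags exactly as suggested in the reply.
FIRST_PASS = ["-R", "--aln-dovetail", "-1", "-l", "6000"]


def second_pass_grid(alignments_path, s_values, l_values):
    """Return one flag list per (-s, -l) combination.

    Each combination reuses the saved alignments instead of re-aligning.
    ASSUMPTION: --load-alignments accepts a path argument.
    """
    grid = []
    for s, l in product(s_values, l_values):
        grid.append(
            ["--load-alignments", alignments_path, "-s", str(s), "-l", str(l)]
        )
    return grid


if __name__ == "__main__":
    print(" ".join(FIRST_PASS))
    # ASSUMPTION: the path and the values below are placeholders, not recommendations.
    for flags in second_pass_grid("first_pass.alignments", [0.05, 0.1], [6000, 8000]):
        print(" ".join(flags))
```

Each printed line is a set of flags to append to the assembler command; comparing the total assembly size across the second-pass runs would show how sensitive the graph is to '-s' and '-l' without paying for alignment again.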