--polish-target has reduced N50 and largest contig
Hello
Firstly, thanks for making this great tool.
I have used Flye 2.9.3-b1797 to assemble ONT reads for a plant genome (expected genome size 830 Mb). I installed Flye using bioconda. Before running Flye, I removed reads shorter than 5 kb.
flye --nano-hq /u/project/vlsork/ldpeck/longreads/fastq/${INFILE%_*}_ALLpass.fl5kb.fastq.gz \
--genome-size 830m -o flye-hq-${INFILE%_*} -t 7 --scaffold
Assembly assembly
# contigs (>= 0 bp) 7408
# contigs (>= 1000 bp) 7407
# contigs (>= 5000 bp) 7381
# contigs (>= 10000 bp) 7327
# contigs (>= 25000 bp) 6986
# contigs (>= 50000 bp) 6169
Total length (>= 0 bp) 2947843323
Total length (>= 1000 bp) 2947842862
Total length (>= 5000 bp) 2947760873
Total length (>= 10000 bp) 2947336694
Total length (>= 25000 bp) 2941151963
Total length (>= 50000 bp) 2910673812
# contigs 7395
Largest contig 7272865
Total length 2947819947
GC (%) 35.47
N50 821055
N90 189133
auN 1272818.7
L50 922
L90 3789
# N's per 100 kbp 0.47
Then I ran --polish-target with two iterations:
flye --polish-target flye-hq-${INFILE%_*}/assembly.fasta \
--nano-hq /u/project/vlsork/ldpeck/longreads/fastq/${INFILE%_*}_ALLpass.fl5kb.fastq.gz \
--iterations 2 --threads 7
Assembly polished_2
# contigs (>= 0 bp) 7173
# contigs (>= 1000 bp) 7121
# contigs (>= 5000 bp) 6960
# contigs (>= 10000 bp) 6656
# contigs (>= 25000 bp) 5754
# contigs (>= 50000 bp) 4806
Total length (>= 0 bp) 1557125882
Total length (>= 1000 bp) 1557095537
Total length (>= 5000 bp) 1556616863
Total length (>= 10000 bp) 1554379940
Total length (>= 25000 bp) 1538848805
Total length (>= 50000 bp) 1504360171
# contigs 7034
Largest contig 5266179
Total length 1556930564
GC (%) 35.44
N50 487344
N90 108678
auN 780367.3
L50 811
L90 3461
# N's per 100 kbp 0.00
You can see that polishing removed the N's and reduced the total number of contigs, but the N50 and the largest contig have both decreased. I have attached both Flye log files: one from the original assembly step (flye.log) and one from the polishing step (flye_polish.log).
Do you know why this might be?
Thanks
Lily
Hi Lily,
Total length has reduced quite a bit - this is unexpected. I think it may have something to do with scaffolding. If you want to add additional polishing iterations, you can use the -i argument during the assembly; it runs polishing on contigs rather than scaffolds. With new ONT data, 1 round of polishing is usually sufficient.
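For reference, N50 is defined relative to total assembly length: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total. A large change in total length therefore shifts N50 even when most contigs are unchanged, so the two reports are not a like-for-like comparison. A minimal sketch of the calculation, assuming one contig length per line on stdin (the `n50` helper name is hypothetical, and this is not the code QUAST or Flye uses):

```shell
# Hypothetical helper: print the N50 of contig lengths, one per line on stdin.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }      # lengths arrive longest first; sum them
    END {
      for (i = 1; i <= NR; i++) {
        running += len[i]              # cumulative length of the i longest contigs
        if (2 * running >= total) {    # first contig reaching half of the total
          print len[i]
          exit
        }
      }
    }'
}

# Toy data: removing the two longest contigs lowers N50 from 80 to 20.
printf '%s\n' 100 80 20 10 5 | n50
printf '%s\n' 20 10 5 | n50
```

If samtools is available (an assumption, it is not used elsewhere in this thread), contig lengths for a real assembly can be taken from column 2 of the index written by `samtools faidx assembly.fasta` and piped into `n50`.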
Thank you, I think you were right about the scaffolding flag. I was also surprised by the total lengths above: the expected genome size is 830 Mb, so the first assembly was more than three times that size. Running the command below without --scaffold, I now get a total length much closer to the expected value.
Thanks
Lily
flye --nano-hq /u/project/vlsork/ldpeck/longreads/fastq/${INFILE%_*}_ALLpass.fl5kb.fastq.gz \
--genome-size 830m -o flye-hq-${INFILE%_*} -t 7 --iterations 1
Assembly assembly
# contigs (>= 0 bp) 7036
# contigs (>= 1000 bp) 6994
# contigs (>= 5000 bp) 6783
# contigs (>= 10000 bp) 6142
# contigs (>= 25000 bp) 4686
# contigs (>= 50000 bp) 3536
Total length (>= 0 bp) 995236639
Total length (>= 1000 bp) 995209587
Total length (>= 5000 bp) 994556442
Total length (>= 10000 bp) 989618918
Total length (>= 25000 bp) 965235033
Total length (>= 50000 bp) 924011244
# contigs 6900
Largest contig 5407364
Total length 995031275
GC (%) 35.43
N50 368393
N90 68235
auN 660728.6
L50 637
L90 3052
# N's per 100 kbp 0.00
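As a quick sanity check, the three total lengths reported in this thread can be compared against the 830 Mb passed to --genome-size. A small sketch (the `size_ratio` helper is hypothetical; the numbers are copied from the QUAST reports above):

```shell
# Hypothetical helper: ratio of assembly total length to expected genome size.
size_ratio() {
  awk -v total="$1" -v expected="$2" 'BEGIN { printf "%.2f\n", total / expected }'
}

expected=830000000                  # value given as --genome-size 830m

size_ratio 2947819947 "$expected"   # first assembly with --scaffold: prints 3.55
size_ratio 1556930564 "$expected"   # after --polish-target:          prints 1.88
size_ratio 995031275 "$expected"    # re-run without --scaffold:      prints 1.20
```

The re-run without --scaffold comes out at about 1.2 times the expected size, compared with about 3.55 times for the first assembly.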