BenBeresfordJones/MGBC

Genome statistics in Supplementary Table 4 are incorrect


Hi,

Some values in Supplementary Table 4 are incorrect.

Below are some examples:

MMGC127106, reported values are:
Length = 4508422
N50 = 41748
N_contigs = 199

The correct values are:
zcat mmgc_5/MMGC127106.fna.gz | assembly-stats /dev/stdin
stats for /dev/stdin
sum = 3307785, n = 63, ave = 52504.52, largest = 216609
N50 = 79410, n = 13
N60 = 59362, n = 17
N70 = 47262, n = 24
N80 = 42134, n = 31
N90 = 30264, n = 40
N100 = 2816, n = 63
N_count = 0
Gaps = 0

MMGC120619, reported values are:
Length = 2537282
N50 = 65849
N_contigs = 59

The correct values are:
zcat mmgc_3/MMGC120619.fna.gz | assembly-stats /dev/stdin
stats for /dev/stdin
sum = 3841386, n = 56, ave = 68596.18, largest = 313672
N50 = 107840, n = 11
N60 = 87019, n = 15
N70 = 69000, n = 20
N80 = 58489, n = 26
N90 = 42974, n = 33
N100 = 3528, n = 56
N_count = 910
Gaps = 10

Could you fix this?

Thanks,
Florian

Hi Florian,

I am currently working on finalising the next version of the MMGC following reviewer comments on our preprint manuscript. I will ensure that the genome statistics are correct in this upcoming version.

Kind regards,
Ben

Hi Florian,

I have looked into this as part of the latest release. All genome statistics are generated using CheckM, and I have corroborated them with statistics generated using the independent assembly_stats package. Although the metrics vary somewhat between tools (I assume due to differences in the calculations used?), I did not find any of the large differences in N50 and other metrics that you had previously reported.

In addition, looking at your comment again now, you might have been reporting the scaffold assembly metrics for the genomes, rather than the contig-level statistics.
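
To make the scaffold versus contig distinction concrete, here is a minimal Python sketch. It is not the code used by CheckM or assembly-stats; the function names (`n50`, `split_at_gaps`, `summarise`), the `min_gap` threshold, and the toy sequences are all assumptions made for illustration. It computes N50 as the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total length, and shows how splitting scaffolds at runs of N changes the count, total length, and N50.

```python
import re


def n50(lengths):
    """Length at which the cumulative sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def split_at_gaps(seq, min_gap=1):
    """Split a scaffold into contigs at runs of at least `min_gap` N characters.

    The gap threshold is a hypothetical parameter; real tools may use a
    different minimum gap length.
    """
    pattern = "[Nn]{%d,}" % min_gap
    return [piece for piece in re.split(pattern, seq) if piece]


def summarise(seqs):
    """Return total length, sequence count, and N50 for a list of sequences."""
    lengths = [len(s) for s in seqs]
    return {"sum": sum(lengths), "n": len(lengths), "N50": n50(lengths)}


# Toy data: one scaffold containing a 10 bp gap, plus one gap-free sequence.
scaffolds = ["A" * 50 + "N" * 10 + "C" * 30, "G" * 40]
contigs = [c for s in scaffolds for c in split_at_gaps(s)]

print("scaffold-level:", summarise(scaffolds))
print("contig-level:  ", summarise(contigs))
```

On this toy input the scaffold-level N50 is 90 and the contig-level N50 is 40, so the two levels can differ noticeably whenever a genome contains gaps. For a genome with no gaps, the two sets of statistics should be identical under this definition.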

Best wishes,
Ben