Overdiverged and potentially misdated sequences from Italy (Sicily/Campania)
corneliusroemer opened this issue
Release date: 2024-01-15
Submitter: Viscardi et al.
Submitting institution: Istituto Zooprofilattico Sperimentale del Mezzogiorno, Animal Health
Country: Italy (Sicily/Campania)
NCBI virus link: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Monkeypox%20virus,%20taxid:10244&Authors_idx%20q.op%3DAND=viscardi&CreateDate_dt=2024-01-09T00:00:00.00Z%20TO%202024-01-20T23:59:59.00Z
Example Genbank: https://www.ncbi.nlm.nih.gov/nuccore/PP098578
Status: Submitter has been contacted (2024-01-19)
List of Genbank accessions
PP098578
PP098579
PP098580
PP098581
PP098582
PP098583
PP098584
PP098585
PP098586
PP098587
PP098588
PP098589
PP098590
PP098591
PP098592
PP098593
PP098594
PP098595
PP098596
PP098597
PP098598
PP098599
PP098600
PP098601
PP098602
PP098603
PP098604
PP098605
PP098606
PP098607
PP098608
PP098609
PP098610
PP098611
PP098612
PP098613
PP098614
PP098615
PP098616
PP098617
PP098618
PP098619
I noticed the following potential QC issues with the 42 sequences submitted by Viscardi et al. when compared against the B.1 outbreak reference:
- 13 sequences are very overdiverged, with 45 to 1450 SNPs
- Almost all sequences have frameshifts at OPG029:152-156 and OPG047:477-483, caused by a single insertion
- All dates are given as 2023-05, which is surprising for the following reasons:
  - It is long past the epidemic peak in Europe
  - It seems unlikely that the lab would sequence so many samples from the same month and none from other months
  - Some sequences have very few, if any, SNPs compared to other sequences from Europe that were collected in summer 2022, suggesting that these sequences are also from summer 2022
It is quite possible that the dates are correct, as turned out to be the case in #2; the cluster shown above might be similar to that issue.
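
For anyone wanting to reproduce this kind of screen, the checks listed above could be sketched roughly as follows. This is only an illustration and not the tooling used for this report: the function name `flag_qc_issues`, the record fields, the `max_snps` and `date_share` thresholds, and the sample records are all hypothetical, and the SNP counts are assumed to have been computed beforehand against the B.1 outbreak reference.

```python
from collections import Counter


def flag_qc_issues(records, max_snps=40, date_share=0.9):
    """Flag overdiverged sequences and a suspiciously uniform date.

    records: list of dicts with keys 'accession', 'snps' (int), 'date' (str).
    max_snps and date_share are hypothetical thresholds, not values
    taken from the issue.
    """
    if not records:
        return [], None

    # Sequences whose SNP count exceeds the (assumed) threshold.
    overdiverged = [r["accession"] for r in records if r["snps"] > max_snps]

    # If a single date accounts for nearly every record, report it for review.
    top_date, top_count = Counter(r["date"] for r in records).most_common(1)[0]
    uniform_date = top_date if top_count / len(records) >= date_share else None

    return overdiverged, uniform_date


# Made-up records for illustration only; these are not real accessions.
example = [
    {"accession": "seq_a", "snps": 3, "date": "2023-05"},
    {"accession": "seq_b", "snps": 1450, "date": "2023-05"},
    {"accession": "seq_c", "snps": 45, "date": "2023-05"},
]
overdiverged, uniform_date = flag_qc_issues(example)
```

A flagged date only signals that the metadata deserves a second look; as noted above, the dates may well turn out to be correct.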