cov-lineages/scorpio

Pangolin v1.12 as used by GISAID misclassifies a lot of true BA.5* as BA.2*

corneliusroemer opened this issue · 2 comments

Using covSpectrum's advanced queries, I've noticed that the Pango assignments that come from GISAID are quite often wrong. I think GISAID still uses pangoLEARN as opposed to UShER. They say they are using designation version 1.12.

In Poland, as many as 30% of sequences are misclassified as BA.2 even though they are true BA.5. In Germany, around 5% are misclassified.

Is this due to Scorpio or pangoLEARN?

Something I noticed when looking at a sample of misassigned sequences is that many of them are missing the RBD, but that shouldn't stop pangoLEARN/Scorpio from being confident that (most of) these are true BA.5.

Here's the full list of sequences that GISAID calls BA.2* but that are BA.5* by Nextclade: https://lapis.cov-spectrum.org/gisaid/v1/sample/gisaid-epi-isl?region=Europe&dateFrom=2022-05-09&variantQuery=nextcladePangoLineage%3ABA.5*++%26+BA.2*&host=Human&accessKey=9Cb3CqmrFnVjO3XCxQLO6gUnKPd&orderBy=random
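For anyone who wants to reproduce this kind of comparison offline, here is a minimal sketch of the same filter in Python: keep the sequences that GISAID calls BA.2* but Nextclade calls BA.5*, then compute the share of Nextclade BA.5* that is affected. The records and the field names (`gisaid`, `nextclade`) are made up for illustration and are not the LAPIS response format. The `*` matching is simplified to a prefix check, so aliased sublineages (names that do not start with `BA.5`) would be missed.

```python
def in_clade(lineage: str, root: str) -> bool:
    """True if `lineage` is `root` itself or a dot-delimited sublineage of it.

    Simplification: Pango aliases are not resolved, so only names that
    literally start with `root` are recognised.
    """
    return lineage == root or lineage.startswith(root + ".")


def discordant(records, gisaid_root="BA.2", nextclade_root="BA.5"):
    """Records that GISAID places under one root and Nextclade under another."""
    return [
        r for r in records
        if in_clade(r["gisaid"], gisaid_root)
        and in_clade(r["nextclade"], nextclade_root)
    ]


# Hypothetical example data, not real sequences.
records = [
    {"id": "seq1", "gisaid": "BA.2", "nextclade": "BA.5.1"},
    {"id": "seq2", "gisaid": "BA.5.2", "nextclade": "BA.5.2"},
    {"id": "seq3", "gisaid": "BA.2.12.1", "nextclade": "BA.2.12.1"},
    {"id": "seq4", "gisaid": "BA.2.9", "nextclade": "BA.5"},
]

# Treat the Nextclade call as the reference, as in the query above.
true_ba5 = [r for r in records if in_clade(r["nextclade"], "BA.5")]
mismatches = discordant(records)
rate = len(mismatches) / len(true_ba5)

print([r["id"] for r in mismatches])
print(f"{rate:.0%} of Nextclade BA.5* are called BA.2* by GISAID")
```

Using the Nextclade call as the denominator mirrors how the percentages above are framed (share of true BA.5 that ends up as BA.2).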

Here's a sample screenshot from Nextclade showing the RBD region:
image

Query: (https://cov-spectrum.org/explore/Europe/AllSamples/Past3M/variants?variantQuery=nextcladePangoLineage%3ABA.5*++%26+BA.2*&aaMutations1=S%3A346&pangoLineage1=BA.5*&)
image

image

I ran it locally and this seems to be a Scorpio issue:
image

It's been flagged to GISAID that the mode used should be the default UShER mode, which no longer gets overwritten by Scorpio. With the assignment cache @AngieHinrichs prepares it should be fast enough for all purposes. I'd recommend running in UShER mode if you're seeing these misassignments!
If you want to do a constellation PR for the Scorpio issue, happy to take a look!
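
For reference, a rough sketch of how a local run in UShER mode could be scripted from Python. It assumes a pangolin 4.x install where the mode is selected with `--analysis-mode usher` and the report name with `--outfile`; please check `pangolin --help` for your version, since flags have changed between releases. The file names are placeholders.

```python
import shutil
import subprocess


def pangolin_usher_cmd(fasta: str, outfile: str = "lineage_report.csv") -> list[str]:
    """Build the command line for a pangolin run in UShER mode.

    Assumption: pangolin 4.x flag names (--analysis-mode, --outfile).
    """
    return ["pangolin", fasta, "--analysis-mode", "usher", "--outfile", outfile]


def run_pangolin_usher(fasta: str, outfile: str = "lineage_report.csv"):
    """Run pangolin if it is on PATH; otherwise only report the command."""
    cmd = pangolin_usher_cmd(fasta, outfile)
    if shutil.which("pangolin") is None:
        print("pangolin not found on PATH; would run:", " ".join(cmd))
        return None
    # check=True raises CalledProcessError if pangolin exits with an error.
    return subprocess.run(cmd, check=True)


# Placeholder input name; this only prints the command, it does not run it.
print(" ".join(pangolin_usher_cmd("sequences.fasta")))
```

Re-running a sample of the discordant sequences this way and comparing the `lineage` column against the GISAID call should show whether the BA.2* assignments go away in UShER mode.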