3dgiordano/SARS-CoV-2-Variants

Question: What does VOC-SUM mean?

MarieLataretu opened this issue · 5 comments

Hi @3dgiordano ,

thanks for this repo!

I've come across some VOC-SUM types in variants.csv (all WHO). Do you know what that means?
I couldn't find anything about it on the WHO website.

Thanks :)

Hi @MarieLataretu

VOC-SUM stands for Subvariants Under Monitoring (SUM) within the Variants of Concern (VOC).

Recently, since all the subvariants were within Omicron, WHO changed the title of that section to "Omicron subvariants under monitoring".
Source (WHO):
https://www.who.int/activities/tracking-SARS-CoV-2-variants/tracking-SARS-CoV-2-variants

WHO now also distinguishes between older variants under consideration and newer ones; I do not yet make that distinction when generating the data.
The file you mention is focused on keeping track of the variants: whether they have a name designated by WHO, and which category they had or have.

I use that information to group the data and to label the graphs.

I'm glad it's useful.

Many thanks - I haven't thought of subvariants 😃

"The focus of the file you mention is about keeping track of the variants and if they have a name designated by WHO and what category they had or have."

Yes, that's exactly how we are using this file! 🙏

I just found out that the file is used in the poreCov project. Awesome.

I am glad that the work of joining the pangolin lineages with the data from the disease control agencies makes it easier to find the association.

The catch with the Omicron SUMs is that they appear more than once in the pango column.

Take BA.2.75 as an example: it belongs to the Omicron family and is reported as Omicron, but it also has its own wildcard entry as VOC-SUM because it is a subvariant under monitoring.

When duplicates occur, the idea is that some agency considers that particular variant worth watching.

One suggestion: filter and match on interest=WHO and type in ["VOC", "VOI"]. If the lineage also has a match in the remainder (every entry that is not WHO VOC/VOI), highlight it to indicate that it is a variant under specific monitoring by some control agency.

Likewise, if you don't find it among the WHO VOC/VOI entries but do find it in the remainder, it is also being monitored; the difference is that it doesn't have an official name yet.
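
A minimal sketch of that filter, in case it helps. Only the column names pango, interest and type and the BA.2.75 example come from this thread; the sample rows, the lineage X.9.9 and the function name `classify` are hypothetical, and the real variants.csv may contain more columns.

```python
import csv
import io

# Hypothetical sample in the shape discussed above (not real data from variants.csv).
SAMPLE = """pango,interest,type
BA.2.75,WHO,VOC
BA.2.75,WHO,VOC-SUM
X.9.9,ECDC,VOI
"""


def classify(rows):
    """Split entries into WHO VOC/VOI matches and the remainder, per lineage."""
    official = set()  # lineages with a WHO VOC or VOI entry
    extra = {}        # lineage -> [(interest, type), ...] for every other entry
    for row in rows:
        if row["interest"] == "WHO" and row["type"] in ("VOC", "VOI"):
            official.add(row["pango"])
        else:
            extra.setdefault(row["pango"], []).append((row["interest"], row["type"]))
    return {
        lineage: {
            "who_named": lineage in official,         # has an official WHO VOC/VOI entry
            "monitored_as": extra.get(lineage, []),   # additional monitoring entries
        }
        for lineage in official | set(extra)
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = classify(rows)
for lineage, info in sorted(report.items()):
    print(lineage, info)
```

A lineage with `who_named` set and a non-empty `monitored_as` list is the duplicate case to highlight; one with only `monitored_as` entries is monitored by some agency but has no official name yet.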

At one time it was very common for ECDC or PHE to have their own variants that WHO had not declared. The variant reporting still takes this into account: if a subvariant is reported to ECDC for monitoring and it does not belong to a WHO family, it is reported as a separate item to monitor.

Today the WHO list takes precedence. ECDC has the same variants, but I do not report them twice; I only report the extras, that is, the difference.
However, if WHO has a special monitoring category such as SUM, or if another organization also has a subvariant under monitoring, I do report it as a duplicate.

I hope this clarifies the picture a bit and helps with your ambiguity detection mechanism.

Any suggestions for improvement, or for anything else that would be useful to you, are welcome.

Yes, exactly - we use it in the report 🙂

It's really hard to keep track of all the different institutions and definitions!

I'm not entirely familiar with our reporting code (https://github.com/replikation/poreCov/blob/eab1c4fe358c7e61f1a555ddd5b5677a6bb79d8f/bin/summary_report.py#L741-L777)

First, we try to get an exact lineage match; then we try two masking steps. For multiple hits, we report "ambiguous".
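
Roughly, the idea can be sketched like this. This is not the poreCov code: the function `lookup`, the table layout and the sample entries are hypothetical, and reading "masking" as dropping one trailing sublineage level per step is an assumption made for illustration.

```python
def lookup(lineage, table, max_mask_steps=2):
    """Return a label, "ambiguous" for multiple hits, or None if nothing matches.

    table maps a lineage string to the list of labels registered for it.
    Step 0 is the exact match; each later step masks (drops) one more
    trailing sublineage level. That reading of "masking" is an assumption.
    """
    parts = lineage.split(".")
    for step in range(max_mask_steps + 1):
        if step >= len(parts):
            break  # nothing left to mask
        candidate = ".".join(parts[: len(parts) - step])
        hits = table.get(candidate, [])
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            return "ambiguous"
    return None


# Hypothetical table; a lineage registered twice yields multiple hits.
TABLE = {
    "BA.2": ["Omicron"],
    "BA.2.75": ["Omicron", "VOC-SUM"],
}

print(lookup("BA.2", TABLE))      # exact match
print(lookup("BA.2.3.1", TABLE))  # found after masking two levels
print(lookup("BA.2.75", TABLE))   # two labels for the same lineage
```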

And so far, I haven't noticed ambiguous annotations in our reports, so I think it matches your reporting strategy quite well!

Perfect, I'm glad you're not getting ambiguous reports so far.
If you have any doubts or find an error in the data, do not hesitate to report it in an issue.

Regards.