Country inclusion description out of sync?
Current Behavior
The website currently states for clade frequencies:
Only locations with more than 100 sequences from samples collected in the previous 150 days are included.

We show the following countries:
- Australia
- Belgium
- Canada
- China
- Denmark
- Finland
- France
- Germany
- Iceland
- Ireland
- Italy
- Japan
- Netherlands
- Singapore
- South Korea
- Spain
- Switzerland
- Sweden
- USA
- UK
This doesn't seem to be correct, or is at least missing important context: when I look for countries with more than 100 sequences with a collection date less than 150 days ago on covSpectrum (https://cov-spectrum.org/explore/World/AllSamples/from%3D2023-07-02%26to%3D2023-11-22/variants/international-comparison?&), I get the following countries:
Country | Total Variant Sequences | First seq. found at | Last seq. found at |
---|---|---|---|
United States | 99114 | 2023-26 | 2023-47 |
Canada | 33981 | 2023-26 | 2023-46 |
United Kingdom | 22897 | 2023-26 | 2023-46 |
Japan | 22771 | 2023-26 | 2023-45 |
South Korea | 18858 | 2023-26 | 2023-45 |
France | 17394 | 2023-26 | 2023-46 |
Spain | 14246 | 2023-26 | 2023-46 |
China | 13271 | 2023-26 | 2023-46 |
Australia | 7386 | 2023-26 | 2023-46 |
Sweden | 6758 | 2023-26 | 2023-47 |
Italy | 5333 | 2023-26 | 2023-47 |
Denmark | 4696 | 2023-27 | 2023-46 |
Singapore | 4517 | 2023-26 | 2023-44 |
Germany | 3514 | 2023-26 | 2023-46 |
Netherlands | 3139 | 2023-26 | 2023-46 |
Belgium | 3077 | 2023-26 | 2023-47 |
Brazil | 2781 | 2023-26 | 2023-45 |
New Zealand | 2668 | 2023-26 | 2023-43 |
Israel | 2617 | 2023-26 | 2023-45 |
Greece | 2469 | 2023-27 | 2023-40 |
Ireland | 2343 | 2023-26 | 2023-47 |
Russia | 1963 | 2023-26 | 2023-44 |
Switzerland | 1916 | 2023-27 | 2023-46 |
Finland | 1668 | 2023-26 | 2023-45 |
Austria | 1411 | 2023-27 | 2023-46 |
Peru | 1254 | 2023-26 | 2023-43 |
Luxembourg | 1213 | 2023-27 | 2023-43 |
Portugal | 1198 | 2023-27 | 2023-45 |
Mexico | 1074 | 2023-26 | 2023-42 |
Croatia | 858 | 2023-27 | 2023-43 |
Chile | 787 | 2023-27 | 2023-43 |
Thailand | 773 | 2023-26 | 2023-43 |
Slovenia | 752 | 2023-26 | 2023-42 |
Iceland | 676 | 2023-27 | 2023-46 |
Colombia | 653 | 2023-26 | 2023-43 |
Ukraine | 652 | 2023-27 | 2023-44 |
Taiwan | 581 | 2023-26 | 2023-45 |
South Africa | 493 | 2023-27 | 2023-41 |
Turkey | 465 | 2023-28 | 2023-40 |
Poland | 459 | 2023-28 | 2023-45 |
Norway | 441 | 2023-26 | 2023-44 |
Romania | 364 | 2023-27 | 2023-40 |
Argentina | 359 | 2023-26 | 2023-38 |
Malaysia | 359 | 2023-26 | 2023-43 |
Costa Rica | 341 | 2023-26 | 2023-43 |
Guatemala | 321 | 2023-27 | 2023-40 |
India | 285 | 2023-26 | 2023-44 |
Georgia | 272 | 2023-27 | 2023-40 |
Mauritius | 270 | 2023-27 | 2023-44 |
Bulgaria | 254 | 2023-27 | 2023-43 |
Dominican Republic | 200 | 2023-27 | 2023-35 |
Expected behavior
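By the stated rule, the following countries would also be included, yet none of them is shown:
Country | Total Variant Sequences | First seq. found at | Last seq. found at |
---|---|---|---|
Brazil | 2781 | 2023-26 | 2023-45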
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Russia | 1963 | 2023-26 | 2023-44
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35
Notably, we include Iceland with only 676 sequences but exclude Brazil with 2781.
I think the text is wrong, as the config shows:
location_min_seq: 100
location_min_seq_days: 30
So in reality, to be included, a location needs 100 sequences collected within the last 30 days. I think it would be good to relax this: recency is not the most important criterion. Some countries simply report with more delay, and that doesn't mean they should be left out. So I think location_min_seq_days should be increased to at least 60 days.
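To make the effect of the window concrete, here is a minimal Python sketch of the rule as the two config values describe it. This is not the pipeline's actual code: the function name, the record layout, the placeholder locations and counts, and the inclusive threshold are all assumptions for illustration.

```python
from datetime import date, timedelta


def included_locations(records, today, min_seq=100, min_seq_days=30):
    """Return locations with at least `min_seq` sequences collected
    within the last `min_seq_days` days (inclusive threshold is an assumption)."""
    cutoff = today - timedelta(days=min_seq_days)
    totals = {}
    for location, collection_date, n_sequences in records:
        # Only sequences collected on or after the cutoff count.
        if collection_date >= cutoff:
            totals[location] = totals.get(location, 0) + n_sequences
    return sorted(loc for loc, total in totals.items() if total >= min_seq)


# Made-up records: (location, collection date, number of sequences).
records = [
    ("Exampleland", date(2023, 11, 10), 150),
    ("Delayland", date(2023, 10, 1), 500),   # collected 52 days before `today`
    ("Delayland", date(2023, 11, 15), 20),
]
today = date(2023, 11, 22)

print(included_locations(records, today, min_seq_days=30))  # ['Exampleland']
print(included_locations(records, today, min_seq_days=60))  # ['Delayland', 'Exampleland']
```

With the 30-day window the location with plenty of slightly older data drops out; widening the window to 60 days brings it back.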
In addition, the website/HTML should pull the description from the config file rather than hard-coding it, so that docs and code stay in sync automatically.
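One way to do this is to generate the sentence from the config values. A minimal sketch, assuming the config has already been loaded into a dict with the two keys shown above; the function name is hypothetical and the wording is copied from the current website text.

```python
def inclusion_description(cfg):
    """Build the website sentence from the config instead of hard-coding the numbers."""
    return (
        f"Only locations with more than {cfg['location_min_seq']} sequences "
        f"from samples collected in the previous {cfg['location_min_seq_days']} days "
        "are included."
    )


# Values as they appear in the config quoted above.
config = {"location_min_seq": 100, "location_min_seq_days": 30}
print(inclusion_description(config))
```

Changing location_min_seq_days in the config would then update the displayed text without touching the HTML.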
These are the force-excluded countries:
Austria
Czech Republic
Lithuania
Luxembourg
Slovakia
Not sure why we'd force-exclude Czechia with ~10m people but not Iceland with fewer than 400k.