nextstrain/forecasts-ncov

Country include description out of sync?


Current Behavior

The website currently states for clade frequencies:

Only locations with more than 100 sequences from samples collected in the previous 150 days are included.


We show the following countries:

  • Australia
  • Belgium
  • Canada
  • China
  • Denmark
  • Finland
  • France
  • Germany
  • Iceland
  • Ireland
  • Italy
  • Japan
  • Netherlands
  • Singapore
  • South Korea
  • Spain
  • Switzerland
  • Sweden
  • USA
  • UK

This doesn't seem to be correct, or is at least missing important context: when I look on covSpectrum (https://cov-spectrum.org/explore/World/AllSamples/from%3D2023-07-02%26to%3D2023-11-22/variants/international-comparison?&) for countries with more than 100 sequences collected less than 150 days ago, I get the following countries:

Country | Total Variant Sequences | First seq. found at | Last seq. found at
United States | 99114 | 2023-26 | 2023-47
Canada | 33981 | 2023-26 | 2023-46
United Kingdom | 22897 | 2023-26 | 2023-46
Japan | 22771 | 2023-26 | 2023-45
South Korea | 18858 | 2023-26 | 2023-45
France | 17394 | 2023-26 | 2023-46
Spain | 14246 | 2023-26 | 2023-46
China | 13271 | 2023-26 | 2023-46
Australia | 7386 | 2023-26 | 2023-46
Sweden | 6758 | 2023-26 | 2023-47
Italy | 5333 | 2023-26 | 2023-47
Denmark | 4696 | 2023-27 | 2023-46
Singapore | 4517 | 2023-26 | 2023-44
Germany | 3514 | 2023-26 | 2023-46
Netherlands | 3139 | 2023-26 | 2023-46
Belgium | 3077 | 2023-26 | 2023-47
Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Ireland | 2343 | 2023-26 | 2023-47
Russia | 1963 | 2023-26 | 2023-44
Switzerland | 1916 | 2023-27 | 2023-46
Finland | 1668 | 2023-26 | 2023-45
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Iceland | 676 | 2023-27 | 2023-46
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

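Expected behavior

Going by the stated criterion, these countries from the covSpectrum list should also be shown, but they are currently missing:
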
Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Russia | 1963 | 2023-26 | 2023-44
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

Notably, we include Iceland with only 676 sequences but exclude Brazil with 2781.

I think the text is wrong, as the config shows:

        location_min_seq: 100
        location_min_seq_days: 30

So in reality, to be included, a location needs 100 sequences within 30 days of today. I think it would be good to relax this. Recency is not the most important criterion: some countries just don't have recent data, but that doesn't mean they should be excluded if their data is slightly more delayed. So I think location_min_seq_days should be increased to at least 60 days.
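
To make the effect of the window concrete, here is a minimal sketch of such a filter. It is not the pipeline's actual code: the function name `included_locations`, the inclusive date bounds, and the three toy locations with their counts are all assumptions made up for illustration. Only the parameter names `location_min_seq` and `location_min_seq_days` come from the config.

```python
from datetime import date, timedelta


def included_locations(records, today, location_min_seq, location_min_seq_days):
    """Hypothetical filter: keep locations with at least `location_min_seq`
    sequences collected within `location_min_seq_days` days before `today`."""
    cutoff = today - timedelta(days=location_min_seq_days)
    totals = {}
    for location, collection_date, n_sequences in records:
        # Assumption: both ends of the window are inclusive.
        if cutoff <= collection_date <= today:
            totals[location] = totals.get(location, 0) + n_sequences
    return sorted(loc for loc, total in totals.items() if total >= location_min_seq)


# Made-up counts for illustration only (not real data).
today = date(2023, 11, 22)
records = [
    ("A", date(2023, 11, 10), 150),  # enough sequences, collected recently
    ("B", date(2023, 10, 5), 400),   # plenty of sequences, but about 7 weeks old
    ("C", date(2023, 11, 15), 40),   # recent, but too few sequences
]

print(included_locations(records, today, 100, 30))  # ['A']
print(included_locations(records, today, 100, 60))  # ['A', 'B']
```

In this toy case, location B has the most data but drops out under a 30-day window purely because of submission delay; widening the window to 60 days brings it back.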

In addition, the website/HTML should pull the description from the config file rather than hard-coding it, so that docs and code stay in sync automatically.
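
One way this could look, as a rough sketch: build the sentence from the same values the filter uses. I don't know how the site is actually generated, so the function name `describe_location_filter` is hypothetical, and the config is shown as a plain dict here rather than being loaded from the real config file.

```python
def describe_location_filter(config):
    """Hypothetical helper: render the website's description text from the
    filter settings instead of hard-coding the numbers."""
    return (
        f"Only locations with more than {config['location_min_seq']} sequences "
        f"from samples collected in the previous "
        f"{config['location_min_seq_days']} days are included."
    )


# Values as they currently appear in the config; in practice these would be
# read from the config file rather than written out here.
config = {"location_min_seq": 100, "location_min_seq_days": 30}

print(describe_location_filter(config))
```

With something like this, changing `location_min_seq_days` in the config would update the displayed text at the same time.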

These are the force-excluded countries:

  • Austria
  • Czech Republic
  • Lithuania
  • Luxembourg
  • Slovakia

Not sure why we'd force-exclude Czechia with 10m people but not Iceland with fewer than 400k.