J535D165/CoronaWatchNL

Accordance with map/data on the RIVM website

Tablel352 opened this issue · 2 comments

Should this data - https://raw.githubusercontent.com/J535D165/CoronaWatchNL/master/data-geo/data-municipal/RIVM_NL_municipal.csv - be in accordance with the map/data on the RIVM website (https://www.rivm.nl/coronavirus-covid-19/actueel#!node-coronavirus-covid-19-meldingen)? Or is it known that it is not, and can the differences be explained?

If I take, for example, Amsterdam (municipality code / Gemeentecode 363) for the period 1 April 2020 until 14 April 2020, then:
RIVM map says: 183 hospital admissions (ziekenhuisopnames), 84 deceased (overleden)
The CSV data from CoronaWatchNL says: 157 hospital admissions, 74 deceased

Dear @Tablel352, sorry for the late response.

Unfortunately, the data on the website doesn't match the data from their own API: https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv. I don't know the reason. I haven't seen any reason to doubt the API so far. Its numbers are quite similar to the numbers that used to be reported daily.

This is proof that our data matches the API:

import pandas as pd

url = "https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv"

# The RIVM file is semicolon-separated.
df = pd.read_csv(url, sep=";")
# Show the Amsterdam rows around the period in question.
df[df["Municipality_name"] == "Amsterdam"].iloc[18:35]
            Date_of_report Municipality_code Municipality_name       Province  \
6715   2020-03-31 10:00:00            GM0363         Amsterdam  Noord-Holland   
7082   2020-04-01 10:00:00            GM0363         Amsterdam  Noord-Holland   
7449   2020-04-02 10:00:00            GM0363         Amsterdam  Noord-Holland   
7816   2020-04-03 10:00:00            GM0363         Amsterdam  Noord-Holland   
8183   2020-04-04 10:00:00            GM0363         Amsterdam  Noord-Holland   
8550   2020-04-05 10:00:00            GM0363         Amsterdam  Noord-Holland   
8917   2020-04-06 10:00:00            GM0363         Amsterdam  Noord-Holland   
9284   2020-04-07 10:00:00            GM0363         Amsterdam  Noord-Holland   
9651   2020-04-08 10:00:00            GM0363         Amsterdam  Noord-Holland   
10018  2020-04-09 10:00:00            GM0363         Amsterdam  Noord-Holland   
10385  2020-04-10 10:00:00            GM0363         Amsterdam  Noord-Holland   
10752  2020-04-11 10:00:00            GM0363         Amsterdam  Noord-Holland   
11119  2020-04-12 10:00:00            GM0363         Amsterdam  Noord-Holland   
11486  2020-04-13 10:00:00            GM0363         Amsterdam  Noord-Holland   
11853  2020-04-14 10:00:00            GM0363         Amsterdam  Noord-Holland   

       Total_reported  Hospital_admission  Deceased  
6715              626                 243        20  
7082              674                 245        22  
7449              730                 262        29  
7816              771                 286        29  
8183              818                 306        46  
8550              887                 323        51  
8917              916                 330        56  
9284              940                 339        59  
9651              951                 338        62  
10018            1004                 343        66  
10385            1087                 345        68  
10752            1137                 353        75  
11119            1182                 355        82  
11486            1212                 368        87  
11853            1258                 400        94  

Cases: 1258-674=584
Hosp: 400-243=157
Deceased: 94-20=74
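
The subtraction above can be sketched as a small helper, shown here as an illustration only. The three rows are copied from the table output above; the name `period_increase` is hypothetical and not part of CoronaWatchNL or the RIVM API.

```python
# Cumulative Amsterdam counts copied from the table above:
# date -> (Total_reported, Hospital_admission, Deceased)
cumulative = {
    "2020-03-31": (626, 243, 20),
    "2020-04-01": (674, 245, 22),
    "2020-04-14": (1258, 400, 94),
}


def period_increase(cumulative, baseline, end):
    """Return the per-column increase between two cumulative snapshots."""
    return tuple(e - b for b, e in zip(cumulative[baseline], cumulative[end]))


# Hypothetical helper applied with two different baseline rows.
increase_from_mar31 = period_increase(cumulative, "2020-03-31", "2020-04-14")
increase_from_apr01 = period_increase(cumulative, "2020-04-01", "2020-04-14")
print(increase_from_mar31)
print(increase_from_apr01)
```

With the 31 March row as baseline this gives 157 hospital admissions and 74 deceased, the figures quoted above; the figure of 584 cases corresponds to the 1 April row as baseline.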

I was trying to find the data the RIVM uses to estimate R(t), and I finally figured it out.

"https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv" counts hospitalized people by the day they were admitted to the hospital. Counts before 14 March are missing.

In contrast, "https://data.rivm.nl/covid-19/COVID-19_casus_landelijk.csv" records the first day the illness was recognized by the GGD. If you accumulate that for hospitalized people, you get the red line in figure 16 on page 10 of the 28-07-2020 report. According to its description, the blue histogram in the background of that figure should match the totals of the previous table, but the match is not perfect. It also does not have any missing values. This strongly suggests they are using a file here that is not published.
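
A minimal sketch of that accumulation, under stated assumptions: the case file is taken to have one row per case, semicolon-separated, with columns named `Date_statistics` and `Hospital_admission` (values "Yes"/"No"). The sample rows are made up for illustration and are not RIVM data, and `cumulative_hospitalized` is a hypothetical name.

```python
import csv
import io
from collections import Counter
from itertools import accumulate

# Made-up sample in the assumed layout of COVID-19_casus_landelijk.csv.
# These rows are illustrative only, not real RIVM data.
SAMPLE = """Date_statistics;Hospital_admission
2020-03-01;Yes
2020-03-01;No
2020-03-02;Yes
2020-03-02;Yes
2020-03-04;Yes
"""


def cumulative_hospitalized(csv_text):
    """Count hospitalized cases per Date_statistics and accumulate them."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    per_day = Counter(
        row["Date_statistics"]
        for row in reader
        if row["Hospital_admission"] == "Yes"
    )
    days = sorted(per_day)  # ISO dates sort chronologically as strings
    totals = accumulate(per_day[day] for day in days)
    return list(zip(days, totals))


curve = cumulative_hospitalized(SAMPLE)
print(curve)
```

Days without a hospitalized case (2020-03-03 in the sample) simply do not appear in the result, so the curve would need to be reindexed on a full date range before comparing it with a daily series.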
