Agreement with the map/data on the RIVM website
Tablel352 opened this issue · 2 comments
Should this data (https://raw.githubusercontent.com/J535D165/CoronaWatchNL/master/data-geo/data-municipal/RIVM_NL_municipal.csv) match the map/data on the RIVM website (https://www.rivm.nl/coronavirus-covid-19/actueel#!node-coronavirus-covid-19-meldingen)? Or is it known that it does not, and can the differences be explained?
For example, for Amsterdam (Gemeentecode 363) over the period 1 April 2020 until 14 April 2020:
The RIVM map says: 183 hospital admissions (ziekenhuisopnames), 84 deaths (overleden)
The CoronaWatchNL CSV data says: 157 hospital admissions, 74 deaths
Dear @Tablel352, sorry for the late response.
Unfortunately, the data on the website doesn't match the data from RIVM's own API (https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv). I have no idea why. So far I haven't seen any reason to doubt the API: its numbers are quite similar to the daily reported numbers (back in the day).
This shows that our data matches the API:
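```python
import pandas as pd
url = "https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv"
df = pd.read_csv(url, sep=";", parse_dates=["Date_of_report"])
ams = df[df["Municipality_name"] == "Amsterdam"]
# reports from 31 March up to and including 14 April 2020
ams[ams["Date_of_report"].between("2020-03-31", "2020-04-14 23:59")]
```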
```
       Date_of_report       Municipality_code  Municipality_name  Province       Total_reported  Hospital_admission  Deceased
6715   2020-03-31 10:00:00  GM0363             Amsterdam          Noord-Holland             626                 243        20
7082   2020-04-01 10:00:00  GM0363             Amsterdam          Noord-Holland             674                 245        22
7449   2020-04-02 10:00:00  GM0363             Amsterdam          Noord-Holland             730                 262        29
7816   2020-04-03 10:00:00  GM0363             Amsterdam          Noord-Holland             771                 286        29
8183   2020-04-04 10:00:00  GM0363             Amsterdam          Noord-Holland             818                 306        46
8550   2020-04-05 10:00:00  GM0363             Amsterdam          Noord-Holland             887                 323        51
8917   2020-04-06 10:00:00  GM0363             Amsterdam          Noord-Holland             916                 330        56
9284   2020-04-07 10:00:00  GM0363             Amsterdam          Noord-Holland             940                 339        59
9651   2020-04-08 10:00:00  GM0363             Amsterdam          Noord-Holland             951                 338        62
10018  2020-04-09 10:00:00  GM0363             Amsterdam          Noord-Holland            1004                 343        66
10385  2020-04-10 10:00:00  GM0363             Amsterdam          Noord-Holland            1087                 345        68
10752  2020-04-11 10:00:00  GM0363             Amsterdam          Noord-Holland            1137                 353        75
11119  2020-04-12 10:00:00  GM0363             Amsterdam          Noord-Holland            1182                 355        82
11486  2020-04-13 10:00:00  GM0363             Amsterdam          Noord-Holland            1212                 368        87
11853  2020-04-14 10:00:00  GM0363             Amsterdam          Noord-Holland            1258                 400        94
```
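Difference between the report of 14 April and the report of 31 March, i.e. the increase over 1 to 14 April:
Cases: 1258 - 626 = 632
Hospital admissions: 400 - 243 = 157
Deceased: 94 - 20 = 74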
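The subtraction above can be reproduced in pandas. A minimal sketch, with the Amsterdam values typed in by hand from the output above so it runs offline (the variable names are illustrative): the count for 1 to 14 April is the last cumulative value minus the value of 31 March, and `diff()` splits the same total into day-to-day increments.

```python
import pandas as pd

# Cumulative Amsterdam (GM0363) counts per report date, typed in from the
# COVID-19_aantallen_gemeente_cumulatief.csv output above.
cumulative = pd.DataFrame(
    {
        "Total_reported": [626, 674, 730, 771, 818, 887, 916, 940,
                           951, 1004, 1087, 1137, 1182, 1212, 1258],
        "Hospital_admission": [243, 245, 262, 286, 306, 323, 330, 339,
                               338, 343, 345, 353, 355, 368, 400],
        "Deceased": [20, 22, 29, 29, 46, 51, 56, 59,
                     62, 66, 68, 75, 82, 87, 94],
    },
    index=pd.date_range("2020-03-31", "2020-04-14", freq="D"),
)

# Count over 1-14 April: value on 14 April minus value on 31 March.
totals = cumulative.iloc[-1] - cumulative.iloc[0]

# Day-to-day increments (first row is NaN after diff, so drop it);
# summed, they give the same window totals.
daily = cumulative.diff().iloc[1:].astype(int)

print(totals)
print(daily.sum())
```

Working from the cumulative columns rather than summing daily figures keeps the result identical to the hand calculation, even on days where the cumulative count goes down (e.g. hospital admissions from 339 to 338 on 8 April).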
I was trying to find the data the RIVM uses to estimate the R(t), and I finally figured it out.
"https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv" counts the number of people hospitalized by the day they were admitted to the hospital. Counts before 14 March are missing.
In contrast, "https://data.rivm.nl/covid-19/COVID-19_casus_landelijk.csv" records, for each case, the first day the illness was recognized by the GGD. If you accumulate that for hospitalized cases, you get the red line in figure 16 on page 10 of the 28-07-2020 report. According to its description, the blue histogram in the background of that figure should match the totals of the previous table, but the match is not perfect. The histogram also has no missing values. This strongly suggests that they are using a file here that is not published.
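A minimal sketch of the accumulation described above, not RIVM's own method. It assumes the casus file has one row per case, with a `Date_statistics` date column and a `Hospital_admission` column holding "Yes"/"No"/"Unknown"; these column names and values are assumptions to check against the real file, and the rows below are made up for illustration (they are not RIVM data). Counting hospitalized cases per date gives the histogram bars, and `cumsum()` gives the accumulated line.

```python
import pandas as pd

# Made-up rows shaped like COVID-19_casus_landelijk.csv (one row per case).
# Column names and values are assumptions, not taken from the real file.
cases = pd.DataFrame(
    {
        "Date_statistics": ["2020-03-02", "2020-03-02", "2020-03-03",
                            "2020-03-05", "2020-03-05", "2020-03-05"],
        "Hospital_admission": ["Yes", "No", "Yes", "Yes", "Unknown", "Yes"],
    }
)
cases["Date_statistics"] = pd.to_datetime(cases["Date_statistics"])

# Keep only hospitalized cases and count them per date (histogram bars).
hospitalized = cases[cases["Hospital_admission"] == "Yes"]
per_day = hospitalized.groupby("Date_statistics").size()

# Insert days without hospitalized cases as 0, then accumulate (the line).
per_day = per_day.asfreq("D", fill_value=0)
cumulative = per_day.cumsum()

print(cumulative)
```

To try it on the real file, replace the made-up `cases` frame with `pd.read_csv("https://data.rivm.nl/covid-19/COVID-19_casus_landelijk.csv", sep=";", parse_dates=["Date_statistics"])`; the `;` separator is assumed by analogy with the municipal file.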