Data representation changes by RIVM June 16, 2020
J535D165 opened this issue · 10 comments
Hi all,
RIVM made many changes to its data representations today. One of our primary sources of data has been removed (the map on the front page). We are working on a solution for this. Because most of our pipelines are broken, we need some time to rebuild them. The good news is that we seem to have access to all the data we had before, so we can reconstruct the pipelines.
Best, on behalf of the CoronaWatchNL team, Jonathan
Also, the numbers no longer add up to the values shown on the site.
You can copy the csvdata from the source of the webpage and paste it in Excel.
If you sum all the columns, you get different numbers than the ones mentioned in the summary at the top of the page.
The data that is available at this link is also not equal to the data from the map:
https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv
I assume that the difference between the map and the CSV is equal to the numbers per province where the municipality is unknown (in the CSV)? The total numbers for NL on the RIVM page match with the daily sum of the numbers in the CSV file.
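One way to test this assumption is to split a province's total into rows with and without a municipality. Below is a minimal Python sketch. The column names (`Date_of_report`, `Municipality_name`, `Province`, `Hospital_admission`), the `;` delimiter and the sample numbers are assumptions for illustration only; check them against the real file before relying on this.

```python
import csv
import io

# Made-up sample rows; the column names and delimiter are assumptions.
SAMPLE = """Date_of_report;Municipality_code;Municipality_name;Province;Hospital_admission
2020-06-17;GM0307;Amersfoort;Utrecht;40
2020-06-17;GM0344;Utrecht;Utrecht;55
2020-06-17;;;Utrecht;5
2020-06-17;GM0363;Amsterdam;Noord-Holland;70
"""


def split_known_unknown(csv_text, province, report_date):
    """Sum hospital admissions for one province and date, split by
    whether the municipality is filled in or left empty."""
    known = 0
    unknown = 0
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        if row["Province"] != province or row["Date_of_report"] != report_date:
            continue
        count = int(row["Hospital_admission"])
        if row["Municipality_name"].strip():
            known += count
        else:
            unknown += count
    return known, unknown


known, unknown = split_known_unknown(SAMPLE, "Utrecht", "2020-06-17")
print(f"known={known} unknown={unknown} total={known + unknown}")
```

If the map only shows rows with a known municipality, the map total should equal `known` and the gap to the CSV total should equal `unknown`.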
If I compare the numbers for all the municipalities in the province of Utrecht with the numbers shown in the Power BI dashboard (VR IJsselland) or the data from data.rivm.nl, there is a difference in the number of hospitalized persons. The numbers have not matched since RIVM introduced the time slider in the map.
The number of hospitalised persons for the province of Utrecht today (2020-06-17) should be 880. For 5 of these the municipality is unknown.
Yes, that is true, but if you use the data that is used to create the maps (the csvdata part of the source code), the total is 843. On Monday the data was correct, but since the time slider was introduced the data has been incomplete.
OK, so you downloaded the 8 data files for the 8 two-week periods and summed the numbers for the municipalities in Utrecht (Amersfoort, Baarn, Bunnik, Bunschoten, De Bilt, De Ronde Venen, Eemnes, Houten, IJsselstein, Leusden, Lopik, Montfoort, Nieuwegein, Oudewater, Renswoude, Rhenen, Soest, Stichtse Vecht, Utrecht, Utrechtse Heuvelrug, Veenendaal, Vijfheerenlanden, Wijk bij Duurstede, Woerden, Woudenberg, Zeist), and that adds up to 843?
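For reference, summing the per-period downloads over a fixed set of municipalities could look roughly like the sketch below. The column names `Gemeente` and `Aantal`, the `;` delimiter and the two tiny sample files are hypothetical; the real downloads may be laid out differently.

```python
import csv
import io

# Municipalities in the province of Utrecht, as listed in this thread.
UTRECHT = {
    "Amersfoort", "Baarn", "Bunnik", "Bunschoten", "De Bilt",
    "De Ronde Venen", "Eemnes", "Houten", "IJsselstein", "Leusden",
    "Lopik", "Montfoort", "Nieuwegein", "Oudewater", "Renswoude",
    "Rhenen", "Soest", "Stichtse Vecht", "Utrecht", "Utrechtse Heuvelrug",
    "Veenendaal", "Vijfheerenlanden", "Wijk bij Duurstede", "Woerden",
    "Woudenberg", "Zeist",
}

# Two made-up period files; column names and values are hypothetical.
PERIOD_FILES = [
    "Gemeente;Aantal\nAmersfoort;18\nZeist;4\nAmsterdam;50\n",
    "Gemeente;Aantal\nAmersfoort;3\nZeist;1\nAmsterdam;20\n",
]


def sum_periods(period_csvs, municipalities):
    """Add up the per-period counts for the given municipalities."""
    total = 0
    for text in period_csvs:
        for row in csv.DictReader(io.StringIO(text), delimiter=";"):
            if row["Gemeente"] in municipalities:
                total += int(row["Aantal"])
    return total


print(sum_periods(PERIOD_FILES, UTRECHT))
```

Rows for municipalities outside the set (Amsterdam in the sample) are ignored, so the result only covers the province.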
Well noticed! Indeed, I took 1 sample: Amersfoort has 18 hospitalised persons on the map with the slider and in the related download "In het ziekenhuis opgenomen COVID-19 patiënten - Per gemeente van 11-mrt-2020 t_m 24-mrt-2020". But there are 12 persons hospitalised in Amersfoort on 24-mrt-2020 according to https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_cumulatief.csv ... (so the incremental value is 6 higher than the cumulative value...)
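The Amersfoort comparison can be written down as a small check. This is only a sketch: the 12 and 18 come from the sample above, while treating the cumulative count before 11 March as 0 is my assumption (if it was higher, the gap only gets bigger).

```python
from datetime import date, timedelta


def period_increase(cumulative, start, end):
    """Increase of a cumulative series over [start, end], inclusive.
    A missing value for the day before `start` is treated as 0
    (an assumption, not something taken from the RIVM data)."""
    before = cumulative.get(start - timedelta(days=1), 0)
    return cumulative[end] - before


# Cumulative hospitalisations for Amersfoort from the cumulative CSV.
cumulative_amersfoort = {date(2020, 3, 24): 12}

# Value shown on the map for the period 11-mrt-2020 to 24-mrt-2020.
map_period_value = 18

implied = period_increase(
    cumulative_amersfoort, date(2020, 3, 11), date(2020, 3, 24)
)
print(map_period_value - implied)  # 6
```

A period count can never exceed the cumulative count at the end of that period if both describe the same events, so a positive gap points at a difference in definition or in the underlying data.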
Hi @J535D165 and others, no worries, a day-by-day archive of the CoronaWatchNL repo is available here in Dataverse, so nothing got lost:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/0PD4JM
What strikes me as surprising here is that, up until the change, the daily differences and the totals always matched (except maybe once in early March). Since the change, the reported daily difference has been greater than the difference of the totals every day.
I am not exactly sure what conclusion to draw from this:
A. Data unstable since new changes in the format (struggling with causality here)
B. Data was already unstable throughout the previous months, but this was somehow concealed
I realize we don't really have other options, but is RIVM data reliable? (as in "collected properly", not as in "accurate")
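The consistency check I am describing (reported daily difference versus the difference of consecutive totals) can be sketched as follows. The numbers are made up for illustration and the function name is hypothetical; it assumes one total per consecutive day, keyed by ISO date strings.

```python
def find_mismatches(totals, daily):
    """Return (day, reported_daily, diff_of_totals) for every day where
    the reported daily count differs from totals[day] - totals[previous]."""
    mismatches = []
    days = sorted(totals)  # ISO date strings sort chronologically
    for prev, cur in zip(days, days[1:]):
        diff = totals[cur] - totals[prev]
        if cur in daily and daily[cur] != diff:
            mismatches.append((cur, daily[cur], diff))
    return mismatches


# Made-up example values, not RIVM data.
totals = {
    "2020-06-14": 100,
    "2020-06-15": 110,
    "2020-06-16": 115,
    "2020-06-17": 121,
}
daily = {"2020-06-15": 10, "2020-06-16": 8, "2020-06-17": 9}

for day, reported, diff in find_mismatches(totals, daily):
    print(day, "reported", reported, "but totals differ by", diff)
```

Running this against the archived daily files before and after 16 June would show whether the mismatch really starts with the format change.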