Provide Geocoded data
justb4 opened this issue · 5 comments
Compliments on this project!
For many applications that produce maps and/or perform spatial analysis, geocoded COVID-19 data (data with coordinate attributes/columns) would be very helpful. This sounds more complex than it actually is:
- many of the CSVs produced here have columns such as Municipality (Gemeente)/Province code and/or name;
- the Dutch government, via Kadaster-PDOK, provides open datasets for Administrative Borders (Bestuurlijke Grenzen) with those same names/codes;
- GeoJSON is an ideal format for supplying geospatial data;
- there is ample open source software to convert/simplify these (GML) datasets; a quick search turned up, e.g., this script;
- "geocoding" is mainly a matter of JOIN-ing on Municipality/Province names/codes (instead of using geocoding backends like Nominatim); GeoPandas may help here, e.g. https://geopandas.org/mergingdata.html
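A minimal sketch of the join described in the list above, using only the Python standard library so it runs without GIS dependencies. It assumes the boundary file has already been converted to GeoJSON (e.g. from the PDOK GML) and that both sides share a municipality code; the CSV column `Gemeentecode` and the GeoJSON property `code` are hypothetical placeholders, not the actual headers of the CSVs here or of the PDOK data. With GeoPandas, the same inner join would be a `GeoDataFrame.merge(...)` followed by `to_file(..., driver="GeoJSON")`.

```python
import csv
import io
import json


def geocode_csv(csv_text, boundaries, csv_key="Gemeentecode", geo_key="code"):
    """Attach boundary geometries to CSV rows by joining on a municipality code.

    csv_key and geo_key are hypothetical names; adjust them to the real
    CSV header and boundary-file property. Returns the joined
    FeatureCollection plus the codes that had no matching boundary.
    """
    # Index boundary geometries by code for constant-time lookups.
    geometry_by_code = {
        str(feature["properties"][geo_key]): feature["geometry"]
        for feature in boundaries["features"]
    }
    features, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = str(row[csv_key])
        geometry = geometry_by_code.get(code)
        if geometry is None:
            # E.g. a merged or renamed municipality: report it instead of guessing.
            unmatched.append(code)
            continue
        # CSV values stay strings; numeric conversion is left to the caller.
        features.append({"type": "Feature", "geometry": geometry, "properties": row})
    return {"type": "FeatureCollection", "features": features}, unmatched


def to_geojson_text(collection):
    """Serialize a FeatureCollection dict to GeoJSON text."""
    return json.dumps(collection, ensure_ascii=False)
```

Reporting unmatched codes rather than silently dropping them makes municipal reorganisations (merged or renamed municipalities) visible when the CSV and the boundary file come from different years.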
I hope this triggers interest. Eventually I could foresee some extra/derived GeoJSON files generated from the CSVs under /data/, e.g. under /data/geocoded. Is all data here generated via GitHub Workflows/Actions? If so, contributors could also add the required geocoding steps there.
The next step would be OpenAPI endpoints serving this GeoJSON data, which could be read directly from GitHub. We have a project based on the OGC OpenAPI REST standards, pygeoapi, where we are working on providing an open endpoint for COVID-19 data: https://demo.pygeoapi.io/covid-19/collections?f=html. Some collections there are already served directly from GitHub repos, e.g. Italy. For NL we use/proxy ESRI endpoints, but would rather serve directly from a/this GH repo.
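As a rough illustration of what such an endpoint looks like to a client: OGC API - Features, which pygeoapi implements, exposes features under `/collections/{collectionId}/items`, with standard `limit` and `bbox` query parameters; pygeoapi additionally uses `f` to select the output format. The helper below only builds the request URL. The collection ID in the example is a made-up placeholder; real IDs are listed at the `/collections` endpoint.

```python
from urllib.parse import urlencode


def ogc_items_url(base_url, collection_id, limit=10, bbox=None, fmt="json"):
    """Build an OGC API - Features 'items' request URL.

    collection_id is whatever the server lists under /collections;
    nothing here is specific to one dataset.
    """
    params = {"f": fmt, "limit": limit}
    if bbox is not None:
        # bbox is minx,miny,maxx,maxy (lon/lat order for the default CRS84).
        params["bbox"] = ",".join(str(v) for v in bbox)
    return f"{base_url.rstrip('/')}/collections/{collection_id}/items?{urlencode(params)}"
```

For example, `ogc_items_url("https://demo.pygeoapi.io/covid-19", "example-collection", limit=5)` yields a URL that any HTTP client (or a browser, with `fmt="html"`) can fetch.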
In theory I could do the work on this issue, but I am already quite occupied with the pygeoapi part...
RIVM now provides part of its data as geocoded data: https://data.rivm.nl/geonetwork/srv/dut/catalog.search#/metadata/1c0fcd57-1102-4620-9cfa-441e93ea5604
Yes, very much aware of this activity. Some from our Open Source geo-community are involved and the services are evolving.
Still some work (for RIVM) to do:
- it is indeed only part of the data (aantallen_gemeente_cumulatief);
- the JSON files are not GeoJSON, i.e. not spatially enabled;
- the geospatial data is a single layer (aantallen_gemeente_cumulatief) via WFS.
But OK, that same table's CSV data can now be downloaded via the WFS as GeoJSON (or GML, KML, and even Shapefile with outputFormat=shape-zip). That gives 355 records, each with a municipality border, about 26 MB in total; I think this is because the municipality borders ("gemeentegrenzen") have not been simplified (so there are too many border points):
```shell
curl 'https://geodata.rivm.nl/geoserver/wfs?request=GetFeature&service=WFS&version=2.0.0&typeName=vw_nl_covid_19_aantallen_gemeente_cumulatief&outputFormat=json' > wfs.json
```
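Most of that size is border vertices, and the usual remedy is line simplification. Below is a plain-Python sketch of the Ramer-Douglas-Peucker algorithm applied to a single coordinate list. `tolerance` is in the units of the coordinates (metres for a projected CRS such as RD New / EPSG:28992, degrees for WGS84). Simplifying each municipality independently can open small gaps or overlaps along shared borders; topology-aware tools such as mapshaper avoid that and are likely the better choice for a production pipeline.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring): fall back to point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of a list of (x, y) tuples."""
    if len(points) < 3:
        return list(points)
    # Find the intermediate vertex farthest from the line through the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # All intermediate vertices are within tolerance: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise split at that vertex and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    # The split vertex ends `left` and starts `right`; keep it once.
    return left[:-1] + right
```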
Still, it's a good start, though I don't know how this affects the current work/dataflows here. Can all the (RIVM) CSVs here be derived from that CSV file? The NICE data of course still has a different source.
(I now see a new folder data-geo and related workflow scripts, though that data is not yet spatially enabled either. Spatially enabling it would require using the WFS download, but then we only have municipality-level data (in very large files because of the non-simplified border points) that needs to be aggregated to province and country level.)
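Aggregating the counts is a simple group-by once each municipality code is mapped to its province. A sketch, assuming such a lookup table is available (the mapping used in the test is illustrative only). For the geometries, one could dissolve municipality borders per province (e.g. GeoPandas `GeoDataFrame.dissolve(by=...)`) or join the totals to the province boundaries from the same PDOK administrative-borders dataset.

```python
from collections import defaultdict


def aggregate_counts(municipal_counts, municipality_to_province):
    """Sum municipality-level counts to province and country level.

    municipal_counts: {municipality_code: count}
    municipality_to_province: hypothetical lookup {municipality_code: province}
    An unknown code raises KeyError on purpose, so gaps in the lookup surface.
    """
    per_province = defaultdict(int)
    for code, count in municipal_counts.items():
        per_province[municipality_to_province[code]] += count
    # The country total is the sum over all provinces.
    country_total = sum(per_province.values())
    return dict(per_province), country_total
```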
I need to study the (Python) scripts and see how to make the files geospatial (GeoJSON), as described in the issue text above. But the ultimate solution would be for RIVM to provide the data at the same detailed/expanded level as here...
To be honest, I'm disappointed by the geocoding work RIVM did this week. Time is tight for RIVM at the moment; we all know and accept that, I think/hope. But the geocoding work doesn't add anything new. We can convert the data into GeoJSON ourselves, can't we? The shapefiles are there, the cumulative numbers are there... simply join the two files and we have these geocoded files, as many of us already did for our websites, models, and scientific publications.
Thinking aloud: what does this add to handling the pandemic in the Netherlands? The latest cumulative numbers aren't very informative; daily changes are. However, RIVM still provides only the latest values (and overwrites the previous counts). We still don't have access to large parts of the time series (#44). Tables with very important information have been discontinued or significantly changed (like the test data and underlying medical problems). Other tables contain clear mistakes, like the province data in the PDF files. Hopefully, RIVM will prioritize the completeness and FAIRness of the data instead of working on things we can derive ourselves, and only secondly work on easy-to-use features.
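Because only the latest cumulative values are published and then overwritten, daily changes have to be reconstructed from snapshots saved each day. A minimal sketch, assuming one saved snapshot per day keyed by ISO date string; negative increments can appear when earlier counts are corrected and are kept as-is rather than hidden.

```python
def daily_changes(snapshots):
    """Turn dated cumulative snapshots into increments between consecutive snapshots.

    snapshots: {iso_date: {municipality_code: cumulative_count}}, e.g. collected
    by saving each day's file before it is overwritten. Assumes one snapshot per day.
    """
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    dates = sorted(snapshots)
    changes = {}
    for prev, curr in zip(dates, dates[1:]):
        before, after = snapshots[prev], snapshots[curr]
        # A code absent from the previous snapshot is treated as 0 (an assumption).
        changes[curr] = {code: after[code] - before.get(code, 0) for code in after}
    return changes
```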
Side note for those who want to do the join themselves: we have simplified GeoJSON files with the municipalities in this folder: https://github.com/J535D165/CoronaWatchNL/tree/master/ext. This is the same file as they use on their news page.
@J535D165 I share your sentiment. The current offering from RIVM is half-baked and hidden among many other datasets. They have put something out quickly, hopefully as work in progress. Other aspects of the current crisis also call for bringing together experts who build transparent/Open Source solutions. The simplest approach would be for RIVM/the government to form teams with people like you/your team and other data/geospatial experts to unlock detailed data via downloads and APIs/web services, helping other experts in their analysis. The data itself is IMHO not complicated, but it does need to be supplied fully temporal (time series) and, where applicable, geospatial... Other geospatial folks and I are in contact with them, but still from the outside: basic advice, not at a detailed/implementation level. There are also procedures/bureaucracy that keep technical folks "inside" from quickly making use of external expertise.
Hi @J535D165 and @justb4, can we help with Geocoded data somehow?
We have more than 1k people in our community and public Cloud infrastructure for heavy computations. I'm pretty sure we can find some people interested in this task.