Using the emod2018 dataset, opentopodata.org's API returns invalid JSON
Closed this issue · 4 comments
Hi :-)
Whilst working on a geotagging program using opentopodata.org's API, I noticed some error when parsing the server's answer to a QJsonDocument
. It yielded QJsonParseError::IllegalNumber
. After having had a closer look, this happened when using the emod2018
dataset.
Example: Requesting https://api.opentopodata.org/v1/emod2018?locations=50,11
returns
{
"results": [
{
"elevation": NaN,
"location": {
"lat": 50.0,
"lng": 11.0
}
}
],
"status": "OK"
}
The elevation
key's value is NaN
(instead of null
, as the other datasets return when a coordinate is out of bounds). This is, per RFC 4627, not permitted (cf. https://tools.ietf.org/html/rfc4627#section-2.4):
Numeric values that cannot be represented as sequences of digits
(such as Infinity and NaN) are not permitted.
Most probably, this is what QJsonDocument::fromJson
complains about.
Anyway, thanks about this great piece of software and the great service you provide :-)
Thanks for raising this! Python supports NaN
in json, I didn't realise it isn't in the spec!
I would like to remove NaN
s from the api if it's breaking clients
The main issue is that I currently use NaN
and null
differently in Open Topo Data: NaN
means there is a null/NODATA value in the underlying dataset at that location, null
means the dataset doesn't cover that location.
There's a few options I can think of
- Just set
NaN
->null
and lose the distinction of NODATA vs out of bounds. - Return a string
"NaN"
instead, though I don't love the mixed types. - Add another property like
"is_nodata": true
to the result. - Return a
400
error if location is out of bounds. - Add a query parameter
nodata_value
, which defaults toNaN
, but can be set tonull
or an integer like-9999
. Preserves compatibility, but allows standards-compliant clients to override. Change default tonull
in v2.
I'll give it some thought and see what clients other APIs are doing around NaN
and NODATA. Let me know if you have any ideas about a fix.
Hey, thanks for your quick reply :-)
I personally would be totally fine with returning null
if the requested location can't be determinded from the underlying dataset. I don't think one has really to differentiate if there's a "no data" value in the dataset or the dataset doesn't cover the location, as in both cases, one won't get the elevation. Imo, "no data" and "out of bounds" are essentially the very same ;-) Maybe, you want to add some "exists in dataset" key, but I see no real benefit here.
I first thought about checking the server's response for NaN
and replacing it with null
before parsing it, but after having checked the RFC, I thought "Well, Qt behaves exactly correct here, it's a server bug". Python doesn't follow the standard here … so imo, no matter what you do,NaN
should definitely not be returned. Strictly following a standard is never a bad idea, esp. in this case, where – in the present case – the server is Python, but the client is C++/Qt or whatever.
So if you ask me how to address this:
- Way to go – fixes the non-standard behavior without a real drawback :-)
- One would have to check if
elevation
is a string or a number. this sucks. I would leave it always to be a number. - If you like to do so, maybe. But as said, imo no data is no data – for whatever reason.
- I wouldn't do this. The request is perfectly okay, and so should be the answer. It only doesn't yield a result due to the underlying dataset lacking the location. But nothing went wrong!
- This would also require clients to change their code. Then, speaking of my use-case with C++/Qt (btw. I'm speaking of KGeoTag), I can as well check for
NaN
to be present in the server's answer, cache the result, replace it withnull
and parse the JSON afterwards. So, from my point of view, no benefit, but persisting non-standard behavior.
I hope I could help you a bit.
Thanks for your patience and thoughtful feedback! This should now be fixed on master and on api.opentopodata.com by 3620454: null
is now returned for NODATA values.
Having invalid json that breaks clients by default is a bug so I agree that null
should be the default.
But there is a semantic difference between NODATA and out of bounds, and other tools like rasterio and gdal have different behaviour for NODATA and querying outside bounds. So anyone relying on the old behaviour can set nodata_value=nan
. It's a query parameter rather than a config option as NaN support in json is client-dependent.
Going forward, if people want to handle NODATA differently they should instead use a numerical value for the nodata_value
parameter, which is a hacky but standard approach in e.g., gdal that preserves json validity.
Nice! I think this is a solution everybody can live with :-)
You could e. g. set nodata_value=-12000
or nodata_value=8900
, which is less than the depth of the Marianna Trench or more than the height of Mount Everest respectively and thus can't be a real value.