CoverageJSON output invalid
letmaik opened this issue · 14 comments
This refers to http://test.opendap.org/opendap/AIRS/QC_AIRS-1.2011.02.13.090.L2.RetStd.v5.2.2.0.G11045052855.hdf.covjson.
As far as I see, all the "parameters" listed in the result, e.g. AIRSXTrack
or StdPressureLay
are not parameters but should be domain axes. The actual parameters, looking at http://test.opendap.org/opendap/AIRS/QC_AIRS-1.2011.02.13.090.L2.RetStd.v5.2.2.0.G11045052855.hdf.html, like O3_Resid_Ratio
or TCldTopStd
are all missing.
Just looking at one of the ranges objects of the output makes this mismatch obvious:
"AIRSXTrack": {
"type": "NdArray",
"dataType": "integer",
"axisNames": ["y", "x"],
"shape": [3],
"values": [0, 1, 2]
},
Here the axes are given as y and x (which are defined correctly earlier in the "domain"
part), however the shape and values do not match that.
@letmaik yes the bug(s) are/is glaringly obvious here. It would appear however that the CovJSON document syntax for CoverageCollection
is fine as per the specification description.
As far as I understand it we need to determine the following
- When parameters should be included within the coverage collection object (note that inclusion here is optional, denoted by MAY within the specification), and
- What conditions exist for granules variables (encoded as CovJSON
ranges
objects to be essentially ignored altogether. - As you mention above @letmaik each of the ranges objects'
values
seem to be an incremental counter for the size of theshape
this is entirely incorrect.
There is some nice debugging logging we can use to explicitly print the shape constraints for the no3 issue above.
It's probably relevant for me to note @letmaik thanks for reviewing this work. I think if we tested the CovJSON responses against more complex granules such as this AIRS one previously the implementation would be hardened. This is very valuable feedback... AIRS is not a dataset I have been working with so it looks like it presents more complex challenges.
Actually, I think we should definitely use this granule as a unit test. It should be added to the test data and we should add a note to the README documentation regarding how we can add tests as it is not intuitive at this point in time.
The CoverageCollection
structure is fine, yes. Also, the domain of the only coverage in that collection seems ok, but then everything else doesn't make sense.
Can you provide some URLs of CovJSON test resources that are fine in your opinion? Just want to have a quick look to rule out (or find) systematic errors without digging into the code.
By the way, I only looked at this test dataset because that's the one you referred to for advertising the CovJSON support of Hyrax in the "Status?" issue of the specification repo of CovJSON.
At PO.DAAC we are not yet running 1.15 in production so I can’t provide you with a public URL.
What about the following... which is also incorrect! It does not include several variables from the granule...
test.opendap.org:8080/opendap/ioos/200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml.covjson
I’ve also found the following error
test.opendap.org:8080/opendap/larc/MISR_AM1_CGAS_MAR_04_2006_SITE_INTEXB_F06_0021.hdf.covjson
Scope this Grid coverage out
test.opendap.org:8080/opendap/Pathfinder/Northwest_Atlantic/1km/raw/2001/4/f01091084047.hdf.covjson
Scope this Grid coverage out
test.opendap.org:8080/opendap/Pathfinder/Northwest_Atlantic/1km/raw/2001/4/f01091084047.hdf.covjson
Better, but also invalid:
"id"
in observed property must be a string, not null. It may be omitted however."symbol"
is a key within the parameter object but this should go inside the"unit"
object."type"
inside"symbol"
cannot be an empty string. If there is no known unit scheme, then the unit symbol must be given directly as string value of"symbol"
."value"
inside"symbol"
isTemp
. This sounds more like the label of the observed property (Temperate) than the unit."dataType": "unknown"
in the range is not defined in the spec.- The values in the range must always be a 1D array, not 2D.
Most of these things can be easily checked by copy-pasting the JSON (if it's not too big) into the CovJSON playground and seeing if it displays something useful.
The primary issue here is that we don’t have a formal test suite for CovJSON. We have discussed this recently and I think it’s somethhing we should work on producing.
All of the granules I’ve pasted above have been entirely random selections so it looks like, unfortunately, we have some work to do to ensure that the cases highlighted above at least are addressed.
@letmaik @jonblower would either of you be interested in assisting me write a test suite for CovJSON implementations this coming GSoC? We could easily facilitate it through the @ESIPFed at https://github.com/ESIPFed/gsoc/issues
Yes, I could help at some level. It would be great to have the test suite implemented as a kind of "linter", to help people check validity. We started work on this a while ago but didn't get very far, so doing this as GSoC sounds great.
OK cool @jonblower I'll log a GSoC project idea over on the ESIP GSoC project issues and tag you there. Thanks
In regards to the comment "As far as I see, all the "parameters" listed in the result, e.g. AIRSXTrack or StdPressureLay are not parameters but should be domain axes."
It is extremely difficult to know of all the different names of potential domain axes in these datasets largely because the naming conventions are so varied. Right now, the module searches for the following naming conventions:
- lon, LON, longitude, LONGITUDE, and COADSX
- lat, LAT, latitude, LATITUDE, and COADSY
- lev, LEV, height, HEIGHT, depth, DEPTH, pres, PRES
- time, TIME
I'll be the first to admit that this hardly covers everything and could be expanded. But as a person who does not necessarily work with these datasets on a regular basis, its difficult for me to catch everything without having some kind of reference or key. I will look into expanding the domain axes detection as soon as possible.
Yes - if the data follow the CF conventions, then there are definitive ways for detecting domain axes. But if not, I guess there's no choice but to make an educated guess