HDFGroup/hdf5-json

Create hdf5-json validator.

hyoklee opened this issue · 14 comments

I hope someone can write an hdf5-json validation tool in Python. It should also check that valid UUIDs are used for dimension scales within a JSON file.
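
For illustration, the UUID part of this request could look roughly like the sketch below. It only checks id syntax, and it assumes that object ids appear as the keys of top-level "groups", "datasets", and "datatypes" collections; that layout and the function names are assumptions made for the example, not something confirmed in this thread.

```python
import uuid

# Assumed layout (not confirmed in this thread): object ids are the keys of
# these top-level collections in an hdf5-json document.
COLLECTIONS = ("groups", "datasets", "datatypes")

def is_valid_uuid(text):
    """Return True if text is a canonical, hyphenated UUID string."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except (ValueError, AttributeError, TypeError):
        return False

def find_invalid_ids(doc):
    """Return (collection, id) pairs whose id is not a valid UUID."""
    bad = []
    for name in COLLECTIONS:
        for obj_id in doc.get(name, {}):
            if not is_valid_uuid(obj_id):
                bad.append((name, obj_id))
    return bad

# Made-up document for the example.
doc = {
    "groups": {"0568d8c5-a77e-11e4-9f7a-3c15c2da029e": {}},
    "datasets": {"not-a-uuid": {}},
}
print(find_invalid_ids(doc))
```

Checking that a dimension-scale reference points at an id that actually exists in the file would be a separate pass on top of this.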

This is an ill-defined problem and needs to be translated into a set of well-posed problems.

The first problem is a checker that can determine whether a JSON file contains the correct keywords according to the grammar rules.

http://hdf5-json.readthedocs.org/en/latest/bnf/index.html

This is a well-posed problem.
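
A minimal sketch of such a keyword check, using only the standard library, might look like this. The required and optional key sets below are hypothetical placeholders; the real ones would have to be taken from the BNF grammar linked above.

```python
def check_keywords(obj, required, optional=frozenset()):
    """Return a list of problems: missing required keys and unknown keys."""
    keys = set(obj)
    problems = [f"missing key: {k}" for k in sorted(required - keys)]
    problems += [f"unknown key: {k}" for k in sorted(keys - required - optional)]
    return problems

# Hypothetical rule for a dataset object, for illustration only.
DATASET_REQUIRED = {"type", "shape"}
DATASET_OPTIONAL = {"value", "attributes", "alias"}

# "shap" is a deliberate typo so that the checker has something to report.
dataset = {"type": {"class": "H5T_INTEGER", "base": "H5T_STD_I8LE"}, "shap": {}}
for problem in check_keywords(dataset, DATASET_REQUIRED, DATASET_OPTIONAL):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a client see every keyword error in a payload in a single pass.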

Here is a json schema validator: https://github.com/json-schema/json-schema.
Can we utilize this?

Joe, what is the use case for the json validator?

As a client developer, I've created and sent so many invalid JSON payloads to HPD server / h5serv. I'd like to validate them before I send them. HPD server may benefit too, since validation would prevent the garbage-in, garbage-out problem.

Users can run such a tool before they import into HPD Server h5json files that they have edited manually in bulk (e.g., replacing " with ' in all attribute values).

FYI, I am working on this functionality in the PD server code. Not as general as it could be but should be sufficient for the PD server. I can push this to Heroku but in that branch the OPeNDAP/THREDDS links are already moved to port 80.

Great, Aleksandar! I hope you guys can pull the code together to create a general validation tool that the code generator can benefit from as well.

The second problem is checking values against their type. The tool should throw an error if a user specifies

"type": {
    "base": "H5T_STD_I8LE",
    "class": "H5T_INTEGER"
},
"value": [34234343449,...

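One way to sketch this value-against-type check is to derive the legal range from the base type name. The example below handles only the H5T_STD_I*/H5T_STD_U* integer names and a flat list of values; the function names are made up for illustration, and the sample values other than the first are not from this thread.

```python
import re

# Matches names such as H5T_STD_I8LE or H5T_STD_U16BE.
_STD_INT = re.compile(r"^H5T_STD_([IU])(\d+)(LE|BE)$")

def int_range(base):
    """Return (lo, hi) for an H5T_STD_* integer base type name."""
    m = _STD_INT.match(base)
    if m is None:
        raise ValueError(f"not a standard integer type: {base}")
    signed, bits = m.group(1) == "I", int(m.group(2))
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def out_of_range(type_spec, values):
    """Return the values that do not fit in the declared integer type."""
    lo, hi = int_range(type_spec["base"])
    return [v for v in values if not lo <= v <= hi]

type_spec = {"base": "H5T_STD_I8LE", "class": "H5T_INTEGER"}
print(out_of_range(type_spec, [34234343449, 127, -128]))
```

A signed 8-bit type holds only -128 through 127, so 34234343449 is reported while the two boundary values pass.
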
Did you try that with the PD server?

Of course I did, and I'm very happy that the server rejects it.

Aleksandar,
Can you point me to the code in the PD server? I'll see if I can extract it into a stand-alone script (or maybe add it as a command-line option for jsontoh5.py: don't create an HDF5 file, just validate the JSON).

@jreadey This is the entry point. All the other helper functions are in the same file.

I think the quickest way to use this code is with a front-end "loader" that would call it for each group/dataset/attribute.
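
A front-end loader of this kind could be sketched as follows. The server's helper functions are not shown in this thread, so `demo_validate` is a stand-in, and the document layout ("groups" and "datasets" keyed by id, each with an optional "attributes" list) is an assumption made for the example.

```python
def walk(doc):
    """Yield (kind, path, obj) for each group and dataset, then its attributes."""
    for kind, collection in (("group", "groups"), ("dataset", "datasets")):
        for obj_id, obj in doc.get(collection, {}).items():
            yield kind, obj_id, obj
            for attr in obj.get("attributes", []):
                yield "attribute", f"{obj_id}/{attr.get('name')}", attr

def load_and_validate(doc, validate):
    """Call validate(kind, obj) on every object and collect error messages."""
    errors = []
    for kind, path, obj in walk(doc):
        try:
            validate(kind, obj)
        except ValueError as exc:
            errors.append(f"{kind} {path}: {exc}")
    return errors

# Hypothetical stand-in for the server-side validation helpers.
def demo_validate(kind, obj):
    if kind == "dataset" and "type" not in obj:
        raise ValueError("missing 'type'")

# Made-up document; real ids would be UUIDs.
doc = {
    "groups": {"g1": {"attributes": [{"name": "title", "value": "x"}]}},
    "datasets": {"d1": {"shape": {"class": "H5S_SCALAR"}}},
}
print(load_and_validate(doc, demo_validate))
```

Passing the validation function in as a parameter keeps the loader independent of where the actual checks live.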

An HDF5/JSON validator based on JSON Schema is now part of the h5json package. It validates only the HDF5/JSON grammar, not semantic or logical relationships such as dimension scales.

I think this issue should be closed, but I will leave it open for a month to gather comments from others. So, I'll close this issue if there is no further discussion by the end of March 2022. (It's been open for so long, what's one more month? 😄)

Closing due to no further comments.