stac-utils/stac-validator

Confusing validation error for Item

philvarner opened this issue · 13 comments

Running against the attached JSON file, I get the confusing error message below.

$ stac-validator item.json
[
    {
        "version": "1.0.0",
        "path": "item.json",
        "schema": [
            "https://schemas.stacspec.org/v1.0.0/item-spec/json-schema/item.json"
        ],
        "valid_stac": false,
        "error_type": "ValidationError",
        "error_message": "'collection1' should not be valid under {}. Error is in collection"
    }
]

item.txt

Does any of this make sense? From the schema at https://schemas.stacspec.org/v1.0.0/item-spec/json-schema/item.json:

          "if": {
            "properties": {
              "links": {
                "contains": {
                  "required": [
                    "rel"
                  ],
                  "properties": {
                    "rel": {
                      "const": "collection"
                    }
                  }
                }
              }
            }
          },
          "then": {
            "required": [
              "collection"
            ],
            "properties": {
              "collection": {
                "title": "Collection ID",
                "description": "The ID of the STAC Collection this Item references to.",
                "type": "string",
                "minLength": 1
              }
            }
          },
          "else": {
            "properties": {
              "collection": {
                "not": {}
              }
            }
          }
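
To spell out what that if/then/else excerpt seems to be saying, here is a rough plain-Python sketch of the same rule. The function name collection_rule_problem is made up for illustration and is not part of stac-validator, and the sketch assumes "links" is a list of objects, as the rest of the Item schema requires.

```python
def collection_rule_problem(item):
    """Return None if the item satisfies the rule, else a short explanation.

    Hypothetical helper: assumes item["links"] is a list of dicts.
    """
    links = item.get("links", [])
    # "if": does any link have rel == "collection"?
    has_collection_link = any(link.get("rel") == "collection" for link in links)
    has_field = "collection" in item

    if has_collection_link:
        # "then": the collection field is required and must be a non-empty string
        if not has_field:
            return "a link with rel='collection' exists, so 'collection' is required"
        value = item["collection"]
        if not isinstance(value, str) or len(value) < 1:
            return "'collection' must be a non-empty string"
        return None

    # "else": {"not": {}} rejects every value, so the field must be absent
    if has_field:
        return "'collection' is set but no link has rel='collection'"
    return None


item = {"collection": "collection1", "links": []}
print(collection_rule_problem(item))
```

With an item like the one in this issue (a collection field but no collection link), the "else" branch is the one that fires, which is where the "should not be valid under {}" wording comes from.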

If you add something like this to your links, it will pass. I guess it just needs a collection link:

      {
        "rel": "collection",
        "href": "./collection.json",
        "type": "application/json",
        "title": "Simple Example Collection"
      }

Or, I think, you can just move the collection field into properties instead.

There definitely could be better error messaging - we are just catching the error messages from jsonschema right now ...

Can we add messages to the schema itself?

@philvarner @gadomski would something like this help? Notice the help message at the end. We would need a lot of if statements to better explain these types of errors.

    {
        "version": "1.0.0",
        "path": "phil.json",
        "schema": [
            "https://schemas.stacspec.org/v1.0.0/item-spec/json-schema/item.json"
        ],
        "valid_stac": false,
        "error_type": "ValidationError",
        "error_message": "'collection1' should not be valid under {}. Error is in collection",
        "help": "If the error message doesn't make sense, refer to the schema"
    }
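
As a rough idea of what those if statements could look like, here is a minimal sketch that attaches a "help" entry to a result dict. The names add_help, HELP_HINTS and GENERIC_HELP are hypothetical and not part of stac-validator, and matching on the message text is an assumption that would be brittle if jsonschema changed its wording.

```python
GENERIC_HELP = "If the error message doesn't make sense, refer to the schema"

# Hypothetical (substring of error_message, hint) pairs for known confusing cases.
HELP_HINTS = [
    (
        "should not be valid under {}. Error is in collection",
        "A 'collection' field is set, but no link has rel='collection'. "
        "Add a collection link or remove the field.",
    ),
]


def add_help(result):
    """Return the result with a 'help' entry added when validation failed."""
    if result.get("valid_stac", True):
        return result  # nothing to explain for valid objects
    message = result.get("error_message", "")
    for needle, hint in HELP_HINTS:
        if needle in message:
            return {**result, "help": hint}
    # Fall back to the generic pointer to the schema.
    return {**result, "help": GENERIC_HELP}


failed = {
    "valid_stac": False,
    "error_type": "ValidationError",
    "error_message": "'collection1' should not be valid under {}. Error is in collection",
}
print(add_help(failed)["help"])
```

Keeping the hints in a list means each newly discovered confusing case is one more entry rather than another branch in the validation code.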

To me, "Error is in collection" could be made clearer. I tend to find JSON Schema errors hard to understand, but to your point, @jonhealy1, correctly rewording all possible cases is probably out of scope. One alternative would be to fall back on jsonschema's own string representation instead, e.g. change:

            if e.absolute_path:
                err_msg = f"{e.message}. Error is in {' -> '.join([str(i) for i in e.absolute_path])}"
            else:
                err_msg = f"{e.message} of the root of the STAC object"

to something like:

            err_msg = str(e)

In this case, the validator output would look like:

[
    {
        "version": "1.0.0",
        "path": "item.txt",
        "schema": [
            "https://schemas.stacspec.org/v1.0.0/item-spec/json-schema/item.json"
        ],
        "valid_stac": false,
        "error_type": "ValidationError",
        "error_message": "'collection1' should not be valid under {}\n\nFailed validating 'not' in schema['allOf'][0]['allOf'][2]['else']['properties']['collection']:\n    {'not': {}}\n\nOn instance['collection']:\n    'collection1'"
    }
]

Not sure if that's better?

Haha, it's not really any better, and it's a little messy. With this type of error you really need to try to read the schema. We could try to catch this one circumstance. The rules about where to put 'collection' are a little confusing.

It's something that should maybe be added to stac-check

Yeah, IMO a custom validator like this (or stac-check) does have a space to catch common problems that are hard to understand from jsonschema; this case (missing the collection link and/or collection attribute) is pretty common and really hard to understand from jsonschema errors, so might warrant a special check.

Printing the JSON Schema validation error is a huge help, because then at least I know where to go look in the schema for the problem, even if that message is still unclear. I don't think there's any general way to map these errors, but... it may be useful to handle this case -- I'm pretty knowledgeable about STAC, and I didn't realize the schema actually required that there be a collection link if you define the collection field. (Though I kind of disagree with that constraint -- why shouldn't I be allowed to set the collection field value without a link to a collection?) At least the JSON error points you to where to look, since the current error message is useless.

I didn't know it was an issue either. If you've never worked with a json schema before, the second message isn't very helpful either. I think a help message advising someone to look at the schema is helpful.

I do think it's better to print out the full message like Pete did.
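
A possible middle ground between the short message and the full str(e) dump would be a one-line message that also includes the schema path. This is only a sketch: format_error is a hypothetical name, and the SimpleNamespace object is a stand-in that carries the message, absolute_path and absolute_schema_path attributes of a jsonschema ValidationError, with values copied from the output earlier in this thread.

```python
from collections import deque
from types import SimpleNamespace


def format_error(e):
    """Build a one-line message from a jsonschema-style ValidationError."""
    instance_path = " -> ".join(str(p) for p in e.absolute_path)
    schema_path = " -> ".join(str(p) for p in e.absolute_schema_path)
    where = instance_path or "the root of the STAC object"
    return f"{e.message}. Error is in {where}. Failed schema rule: {schema_path}"


# Stand-in for the error raised in this issue (not a real ValidationError).
fake_error = SimpleNamespace(
    message="'collection1' should not be valid under {}",
    absolute_path=deque(["collection"]),
    absolute_schema_path=deque(
        ["allOf", 0, "allOf", 2, "else", "properties", "collection", "not"]
    ),
)
print(format_error(fake_error))
```

That keeps the output on one line while still telling the reader where in the schema to look.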