Use ruamel.yaml to improve YAML line determination for JSONSchema errors
One of the most fragile parts of our current code is how we determine the line number of a problem found in the YAML. The JSON Schema validator takes a Python data structure as input, and ValidationError objects contain path information pointing to the problematic part of that data structure, but when we use PyYAML to read the YAML the line numbers are lost. So we've got a system that uses regular expressions to try to follow the path to the relevant line of the YAML.
ruamel.yaml is a YAML parsing library designed to support round-trip conversion of YAML files through Python, including formatting and comments. This means that when it loads YAML, the data structure it returns still contains line and column numbers. So given the path to the problem, we can find that part of the data structure and then look up the line and column numbers. Here's an example:
import json
import sys

import jsonschema
from ruamel.yaml import YAML

yaml = YAML()
with open("obi.md") as f:
    data = yaml.load(f)
with open("registry_schema.json") as f:
    registry_schema = json.load(f)

try:
    jsonschema.validate(data, registry_schema)
    print("SUCCESS")
except jsonschema.exceptions.ValidationError as err:
    print(err.message)
    keys = list(err.path)
    if err.validator == "additionalProperties":
        m = err.message
        key = m[m.index("'") + 1:m.rindex("'")]  # get the added key name; this is hacky
        keys.append(key)
    if keys:  # an empty path means the error is at the document root
        subset = data
        while len(keys) > 1:
            subset = subset[keys.pop(0)]  # follow the path, stopping with one key left
        pos = subset.lc.data[keys[0]]  # get ruamel.yaml's line-column information
        print(f"at line {pos[0] + 1}, column {pos[1] + 1}")
    sys.exit(1)
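The path-following step in the example above could be factored into a small helper. This is only a sketch: `find_position` is a hypothetical name, and it assumes that every container along the path came from ruamel.yaml's round-trip loader and so carries the `lc.data` lookup used above. When the position cannot be determined it returns `None` rather than raising.

```python
def find_position(data, path):
    """Return the 1-based (line, column) of the node at `path`, or None.

    `data` is assumed to be what ruamel.yaml's round-trip loader returned;
    `path` is a sequence of keys and indexes, such as list(err.path) from
    a jsonschema ValidationError.
    """
    keys = list(path)
    if not keys:
        return None  # error is at the document root; there is no key to look up
    parent = data
    for key in keys[:-1]:
        parent = parent[key]  # walk down to the container holding the last key
    try:
        pos = parent.lc.data[keys[-1]]  # first two entries: 0-based line, column
    except (AttributeError, KeyError, IndexError, TypeError):
        return None  # container has no position data, or key is not in the YAML
    return pos[0] + 1, pos[1] + 1
```

In the `except` branch above, the lookup would then become `find_position(data, keys)`, printing the line and column only when the result is not `None`.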
I think this approach will be much more robust, and I hope that it won't be too hard to replace our current code.
I didn't find the official docs all that helpful for what I wanted to do:
Instead I relied on the source code:
https://sourceforge.net/p/ruamel-yaml/code/ci/default/tree/comments.py
Much better. It also handles the duplicate key validation. :-D