JSON schema and validation code for HEPData submissions
- Documentation: http://hepdata-validator.readthedocs.io
If possible, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows the CLoader to be used for faster loading of YAML files. The difference is negligible for small files, but loading is markedly faster for larger documents.
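Whether the C-accelerated loader is available can be checked from PyYAML itself. The following is a minimal sketch of the usual PyYAML fallback pattern, not necessarily the exact code used inside hepdata-validator. It uses CSafeLoader, the safe counterpart of CLoader; the function name load_yaml_text and the sample document are made up for illustration.

```python
import yaml

# CSafeLoader is only importable when PyYAML was built against LibYAML.
try:
    from yaml import CSafeLoader as Loader
    USING_LIBYAML = True
except ImportError:
    # Pure-Python fallback: same results, slower on large documents.
    from yaml import SafeLoader as Loader
    USING_LIBYAML = False


def load_yaml_text(text):
    """Parse a YAML string with the fastest safe loader available."""
    return yaml.load(text, Loader=Loader)


# Hypothetical sample document, for illustration only
document = load_yaml_text("name: Table 1\nvalues: [1, 2, 3]\n")
print("LibYAML in use:", USING_LIBYAML)
```

If the last line reports False, PyYAML was installed without the LibYAML bindings and the pure-Python loader is being used.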
Via pip:
pip install hepdata-validator
Via GitHub (for developers):
git clone https://github.com/HEPData/hepdata-validator
cd hepdata-validator
pip install --upgrade -e .[tests]
pytest testsuite
To validate submission files, instantiate a SubmissionFileValidator object:
from hepdata_validator.submission_file_validator import SubmissionFileValidator
submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'
# the validate method takes a string representing the file path
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)
# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()
# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)
To validate data files, instantiate a DataFileValidator object:
from hepdata_validator.data_file_validator import DataFileValidator
data_file_validator = DataFileValidator()
# the validate method takes a string representing the file path
data_file_validator.validate(file_path='data.yaml')
# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()
# the error messages can be printed
data_file_validator.print_errors('data.yaml')
Optionally, if you have already loaded the YAML object, you can pass it as the data argument. You must also pass the file_path, since it is used as the key in the error message lookup map.
from hepdata_validator.data_file_validator import DataFileValidator
import yaml
# use a context manager so that the file handle is closed after loading
with open('data.yaml', 'r') as stream:
    file_contents = yaml.safe_load(stream)
data_file_validator = DataFileValidator()
data_file_validator.validate(file_path='data.yaml', data=file_contents)
data_file_validator.get_messages('data.yaml')
data_file_validator.print_errors('data.yaml')
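Because messages are stored per file path, one validator can be reused for several files. The helper below is a minimal sketch of that idea; collect_messages is a hypothetical name, not part of the package, and it relies only on the validate and get_messages calls shown above.

```python
def collect_messages(validator, file_paths):
    """Validate each path and return {path: messages} for files with problems.

    `validator` is any object offering validate(file_path=...) and
    get_messages(path), such as the validators shown above.
    """
    problems = {}
    for path in file_paths:
        validator.validate(file_path=path)
        # Messages are looked up by the same path that was validated
        messages = validator.get_messages(path)
        if messages:
            problems[path] = messages
    return problems
```

It could be called as collect_messages(DataFileValidator(), ['data1.yaml', 'data2.yaml']), where the file names are placeholders. An empty result means no messages were recorded for any of the files.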
An example offline validation script uses the hepdata_validator package to validate the submission.yaml file and all YAML data files of a HEPData submission.
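The general shape of such an offline check can be sketched as follows. This is not the example script itself, only an illustration under stated assumptions: validate_submission is a hypothetical name, each table document in submission.yaml is assumed to name its data file under a data_file key, and validate is assumed to return a truthy value on success.

```python
import os

import yaml


def validate_submission(directory, submission_validator, data_validator):
    """Validate submission.yaml and every data file it references.

    Returns True only if the submission file and all data files validate.
    """
    submission_path = os.path.join(directory, 'submission.yaml')
    if not submission_validator.validate(file_path=submission_path):
        submission_validator.print_errors(submission_path)
        return False

    # submission.yaml is a multi-document YAML file
    with open(submission_path, 'r') as stream:
        documents = list(yaml.safe_load_all(stream))

    all_valid = True
    for document in documents:
        # Assumption: table documents carry a 'data_file' key; skip the rest
        if not isinstance(document, dict) or 'data_file' not in document:
            continue
        data_path = os.path.join(directory, document['data_file'])
        if not data_validator.validate(file_path=data_path):
            data_validator.print_errors(data_path)
            all_valid = False
    return all_valid
```

A call might look like validate_submission('my_submission', SubmissionFileValidator(), DataFileValidator()), where my_submission is a placeholder directory name.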
The native HEPData JSON schemas exist in multiple versions. In most cases you should use the latest version (the default). If you need a different version, pass the keyword argument schema_version when initialising the validator:
submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')
When using remotely defined schemas, versions depend on the organization providing those schemas, and it is their responsibility to offer a way of keeping track of different schema versions.
The JsonSchemaResolver object resolves $ref in the JSON schema. The HTTPSchemaDownloader object retrieves schemas from a remote location, and optionally saves them in the local file system, following the structure: schemas_remote/<org>/<project>/<version>/<schema_name>. An example may be:
from hepdata_validator.data_file_validator import DataFileValidator
data_validator = DataFileValidator()
# Split remote schema path and schema name
schema_path = 'https://scikit-hep.org/pyhf/schemas/1.0.0/'
schema_name = 'workspace.json'
# Create JsonSchemaResolver object to resolve $ref in JSON schema
from hepdata_validator.schema_resolver import JsonSchemaResolver
pyhf_resolver = JsonSchemaResolver(schema_path)
# Create HTTPSchemaDownloader object to validate against remote schema
from hepdata_validator.schema_downloader import HTTPSchemaDownloader
pyhf_downloader = HTTPSchemaDownloader(pyhf_resolver, schema_path)
# Retrieve and save the remote schema in the local path
pyhf_type = pyhf_downloader.get_schema_type(schema_name)
pyhf_spec = pyhf_downloader.get_schema_spec(schema_name)
pyhf_downloader.save_locally(schema_name, pyhf_spec)
# Load the custom schema as a custom type
import os
pyhf_path = os.path.join(pyhf_downloader.schemas_path, schema_name)
data_validator.load_custom_schema(pyhf_type, pyhf_path)
# Validate a specific schema instance
data_validator.validate(file_path='pyhf_workspace.json', file_type=pyhf_type)
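To make the local layout concrete, the sketch below splits a full schema URL into the path and name used above and builds the corresponding schemas_remote/<org>/<project>/<version>/<schema_name> location. The mapping from URL parts to org, project and version is an assumption made for illustration (it expects URLs of the form https://<org>/<project>/schemas/<version>/), and both function names are hypothetical; the downloader's own logic may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def split_schema_url(url):
    """Split a full schema URL into (schema_path, schema_name)."""
    base, _, name = url.rpartition('/')
    return base + '/', name


def local_schema_path(schema_path, schema_name, root='schemas_remote'):
    """Build <root>/<org>/<project>/<version>/<schema_name> (illustrative)."""
    parts = urlsplit(schema_path)
    org = parts.netloc
    # Assumption: the URL path looks like /<project>/schemas/<version>/
    segments = [s for s in parts.path.split('/') if s and s != 'schemas']
    project, version = segments[0], segments[-1]
    return str(PurePosixPath(root, org, project, version, schema_name))


schema_path, schema_name = split_schema_url(
    'https://scikit-hep.org/pyhf/schemas/1.0.0/workspace.json')
local_path = local_schema_path(schema_path, schema_name)
print(local_path)
```

In practice, the schemas_path attribute of the downloader, as used in the example above, is the value to rely on rather than a path rebuilt by hand.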
The native HEPData JSON schemas are provided as part of the hepdata-validator package, so it is not necessary to download them. However, for testing purposes, the same mechanism could in principle be used with:
schema_path = 'https://hepdata.net/submission/schemas/1.0.1/'
schema_name = 'data_schema.json'
and passing a HEPData YAML data file as the file_path argument of the validate method.