JSON schema and validation code for HEPData submissions

HEPData Validator

Installation

If you can, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows the use of CLoader for faster loading of YAML files. The difference is negligible for small files, but loading is markedly faster for larger documents.
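
Whether the faster loader is actually available can be checked from Python. The sketch below assumes PyYAML is the YAML library in use and relies on PyYAML exposing CSafeLoader only when it was built with LibYAML bindings. The helper name pick_yaml_loader is hypothetical and not part of hepdata-validator.

```python
def pick_yaml_loader():
    """Return the fastest safe PyYAML loader available, or None without PyYAML."""
    try:
        import yaml
    except ImportError:
        # PyYAML itself is missing, so there is nothing to choose from.
        return None
    # CSafeLoader exists only when PyYAML was built with LibYAML bindings.
    return getattr(yaml, "CSafeLoader", yaml.SafeLoader)


loader = pick_yaml_loader()
print("YAML loader:", loader.__name__ if loader else "PyYAML not installed")
```

If this reports SafeLoader, YAML files still load correctly, only through the slower pure-Python parser.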

Via pip:

pip install hepdata-validator

Via GitHub (for developers):

git clone https://github.com/HEPData/hepdata-validator
cd hepdata-validator
pip install --upgrade -e ".[tests]"
pytest testsuite

Usage

To validate submission files, instantiate a SubmissionFileValidator object:

from hepdata_validator.submission_file_validator import SubmissionFileValidator

submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'

# the validate method takes a string representing the file path
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)

# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()

# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)

To validate data files, instantiate a DataFileValidator object:

from hepdata_validator.data_file_validator import DataFileValidator

data_file_validator = DataFileValidator()

# the validate method takes a string representing the file path
data_file_validator.validate(file_path='data.yaml')

# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()

# the error messages can be printed
data_file_validator.print_errors('data.yaml')

If you have already loaded the YAML document, you can pass it to validate as the data argument. You must still pass file_path, since it is used as the key in the error message lookup map.

from hepdata_validator.data_file_validator import DataFileValidator
import yaml

# open the file in a with block so that it is closed after loading
with open('data.yaml', 'r') as data_file:
    file_contents = yaml.safe_load(data_file)
data_file_validator = DataFileValidator()

data_file_validator.validate(file_path='data.yaml', data=file_contents)

data_file_validator.get_messages('data.yaml')

data_file_validator.print_errors('data.yaml')
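
When a submission has several data files, the calls above can be wrapped in a loop. The following is a minimal sketch and not part of the package: validate_data_files is a hypothetical helper, and it assumes that validate returns a truthy value on success, as the submission file example suggests by storing the return value. It uses only the validate and print_errors methods shown above.

```python
def validate_data_files(validator, file_paths):
    """Validate each path with one validator and return the paths that failed.

    `validator` is any object offering validate(file_path=...) and
    print_errors(file_path), such as a DataFileValidator instance.
    """
    failed = []
    for path in file_paths:
        # Assumption: validate() returns a truthy value when the file is valid.
        if not validator.validate(file_path=path):
            validator.print_errors(path)
            failed.append(path)
    return failed
```

With the package installed, a call might look like validate_data_files(DataFileValidator(), ['data1.yaml', 'data2.yaml']), where the file names are placeholders. An empty return value means every file passed.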

An example offline validation script uses the hepdata_validator package to validate the submission.yaml file and all YAML data files of a HEPData submission.

Schema Versions

The native HEPData JSON schemas exist in multiple versions. In most cases you should use the latest version (the default). If you need a different version, pass the keyword argument schema_version when initialising the validator:

submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')

Remote Schemas

When using remotely defined schemas, versions depend on the organization providing those schemas, and it is their responsibility to offer a way of keeping track of different schema versions.

The JsonSchemaResolver object resolves $ref in the JSON schema. The HTTPSchemaDownloader object retrieves schemas from a remote location, and optionally saves them in the local file system, following the structure: schemas_remote/<org>/<project>/<version>/<schema_name>. For example:

from hepdata_validator.data_file_validator import DataFileValidator
data_validator = DataFileValidator()

# Split remote schema path and schema name
schema_path = 'https://scikit-hep.org/pyhf/schemas/1.0.0/'
schema_name = 'workspace.json'

# Create JsonSchemaResolver object to resolve $ref in JSON schema
from hepdata_validator.schema_resolver import JsonSchemaResolver
pyhf_resolver = JsonSchemaResolver(schema_path)

# Create HTTPSchemaDownloader object to validate against remote schema
from hepdata_validator.schema_downloader import HTTPSchemaDownloader
pyhf_downloader = HTTPSchemaDownloader(pyhf_resolver, schema_path)

# Retrieve and save the remote schema in the local path
pyhf_type = pyhf_downloader.get_schema_type(schema_name)
pyhf_spec = pyhf_downloader.get_schema_spec(schema_name)
pyhf_downloader.save_locally(schema_name, pyhf_spec)

# Load the custom schema as a custom type
import os
pyhf_path = os.path.join(pyhf_downloader.schemas_path, schema_name)
data_validator.load_custom_schema(pyhf_type, pyhf_path)

# Validate a specific schema instance
data_validator.validate(file_path='pyhf_workspace.json', file_type=pyhf_type)
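
To make the local layout schemas_remote/<org>/<project>/<version>/<schema_name> concrete, the sketch below assembles such a path from its parts. It only illustrates the directory structure: local_schema_path is a hypothetical helper, the argument values are illustrative guesses for the pyhf example, and how HTTPSchemaDownloader derives <org>, <project> and <version> from a URL is not shown here.

```python
from pathlib import PurePosixPath


def local_schema_path(org, project, version, schema_name, root="schemas_remote"):
    """Join the parts into root/<org>/<project>/<version>/<schema_name>."""
    return PurePosixPath(root, org, project, version, schema_name)


# Illustrative values only; the real <org> folder name may differ.
print(local_schema_path("scikit-hep.org", "pyhf", "1.0.0", "workspace.json"))
# prints schemas_remote/scikit-hep.org/pyhf/1.0.0/workspace.json
```

In practice the path should be read from pyhf_downloader.schemas_path, as in the example above, rather than rebuilt by hand.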

The native HEPData JSON schemas are provided as part of the hepdata-validator package, so it is not necessary to download them. For testing purposes, however, the same mechanism could be used with:

schema_path = 'https://hepdata.net/submission/schemas/1.0.1/'
schema_name = 'data_schema.json'

and passing a HEPData YAML data file as the file_path argument of the validate method.