aptos

Validate client-submitted data using JSON Schema documents and convert JSON Schema documents into different data-interchange formats.

Why aptos?

  • Validate client-submitted data
  • Convert JSON Schema documents into different data-interchange formats
  • Simple syntax
  • CLI support for data validation and JSON Schema conversion
  • Stop Being a "Janitorial" Data Scientist

Installation

via pip

$ pip install aptos

via git

$ git clone https://github.com/pennsignals/aptos.git && cd aptos
$ python setup.py install

Usage

aptos supports the following capabilities:

  • Data Validation: Validate client-submitted data using validation keywords described in the JSON Schema specification.
  • Schema Conversion: Convert JSON Schema documents into different data-interchange formats. See the list of supported data-interchange formats for more information.

usage: aptos [arguments] SCHEMA

aptos is a tool for validating client-submitted data using the JSON Schema
vocabulary and converts JSON Schema documents into different data-interchange
formats.

positional arguments:
  schema              JSON document containing the description

optional arguments:
  -h, --help          show this help message and exit

Arguments:
  {validate,convert}
    validate          Validate a JSON instance
    convert           Convert a JSON Schema into a different data-interchange
                      format

More information on JSON Schema: http://json-schema.org/

Data Validation

Here is a basic example of a JSON Schema:

{
    "title": "Person",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}

Given a JSON Schema, aptos can validate client-submitted data to ensure that it satisfies the constraints the schema declares.

JSON Schema Validation keywords such as minimum and required can be used to impose requirements for successful validation of an instance. In the JSON Schema above, both the firstName and lastName properties are required, and the age property MUST have a value greater than or equal to 0.

  • Valid Instance ✔️: {"firstName": "John", "lastName": "Doe", "age": 42}
  • Invalid Instance ✖️: {"firstName": "John", "age": -15} (missing required property lastName, and age is not greater than or equal to 0)
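To make the two keywords concrete, here is a minimal standard-library sketch of the checks that required and minimum imply for the Person schema. It is not how aptos implements validation: the helper name find_violations is hypothetical, and it only handles these two keywords on top-level properties.

```python
def find_violations(schema, instance):
    """Return a list of violations of `required` and `minimum`.

    Illustrative only: a hypothetical helper, not part of aptos.
    """
    violations = []
    # `required`: every listed property name must be present in the instance.
    for name in schema.get("required", []):
        if name not in instance:
            violations.append(f"missing required property {name!r}")
    # `minimum`: a numeric value must be greater than or equal to the bound.
    for name, subschema in schema.get("properties", {}).items():
        if name not in instance or "minimum" not in subschema:
            continue
        value = instance[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value < subschema["minimum"]:
            violations.append(
                f"property {name!r} is {value}, below minimum {subschema['minimum']}"
            )
    return violations


person_schema = {
    "title": "Person",
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"description": "Age in years", "type": "integer", "minimum": 0},
    },
    "required": ["firstName", "lastName"],
}

valid = find_violations(
    person_schema, {"firstName": "John", "lastName": "Doe", "age": 42}
)
invalid = find_violations(person_schema, {"firstName": "John", "age": -15})
print(valid)
print(invalid)
```

The valid instance yields an empty list, while the invalid instance yields one entry for the missing lastName and one for the negative age.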

aptos can validate client-submitted data using either the CLI or the API:

Data Validation CLI

$ aptos validate -instance INSTANCE SCHEMA

Arguments:

  • INSTANCE: JSON document being validated
  • SCHEMA: JSON document containing the description

Example - macOS:

$ aptos validate -instance '{"firstName": "John"}' person.json

Example - Windows:

> aptos validate -instance "{\"firstName\": \"John\"}" person.json
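The two examples differ only in how each shell quotes the JSON instance. When the CLI is driven from Python, building the argument list directly avoids shell quoting altogether. The sketch below assumes the aptos executable is on the PATH and uses the -instance flag exactly as shown above; build_validate_command is a hypothetical helper, and the command is only constructed here, not executed.

```python
import json


def build_validate_command(instance, schema_path):
    """Build the argument list for `aptos validate` (hypothetical helper)."""
    # json.dumps produces the JSON text; because it is passed as a single
    # list element, no shell-specific escaping of the double quotes is needed.
    return ["aptos", "validate", "-instance", json.dumps(instance), schema_path]


command = build_validate_command({"firstName": "John"}, "person.json")
print(command)
```

Passing this list to subprocess.run(command) would invoke the CLI without going through a shell, so the same code works on macOS and Windows.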

Data Validation API

import json

from aptos.parser import SchemaParser
from aptos.visitor import ValidationVisitor


# Load the JSON Schema document and parse it into a component
with open('/path/to/schema') as fp:
    schema = json.load(fp)
component = SchemaParser.parse(schema)

# Invalid client-submitted data (instance): 'lastName' is missing
instance = {
    'firstName': 'John'
}
try:
    component.accept(ValidationVisitor(instance))
except AssertionError as e:
    # A failed validation surfaces as an AssertionError
    print(e)  # instance {'firstName': 'John'} is missing required property 'lastName'

Structured Message Generation

Given a JSON Schema, aptos can convert it into a structured message definition for another data-interchange format.

⚠️ Note: The JSON Schema being converted MUST be a valid JSON Object.
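As a rough illustration of that requirement, the following standard-library sketch rejects a document whose top-level value is not a JSON object before any conversion is attempted. The helper load_schema_object is hypothetical and is not part of aptos.

```python
import json


def load_schema_object(text):
    """Parse JSON text and ensure the top-level value is a JSON object."""
    document = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(document, dict):
        kind = type(document).__name__
        raise ValueError(f"schema must be a JSON object, got {kind}")
    return document


schema = load_schema_object('{"title": "Person", "type": "object"}')

try:
    load_schema_object('["not", "an", "object"]')
except ValueError as error:
    rejected = str(error)
    print(rejected)
```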

Supported Data-Interchange Formats

Format            Supported  Notes
Apache Avro       ✔️
Protocol Buffers  ✖️         Planned for future releases
Apache Thrift     ✖️         Planned for future releases
Apache Parquet    ✖️         Planned for future releases

Avro

Using the Person schema in the previous example, aptos can convert the schema into the Avro data-interchange format using either the CLI or the API.

aptos maps the following JSON schema types to Avro types:

JSON Schema Type  Avro Type
string            string
boolean           boolean
null              null
integer           long
number            double
object            record
array             array
The values of the enum validation keyword are mapped to the symbols attribute of an Avro enum.

When the type keyword is an array, it is mapped to an Avro union type.
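The mapping rules above can be sketched with a plain dictionary lookup. This is an illustration of the rules as stated, not aptos's actual implementation: the helper avro_type and its name parameter are hypothetical, and for object and array it returns only the Avro type name rather than a full record or array definition.

```python
# Primitive and complex type names, taken from the mapping described above.
JSON_TO_AVRO = {
    "string": "string",
    "boolean": "boolean",
    "null": "null",
    "integer": "long",
    "number": "double",
    "object": "record",
    "array": "array",
}


def avro_type(subschema, name="Value"):
    """Map a JSON Schema fragment to an Avro type (hypothetical helper)."""
    # `enum` values become the `symbols` attribute of an Avro enum.
    if "enum" in subschema:
        return {"type": "enum", "name": name, "symbols": list(subschema["enum"])}
    json_type = subschema["type"]
    # A list of JSON Schema types becomes an Avro union (a JSON array of types).
    if isinstance(json_type, list):
        return [JSON_TO_AVRO[t] for t in json_type]
    return JSON_TO_AVRO[json_type]


age = avro_type({"type": "integer", "minimum": 0})
nullable_name = avro_type({"type": ["null", "string"]})
color = avro_type({"enum": ["RED", "GREEN"]}, name="Color")
print(age, nullable_name, color)
```

Here age maps to long, the two-element type array maps to a union of null and string, and the enum fragment maps to an Avro enum named Color with two symbols.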

Data-Interchange CLI

$ aptos convert -format FORMAT SCHEMA

Arguments:

  • FORMAT: Data-interchange format
  • SCHEMA: JSON document containing the description

Data-Interchange API

import json

from aptos.parser import SchemaParser
from aptos.schema.visitor import AvroSchemaVisitor


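# Load the JSON Schema document and parse it into a component
with open('/path/to/schema') as fp:
    schema = json.load(fp)
component = SchemaParser.parse(schema)

# Convert the parsed schema into an Avro record and print it as JSON
record = component.accept(AvroSchemaVisitor())
print(json.dumps(record, indent=2))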

The above code generates the following Avro schema:

{
  "type": "record",
  "fields": [
    {
      "doc": "",
      "type": "string",
      "name": "lastName"
    },
    {
      "doc": "",
      "type": "string",
      "name": "firstName"
    },
    {
      "doc": "Age in years",
      "type": "long",
      "name": "age"
    }
  ],
  "name": "Person"
}

Testing

All unit tests are in the tests directory.

To run tests, execute the following command:

$ python setup.py test

Maintainers

Jason Walsh

Contributing

Contributions welcome! Please read the contributing.json file first.

Join our Slack channel!

License

Apache 2.0 © Penn Signals