frictionlessdata/frictionless-py

Schema validator

pwalsh opened this issue · 6 comments

The SchemaValidator checks that data conforms to a JSON Table Schema.

  • Implement shared validator API
  • Create a better reference spec for JTS itself (see: https://github.com/dataprotocols/schemas)
  • Implement standalone run method
  • Check headers are valid according to schema
  • Check data is valid according to schema
    • This can go very deep, so, as discussed with @rgrp, we will start minimally with date and number validation and build out from there in iterations. I will create separate issues when this issue closes.
  • Write standalone tests (via self.run)
  • Write tests as part of pipeline (via PipelineValidator.run)
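
To make the header check concrete, here is a minimal sketch, not the project's actual code. It assumes the schema is a dict with a `fields` list whose entries carry a `name`, as in JSON Table Schema; the function name `check_headers` and the error strings are hypothetical.

```python
# Hypothetical sketch: `check_headers` and its messages are illustrative only.

def check_headers(headers, schema):
    """Return a list of error strings for headers that do not match the schema."""
    expected = [field["name"] for field in schema["fields"]]
    errors = []
    # Every field named in the schema should appear in the data headers.
    for name in expected:
        if name not in headers:
            errors.append("missing header: {0}".format(name))
    # Headers that the schema does not describe are reported as well.
    for name in headers:
        if name not in expected:
            errors.append("unexpected header: {0}".format(name))
    return errors


schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "created", "type": "date"},
    ]
}

print(check_headers(["id", "name"], schema))
# ['missing header: created', 'unexpected header: name']
```

Whether unexpected headers should count as errors or only as warnings is a design choice this sketch does not settle.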

Rabbit hole

Stuff that is beyond the scope of this first pass, but that defines the larger scope of where we'd like to get to.

  • Generate schema from the data, if we do not have a schema #15
  • foreignKeys: #17
  • constraints.minLength, constraints.maxLength, constraints.minimum, constraints.maximum: these need discussion, see frictionlessdata/specs#161
  • Some issues around type and format. I would like to see these resolved before implementing deeper support of the spec, see frictionlessdata/specs#159

This implementation supports validation of the following types:

  • string
  • integer
  • number
  • object
  • array
  • date, time, datetime
  • boolean
  • any :)

It doesn't really deal with formats, except in the case of the date/time types.
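
As a rough illustration of what per-type validation by casting can look like, here is a small sketch. It is not the code in this repository: the `CASTERS` table, the `is_valid` helper, the accepted boolean literals, and the ISO-style date/time patterns are all assumptions made for the example.

```python
import json
from datetime import datetime

# Hypothetical sketch: every name and accepted format below is an assumption.


def _cast_boolean(value):
    lowered = value.strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError("not a boolean: {0!r}".format(value))


def _cast_json(expected_type):
    """Build a caster that parses JSON and checks the resulting Python type."""
    def cast(value):
        parsed = json.loads(value)
        if not isinstance(parsed, expected_type):
            raise ValueError("unexpected JSON type for {0!r}".format(value))
        return parsed
    return cast


CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": _cast_boolean,
    "object": _cast_json(dict),
    "array": _cast_json(list),
    # Assumed default formats; only the date/time types look at format at all.
    "date": lambda v: datetime.strptime(v, "%Y-%m-%d").date(),
    "time": lambda v: datetime.strptime(v, "%H:%M:%S").time(),
    "datetime": lambda v: datetime.strptime(v, "%Y-%m-%dT%H:%M:%SZ"),
    "any": lambda v: v,
}


def is_valid(value, type_name):
    """Return True if the raw string value can be cast to the named type."""
    try:
        CASTERS[type_name](value)
    except (ValueError, TypeError):
        return False
    return True
```

With this table, `is_valid("42", "integer")` succeeds while `is_valid("4.2", "integer")` fails, and `any` accepts every value unchanged.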

Other stuff that is not in our scope now is all recorded under the "Rabbit hole" heading of the main issue description above (which links out to specific issues).

Relatedly, I've updated my type casting to have an almost identical API to messytables, but I am still holding off on depending on messytables directly, as I'd like to keep py3 support (see okfn/messytables#117).