Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas
georgeslabreche opened this issue · 5 comments
Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas.
- Create package from JSONObject descriptor.
- Create package from JSON String descriptor.
- Create package from Remote File descriptor.
- Create package from Local File descriptor.
- Schema validation.
- Support strict validation flag (GitHub Issue #17).
- Profile-specific validation via the registry of JSON Schemas.
@roll I think the only things missing here were "Support strict validation flag," which I implemented under a new issue I created (issue #17), and "Profile-specific validation via the registry of JSON Schemas."
For "Read and validate a data package descriptor," I've updated the README to document how this has been implemented.
I'm not exactly sure what is meant by "Profile-specific validation via the registry of JSON Schemas," even after reviewing the specs. Does it simply mean that the data under "resources" needs to conform to the given profile? For instance, if I have profile: "tabular-data-resource", do the files assigned to the data property need to be validated as CSV, and does that CSV then need to be valid with respect to the provided schema (if a schema is provided in the first place)?
@georgeslabreche
Cool. Could you please post an update to frictionlessdata/software-legacy#26 (listing the requirements I should mark as done)? I think I reviewed the Java libs when this one had just been bootstrapped.
Related to validation:
- A descriptor should conform to the JSON Schema pointed to by its descriptor.profile attribute. It could be a profile from the registry (http://frictionlessdata.io/schemas/registry.json) or a URL to a JSON Schema. Take a look at https://github.com/frictionlessdata/datapackage-js/blob/master/src/profile.js (it's a fairly simple class, and it's used inside the Package class for validation).
- This requirement doesn't include any interaction with actual data, only descriptor/metadata validation, so we don't need to open or read data files for it.
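To make the two cases above concrete, here is a minimal Java sketch of how a descriptor.profile value might be mapped to a schema location. The class and method names (ProfileLocator, isRemoteProfile, schemaLocation), the profiles/ directory prefix, and the fallback to "data-package" when no profile is set are all assumptions for illustration, not the actual datapackage-java API. Validating the descriptor against the resolved schema would still need a JSON Schema library, which is left out here.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ProfileLocator {

    /** Hypothetical helper: true when the profile value is an http(s) URL rather than a registry id. */
    public static boolean isRemoteProfile(String profile) {
        try {
            String scheme = new URI(profile).getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        } catch (URISyntaxException e) {
            // Not parseable as a URI, so it cannot be a remote profile.
            return false;
        }
    }

    /**
     * Returns where the JSON Schema for a descriptor would be loaded from.
     * Assumption: a missing profile falls back to "data-package", and
     * registry ids map to a bundled "profiles/{profile-id}.json" file.
     */
    public static String schemaLocation(String profile) {
        if (profile == null || profile.isEmpty()) {
            profile = "data-package";
        }
        if (isRemoteProfile(profile)) {
            return profile; // the URL itself points at the JSON Schema
        }
        return "profiles/" + profile + ".json";
    }
}
```

Only the descriptor is inspected here; no data file is opened, which matches the scope of the requirement.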
@roll I'm bypassing the registry and checking if the profile files exist directly. Is this OK or is there a particular reason why the registry exists and why I should scan it before trying to read the profile file in question?
@georgeslabreche
I do the same for other implementations. For now the concept is that we ship profiles with the libraries, so at the low level it's enough to check for e.g. a {profile-id}.json file in profiles.
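The low-level check described here could look roughly like the following Java sketch. The class name BundledProfiles, the find method, and the rule that a profile id consists only of lowercase letters, digits, and hyphens are assumptions made for the example; the directory passed in stands for wherever a library keeps its shipped profiles.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;

public class BundledProfiles {

    // Assumption: profile ids look like "data-package" or "tabular-data-resource".
    private static final Pattern PROFILE_ID = Pattern.compile("[a-z0-9-]+");

    /**
     * Looks for "{profile-id}.json" directly in the given profiles directory,
     * without consulting the registry. Returns an empty Optional when the id
     * is malformed or no such file is shipped.
     */
    public static Optional<Path> find(Path profilesDir, String profileId) {
        if (profileId == null || !PROFILE_ID.matcher(profileId).matches()) {
            // Rejects ids containing path separators or dots, e.g. "../secret".
            return Optional.empty();
        }
        Path candidate = profilesDir.resolve(profileId + ".json");
        return Files.isRegularFile(candidate) ? Optional.of(candidate) : Optional.empty();
    }
}
```

A caller would treat an empty result as an unknown profile and fail validation, or fall back to fetching the profile if it turns out to be a URL.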
The registry could become more important if, e.g., we later provide an API like package = Package(..., remote_registry=True/http://...). But now that the specs have introduced the concept of descriptor.profile = local/remote path, there is a question of whether we need it at all, because with descriptor.profile a descriptor can contain 100% of the information without relying on concrete implementations.
@roll I'll mark this as Done then :).