frictionlessdata/datapackage-java

Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas

georgeslabreche opened this issue · 5 comments

Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas.

  • Create package from JSONObject descriptor.
  • Create package from JSON String descriptor.
  • Create package from Remote File descriptor.
  • Create package from Local File descriptor.
  • Schema validation.
  • Support strict validation flag (GitHub Issue #17).
  • Profile-specific validation via the registry of JSON Schemas.

@roll I think the only things missing here were "Support strict validation flag," which I implemented under a new issue I created (issue #17), and "Profile-specific validation via the registry of JSON Schemas."

For "Read and validate a data package descriptor," I've updated the README to document how this has been implemented.

I'm not exactly sure what is meant by "Profile-specific validation via the registry of JSON Schemas," even after reviewing the specs. Does it simply mean that the data under "resources" needs to conform to the given profile? For instance, if I have profile:"tabular-data-resource", do the files assigned to the data property need to be validated as CSV, and does that CSV then need to be valid with respect to the provided schema (if a schema is provided in the first place)?

roll commented

@georgeslabreche
Cool. Could you please post an update to frictionlessdata/software-legacy#26 (requirements I should mark as done)? I think I reviewed the Java libs when this one had only just been bootstrapped.

Related to validation:

@roll I'm bypassing the registry and checking if the profile files exist directly. Is this OK or is there a particular reason why the registry exists and why I should scan it before trying to read the profile file in question?

roll commented

@georgeslabreche
I do the same for other implementations. For now the concept is that we ship profiles with the libraries, so at a low level it's enough to check e.g. for a {profile-id}.json file in profiles.
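As a rough illustration of that low-level check, here is a minimal Java sketch. It is not the library's actual code: the class name `ProfileLookup`, its methods, and the idea of passing the profiles directory as a `Path` are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of datapackage-java.
final class ProfileLookup {

    // Maps a profile id to the file name it is assumed to be shipped under.
    static String profileFileName(String profileId) {
        return profileId + ".json";
    }

    // True if a regular file named {profile-id}.json exists in profilesDir.
    static boolean isBundledProfile(Path profilesDir, String profileId) {
        return Files.isRegularFile(profilesDir.resolve(profileFileName(profileId)));
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway "profiles" directory holding one profile file.
        Path dir = Files.createTempDirectory("profiles");
        Files.write(dir.resolve("tabular-data-package.json"),
                "{}".getBytes(StandardCharsets.UTF_8));

        System.out.println(isBundledProfile(dir, "tabular-data-package"));
        System.out.println(isBundledProfile(dir, "no-such-profile"));
    }
}
```

The registry is never consulted here; the presence of the file is treated as the only evidence that a profile is supported.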

The registry could become more important if, for example, we later provide an API like package = Package(..., remote_registry=True/http://...). But now that the specs have introduced the concept of descriptor.profile = local/remote path, there is a question of whether we need it at all, because with descriptor.profile a descriptor could contain 100% of the information without relying on concrete implementations.
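To make the descriptor.profile idea concrete, the sketch below classifies a profile value as a remote URL, a local path, or a bundled profile id. The rules used (an http/https scheme means remote; a path separator or a .json suffix means local; anything else is an id) are assumptions for illustration and are not taken from the specs or from this library. `ProfileRef` and `Kind` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of datapackage-java.
final class ProfileRef {

    enum Kind { REMOTE_URL, LOCAL_PATH, BUNDLED_ID }

    // Classifies a descriptor.profile value using the assumed rules above.
    static Kind classify(String profile) {
        try {
            String scheme = new URI(profile).getScheme();
            if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                return Kind.REMOTE_URL;
            }
        } catch (URISyntaxException e) {
            // Not parseable as a URI (e.g. a Windows path); fall through.
        }
        if (profile.contains("/") || profile.contains("\\") || profile.endsWith(".json")) {
            return Kind.LOCAL_PATH;
        }
        return Kind.BUNDLED_ID;
    }

    public static void main(String[] args) {
        System.out.println(classify("https://example.com/my-profile.json"));
        System.out.println(classify("profiles/my-profile.json"));
        System.out.println(classify("tabular-data-package"));
    }
}
```

Only the BUNDLED_ID case would need profiles shipped with the library; the other two carry their own location in the descriptor.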

@roll I'll mark this as Done then :).