zazuko/barnard59

Cube validation pipeline

tpluscode opened this issue · 3 comments

We talked with @ktk about a possible pipeline to validate a cube against the cube.link shapes.

I'd propose that the main usage be from the CLI:

barnard59 rdf check-cube --shapes {shapes}

For {shapes}, I would accept at least the names of the "official" shapes, such as standalone-cube-constraint. It could also accept multiple values and be optional, so that by default the basic + standalone shapes are used. I would prefer to get the shapes from an NPM package (peer dependency), so that pinning a specific version is easier than with dereferencing.

It remains open how to get the actual cube metadata. I would support both remote cubes (from Lindas) and local sources.

If we're adventurous, we might even source cubes from standard input, thus removing the loading step from the validation itself:

cat cube.ttl | barnard59 rdf check-cube

or from a remote source, if we also add a bespoke query command to retrieve the necessary triples:

barnard59 rdf fetch-cube --uri http://example.com/cube (--sparql ...) \
  | barnard59 rdf check-cube

Discussed with @giacomociti

  1. We need two separate pipelines: one validating the data and one validating the cube:CubeConstraint
  2. The fetch-cube pipeline would load either only the shape (--cube-constraint-only) or the whole cube, including observations
  3. In the hypothetical case of the shape and the observations being separated, the user can simply concatenate the streams
    • the shape must come first, before observations
    • observation triples must be grouped
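
The two ordering requirements can be sketched roughly as follows. This is only an illustration in Python, not barnard59 code: the tuple representation of triples, the `split_stream` name and the caller-supplied `is_observation` test are all assumptions made for the example.

```python
from itertools import chain, groupby
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), simplified to strings


def split_stream(
    triples: Iterable[Triple],
    is_observation: Callable[[str], bool],  # hypothetical test on the subject
) -> Iterator[Tuple[str, List[Triple]]]:
    """Yield the shape once, then one chunk per observation."""
    it = iter(triples)

    # Everything before the first observation triple is treated as the shape.
    shape: List[Triple] = []
    first_observation = None
    for triple in it:
        if is_observation(triple[0]):
            first_observation = triple
            break
        shape.append(triple)
    yield "shape", shape

    if first_observation is None:
        return

    # groupby only merges *consecutive* triples with the same subject,
    # which is why the observation triples must arrive grouped.
    seen = set()  # kept only to detect violations in this sketch
    rest = chain([first_observation], it)
    for subject, group in groupby(rest, key=lambda t: t[0]):
        if not is_observation(subject):
            raise ValueError(f"shape triple after observations: {subject}")
        if subject in seen:
            raise ValueError(f"observation triples not grouped: {subject}")
        seen.add(subject)
        yield "observation", list(group)
```

With this layout each observation can be validated against the already-buffered shape as soon as its chunk is complete, without holding the whole cube in memory.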

I would prefer to get the shapes from an NPM package (peer dependency), so that pinning a specific version is easier than with dereferencing.

On second thought, maybe using a URI would be better, since it would also support custom validation profiles outside cube.link.

barnard59 rdf check-cube --profile https://cube.link/v0.0.5/shape/standalone-cube-constraint

A node_modules path could then be used as an optimisation to avoid dereferencing:

barnard59 rdf check-cube --profile file:node_modules/cube-link/validation/standalone-cube-constraint.ttl
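
To illustrate how the two kinds of --profile value could be told apart, here is a minimal sketch. It is written in Python for illustration only and does not reflect how barnard59 handles the option; `resolve_profile` and its return values are hypothetical.

```python
from pathlib import Path
from typing import Tuple, Union
from urllib.parse import unquote, urlparse


def resolve_profile(profile: str) -> Tuple[str, Union[str, Path]]:
    """Classify a --profile value as something to dereference or a local file."""
    parsed = urlparse(profile)
    if parsed.scheme in ("http", "https"):
        # Remote profile: the caller would dereference the URL.
        return "remote", profile
    if parsed.scheme == "file":
        # file:relative/path and file:///absolute/path both end up in .path
        return "local", Path(unquote(parsed.path))
    raise ValueError(f"unsupported profile URI: {profile}")
```

For the two commands above this gives a remote URL in the first case and the relative path node_modules/cube-link/validation/standalone-cube-constraint.ttl in the second.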

Validating constraints is less expensive, so it should be fine to repeat it when there are multiple profiles to check. Then, the expensive operation of validating the data could follow.

barnard59 rdf fetch-cube --uri http://example.com/cube --cube-constraint-only \
  | barnard59 rdf check-cube-constraint --profile A

barnard59 rdf fetch-cube --uri http://example.com/cube --cube-constraint-only \
  | barnard59 rdf check-cube-constraint --profile B

barnard59 rdf fetch-cube --uri http://example.com/cube \
  | barnard59 rdf check-cube-observations
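
The ordering of cheap and expensive steps could be orchestrated along these lines. This is a hedged Python sketch, not part of barnard59: `validate_cube` and the two callables are hypothetical stand-ins for running the commands above, and skipping the observation check after a failed profile is an assumption of the sketch rather than something decided in this thread.

```python
from typing import Callable, Dict, Iterable, List, Union

Report = Dict[str, Union[bool, List[str]]]


def validate_cube(
    profiles: Iterable[str],
    check_constraint: Callable[[str], bool],  # cheap, run once per profile
    check_observations: Callable[[], bool],   # expensive, run at most once
) -> Report:
    """Run every constraint check first, then the observation check."""
    failed = [p for p in profiles if not check_constraint(p)]
    if failed:
        # Assumption: no point validating data against a broken constraint.
        return {"ok": False, "failed_profiles": failed, "observations_checked": False}
    return {
        "ok": check_observations(),
        "failed_profiles": [],
        "observations_checked": True,
    }
```

Each callable would wrap one pipeline run and return whether the validation report conforms.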

This is now provided by the barnard59-cube package.