Cube validation pipeline
tpluscode opened this issue · 3 comments
We talked with @ktk about a possible pipeline to validate a cube against the cube.link shapes.
I'd propose making the CLI the main entry point:
barnard59 rdf check-cube --shapes {shapes}
For {shapes}, I would expect at least the name of one of the "official" shapes, such as standalone-constraint-constraint. It could also accept multiple values and be optional, so that by default the basic+standalone shapes are used. I would prefer to get the cubes from an NPM package (peer dependency) so that it's easier to pin a specific version than with dereferencing.
It remains open how to get the actual cube(s) metadata. I would support options for both remote cubes (from Lindas) and local sources.
If we're adventurous, we might even try to source cubes from standard input, thus removing that part from validation itself:
cat cube.ttl | barnard59 rdf check-cube
or from a remote source, if we also add a bespoke query command to retrieve the necessary triples:
barnard59 rdf fetch-cube --uri http://example.com/cube (--sparql ...) \
  | barnard59 rdf check-cube
Discussed with @giacomociti
- We need two separate pipelines: validating data and validating cube:CubeConstraint
- The fetch-cube pipeline would load either the shape only (--cube-constraint-only), or the whole cube, including observations
- In the hypothetical case of shape and observations being separated, the user can simply concatenate the streams
  - the shape must come first, before observations
  - observation triples must be grouped
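The concatenation mentioned in the last point could look roughly like the sketch below. The file names and triples are made-up placeholders, and N-Triples is only assumed here because a line-oriented serialization can be joined with plain cat. The resulting stream would then be piped into whatever check command ends up being implemented.

```shell
# Hypothetical inputs: one file with the shape, one with the observations.
workdir=$(mktemp -d)

cat > "$workdir/shape.nt" <<'EOF'
<http://example.com/cube/shape> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://cube.link/Constraint> .
EOF

# Triples belonging to the same observation are kept on adjacent lines (grouped).
cat > "$workdir/observations.nt" <<'EOF'
<http://example.com/cube/obs/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://cube.link/Observation> .
<http://example.com/cube/obs/1> <http://example.com/dimension/year> "2020" .
<http://example.com/cube/obs/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://cube.link/Observation> .
<http://example.com/cube/obs/2> <http://example.com/dimension/year> "2021" .
EOF

# Shape first, observations second: the argument order of cat decides the stream order.
cat "$workdir/shape.nt" "$workdir/observations.nt" > "$workdir/cube.nt"
```

Because cat preserves the order of its arguments and of the lines within each file, both requirements (shape first, grouped observations) hold as long as the input files already satisfy them.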
I would prefer to get the cubes from an NPM package (peer dependency) so that it's easier to pin a specific version. Easier than with dereferencing
On second thought, maybe using a URI would be better, since it would also support custom validation profiles outside cube.link.
barnard59 rdf check-cube --profile https://cube.link/v0.0.5/shape/standalone-cube-constraint
A node_modules path could be an optimised choice to avoid dereferencing:
barnard59 rdf check-cube --profile file:node_modules/cube-link/validation/standalone-cube-constraint.ttl
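To illustrate the distinction between the two forms of --profile, here is a minimal sketch of how a value might be classified before loading. The function profile_kind is a hypothetical helper for illustration only and is not part of barnard59.

```shell
# Hypothetical helper: decide whether a --profile value must be dereferenced
# over the network or can be read from the local file system.
profile_kind() {
  case "$1" in
    http://*|https://*) echo remote ;;
    file:*)             echo local ;;
    *)                  echo unknown ;;
  esac
}

profile_kind https://cube.link/v0.0.5/shape/standalone-cube-constraint
profile_kind file:node_modules/cube-link/validation/standalone-cube-constraint.ttl
```

Only the remote form requires a network round trip; the file: form resolves against whatever version of the package is pinned locally.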
Validating constraints is less expensive so it should be fine to repeat in case of multiple profiles to check. Then, the expensive operation of validating the data could follow.
barnard59 rdf fetch-cube --uri http://example.com/cube --cube-constraint-only \
  | barnard59 rdf check-cube-constraint --profile A
barnard59 rdf fetch-cube --uri http://example.com/cube --cube-constraint-only \
  | barnard59 rdf check-cube-constraint --profile B
barnard59 rdf fetch-cube --uri http://example.com/cube \
  | barnard59 rdf check-cube-observations
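The ordering argued for here (cheap constraint checks once per profile, then the expensive observation check once) can be sketched as a small wrapper. The two functions below are stand-ins that only log what would run; they are assumptions for illustration and do not call the proposed commands.

```shell
# Stand-ins for the proposed commands; they only print what would be executed.
check_cube_constraint()   { echo "constraint check: profile $1"; }
check_cube_observations() { echo "observation check"; }

validate_cube() {
  # Cheap step, repeated once per profile; stop at the first failing profile.
  for profile in "$@"; do
    check_cube_constraint "$profile" || return 1
  done
  # Expensive step, run once and only after every profile has passed.
  check_cube_observations
}

validate_cube A B
```

Failing fast on the constraint checks means the observations are never fetched or validated for a cube whose constraint is already known to be invalid.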
This is now provided by the barnard59-cube package.