Collection of ideas/issues (to be refined)
Ideas
Some of these things might already exist or did exist in an older version of barnard59. Some might never happen.
- #85
- #93
- Facilitate bootstrapping pipelines
- prov-step (gitlab CI, repo name, etc)
- prov-metadata
- shacl step (auto-create shacl from passed data)
- void-step (check what we have and make this useful, use/provide shacl for validation as we do for some projects). Good examples:
- debug description/tutorial (as I did in VS Code)
- re-think tdb step (https://github.com/zazuko/barnard59-tdb)
- generic swagger API step?
- s3 sink step
- JSONata, related to #61
- Node module: https://docs.jsonata.org/using-nodejs
- something like "pv" for triples/sec?
- make https://github.com/zazuko/barnard59-pipeline-validation useful again
- make sure we have manifests everywhere
- auto-generate docs from manifests
- The logging should report the "close" event, i.e. log when the pipeline is done. That way we see exactly how long the pipeline was running, which we cannot tell today if the last step takes longer.
- Valid IRI step. As far as I know it is still possible to create IRIs that are invalid and would fail when loading into a store. I don't think we catch that right now.
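For the valid IRI idea, here is a minimal sketch of what such a check might look like. It only rejects characters that the N-Triples/Turtle `IRIREF` production forbids and requires a scheme; it is not a full RFC 3987 validator. The names `looksLikeValidIri` and `invalidIris` are hypothetical and not part of barnard59, and the quad shape is assumed to follow the RDF/JS data model (`termType`, `value`).

```typescript
// Characters not allowed inside <...> by the N-Triples/Turtle IRIREF production:
// control characters and space (U+0000 to U+0020) plus < > " { } | ^ ` \
const FORBIDDEN_IN_IRIREF = /[\u0000-\u0020<>"{}|^`\\]/

// Rough check for an absolute IRI: a scheme such as "http:" followed by at least one character.
const HAS_SCHEME = /^[A-Za-z][A-Za-z0-9+.-]*:./

// Hypothetical helper (assumption, not a barnard59 API).
function looksLikeValidIri(value: string): boolean {
  return HAS_SCHEME.test(value) && !FORBIDDEN_IN_IRIREF.test(value)
}

// Minimal structural types, assumed to match RDF/JS terms and quads.
interface TermLike { termType: string; value: string }
interface QuadLike { subject: TermLike; predicate: TermLike; object: TermLike; graph: TermLike }

// Hypothetical helper: returns the values of all named nodes in a quad that fail the check.
function invalidIris(quad: QuadLike): string[] {
  return [quad.subject, quad.predicate, quad.object, quad.graph]
    .filter(term => term.termType === 'NamedNode' && !looksLikeValidIri(term.value))
    .map(term => term.value)
}
```

A step built on this could either drop the offending quads or fail the pipeline early, which would surface the problem before the data reaches a store.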
Can (probably) be solved with better templates/examples/docs
- Generic file/directory processing: #30
- Add static & concat: #27
- some API fetch example step?
- gzip read/write step (probably easy but not for me)
Probably done
npm generic pipelines (with manifests)
By this you mean so that they are easily reused? I might also add a template package to make it easy to start a project that extends them, using copier for that.
load from more than one ttl file (required for above?)
re #65 and check pipeline.ts in museumplus-pipeline which follows a predicate to combine multiple RDF files
@tpluscode yes, for re-use. We have standard pipelines that are used everywhere by now, so we could include them instead and version them semantically.
Regarding Prov:
We have steps to add metadata and count subjects and properties using files as templates.
Append metadata
zazuko/barnard59-rdf#18
Count subjects and properties
zazuko/barnard59-rdf#21
These can be tested in the playground:
https://barnard59-steps-playground.zazukoians.org/
IMO there is a lot of room for improvement, specifically for data produced via XRM: this metadata could be generated automatically, and the PROV metadata could be declarative. This could be tested while upgrading flux-pipelines.
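To make the "generate PROV automatically" idea more concrete, here is a rough sketch that derives a few PROV-O statements from GitLab CI's predefined variables and emits them as N-Triples lines. The function name `provTriples` is hypothetical, and modelling the job as a `prov:Activity` that `prov:used` the repository is only one possible choice, not how the existing steps work.

```typescript
const PROV = 'http://www.w3.org/ns/prov#'
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
const XSD_DATETIME = 'http://www.w3.org/2001/XMLSchema#dateTime'

// Hypothetical helper (assumption, not a barnard59 API).
// `env` is expected to carry GitLab CI predefined variables such as CI_JOB_URL.
function provTriples(datasetIri: string, env: Record<string, string | undefined>): string[] {
  const activity = env['CI_JOB_URL']
  if (!activity) return [] // not running inside a GitLab CI job, nothing to describe

  const lines = [
    `<${activity}> <${RDF_TYPE}> <${PROV}Activity> .`,
    `<${datasetIri}> <${PROV}wasGeneratedBy> <${activity}> .`
  ]

  // Modelling assumption: the repository is an entity used by the job.
  const project = env['CI_PROJECT_URL']
  if (project) lines.push(`<${activity}> <${PROV}used> <${project}> .`)

  // CI_JOB_STARTED_AT is an ISO 8601 timestamp provided by GitLab.
  const started = env['CI_JOB_STARTED_AT']
  if (started) lines.push(`<${activity}> <${PROV}startedAtTime> "${started}"^^<${XSD_DATETIME}> .`)

  return lines
}
```

In a Node.js pipeline this could be called as `provTriples(datasetIri, process.env)` and the resulting lines appended to the output, similar in spirit to the "append metadata" step mentioned above.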