nfdi4plants/DataHUB

Adjustments for current mechanism of attaching metadata representation of ARCs

Opened this issue · 2 comments

Currently, 4 files are attached via the arc_json stage, see https://git.nfdi4plants.org/muehlhaus/ArcPrototype/-/jobs/8024.

  • arc-summary.md
  • arc-ro-crate-metadata.json
  • arc-isa.json
  • arc.json

All of these files are then pushed to a package in the ARC's package registry called isa_arc_json, which is not versioned, see https://git.nfdi4plants.org/muehlhaus/ArcPrototype/-/packages

I do not think we want to leave this mechanism as-is. I understand that some of it needs to be backwards-compatible. I have some suggestions:

  • create one named package per file. That way, we can keep the isa_arc_json package, but it would then contain only actual ISA JSON files.
  • rename the pipeline stage and job name. The stage creates more than arc_json or isa_json. I would suggest calling the stage arc_metadata and the job Create ARC metadata
  • establish versioning for these packages. As a first step, I would not touch major/minor/patch, but simply append the repo commit hash as SemVer build metadata, e.g. 0.0.1+<commit-hash> (https://semver.org/#spec-item-10). Further versioning of the ARC should be user-driven, I think, and we currently have no mechanisms for that IIRC.
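
To make the first and third suggestions concrete, here is a rough Python sketch of how a job could derive the version string and decide which package each file goes to. The package names other than isa_arc_json, the file-to-package mapping, and the helper names `versioned` and `plan_uploads` are all hypothetical and only for illustration; the real names would still have to be agreed on.

```python
import re

# SemVer 2.0.0 (spec item 10): build metadata is a series of dot-separated
# identifiers made of ASCII alphanumerics and hyphens.
_BUILD_METADATA = re.compile(r"[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*")
_CORE_VERSION = re.compile(r"\d+\.\d+\.\d+")

# Hypothetical mapping: one named package per generated file.
# Only "isa_arc_json" exists today; the other names are placeholders.
PACKAGE_FOR_FILE = {
    "arc-summary.md": "arc_summary",
    "arc-ro-crate-metadata.json": "arc_ro_crate_metadata",
    "arc-isa.json": "isa_arc_json",
    "arc.json": "arc_json",
}


def versioned(base: str, commit_hash: str) -> str:
    """Append the commit hash to a major.minor.patch version as build metadata."""
    if not _CORE_VERSION.fullmatch(base):
        raise ValueError(f"not a major.minor.patch version: {base!r}")
    if not _BUILD_METADATA.fullmatch(commit_hash):
        raise ValueError(f"not valid SemVer build metadata: {commit_hash!r}")
    return f"{base}+{commit_hash}"


def plan_uploads(base: str, commit_hash: str) -> list[tuple[str, str, str]]:
    """Return (package name, package version, file name) for every file."""
    version = versioned(base, commit_hash)
    return [(package, version, name) for name, package in PACKAGE_FOR_FILE.items()]


if __name__ == "__main__":
    # In a GitLab CI job the hash could come from the predefined
    # CI_COMMIT_SHA variable; a fixed value is used here for illustration.
    for package, version, name in plan_uploads("0.0.1", "abc1234"):
        print(f"{package} {version} {name}")
```

One thing to keep in mind: per the SemVer spec, build metadata is ignored when determining version precedence, so two packages that differ only in the commit hash cannot be ordered by version alone. If the version ends up in a URL, the `+` may also need to be percent-encoded.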

thoughts @muehlhaus @HLWeil @j-bauer ?

@gdoniparthi as a major user of the package.

But I guess the suggested changes would not affect your workflow?

Definitely agree on all those points. I don't think the backwards compatibility will be a problem. @gdoniparthi is already informed of the coming changes and will adapt his code accordingly.

I will create a list of the jobs and what they produce, and I suggest we then define what the stages should be called and how the versioning should work. I like the semver stuff, so I'm fine to go with that.