helxplatform/dug

Database schema (Data source repository)

Issue closed · 1 comment

Having investigated options such as Pachyderm, iRODS, and lakeFS, lakeFS seems the most promising.
The task here includes:

  • Propose a data repository management scheme that would encompass:

      • Versioning of input data to the pipeline.

      • Change detection between versions of input data.

      • Metadata management describing how the data changed, including provenance (e.g. the Docker image version or code that performed the change).

  • Get lakeFS stood up for testing the proposed scheme.

After the steps above, the next task would be implementing a workflow scheme that makes use of the versioning and provenance infrastructure.
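
To make the change-detection and provenance requirements concrete, here is a minimal sketch in plain Python. It does not use lakeFS or any of its APIs; a versioned store such as lakeFS would normally provide commits and diffs itself. The file paths, the `ChangeRecord` type, the `detect_changes` helper, and the provenance keys (`docker_image`, `code_version`) are all hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(files):
    """Map each path to the SHA-256 hex digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


@dataclass
class ChangeRecord:
    """What changed between two input versions, plus who or what changed it."""
    added: list
    removed: list
    modified: list
    provenance: dict = field(default_factory=dict)


def detect_changes(old_files, new_files, provenance):
    """Compare two versions of the input data by content hash."""
    old_fp = fingerprint(old_files)
    new_fp = fingerprint(new_files)
    added = sorted(new_fp.keys() - old_fp.keys())
    removed = sorted(old_fp.keys() - new_fp.keys())
    # A path present in both versions is modified when its digest differs.
    modified = sorted(
        path for path in old_fp.keys() & new_fp.keys()
        if old_fp[path] != new_fp[path]
    )
    return ChangeRecord(added, removed, modified, dict(provenance))


# Hypothetical example data: two versions of the pipeline input.
version_1 = {
    "studies/a.csv": b"id,name\n1,height\n",
    "studies/b.csv": b"id,name\n1,weight\n",
}
version_2 = {
    "studies/a.csv": b"id,name\n1,height\n2,age\n",  # content changed
    "studies/c.csv": b"id,name\n1,bmi\n",            # new file
}

record = detect_changes(
    version_1,
    version_2,
    # Assumed provenance fields; the real scheme would define these.
    provenance={"docker_image": "example/pipeline:0.0.1", "code_version": "abc123"},
)
print(record)
```

Only files whose content actually changed are reported, so a downstream workflow could use such a record to decide which inputs need reprocessing and to log which image or code version produced the new data.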