Database schema (Data source repository)
Closed this issue · 1 comments
YaphetKG commented
Having investigated options such as Pachyderm, iRODS, and lakeFS, lakeFS seems the most promising.
The task here includes:
- Propose a data repository management scheme that would encompass:
  - Versioning of input data to the pipeline.
  - Change detection between versions of input data.
  - Metadata management covering how the data changed and its provenance (e.g. the Docker image version or code that performed the change).
- Getting lakeFS stood up for testing the proposed scheme.
After the above steps, the task would be implementing a workflow scheme that makes use of the versioning and provenance infrastructure.
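
As a rough illustration of the change detection and provenance pieces of the scheme, here is a minimal stdlib-only sketch. It does not use the lakeFS API; the function names (`snapshot`, `diff_snapshots`, `provenance_record`) and the record fields are hypothetical and only meant to show the kind of information the scheme would need to track.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def provenance_record(old, new, docker_image, code_version):
    """Bundle the change summary with what produced it (hypothetical field names)."""
    return {
        "changes": diff_snapshots(old, new),
        "docker_image": docker_image,  # e.g. image tag of the job that wrote the data
        "code_version": code_version,  # e.g. git commit of the pipeline code
    }
```

Taking a `snapshot` of the input directory before and after a pipeline step and passing both to `provenance_record` would give a per-version change summary. In a real setup the versioning and diffing would presumably be delegated to lakeFS, with a record like this attached as metadata to each version.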
YaphetKG commented
Migrated to Jira (https://renci.atlassian.net/browse/DUG-15)