tech-greedy/singularity

Periodically add incremental data to existing DataSet

Closed this issue · 1 comment

Is your feature request related to a problem? Please describe.
The use case is a DataSet that grows over time, with new incremental source data added periodically.
Incremental source data can be created in unique folder paths relative to the source data root.
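To make the layout concrete, here is a minimal sketch of how new increments could be detected, assuming each increment lands in its own top-level folder under the source data root. The function name `find_new_increments` and the `already_prepared` set are hypothetical and not part of singularity.

```python
import os
import tempfile
from typing import List, Set


def find_new_increments(source_root: str, already_prepared: Set[str]) -> List[str]:
    """Return the top-level folders under source_root that have not been prepared yet."""
    found = []
    for entry in os.scandir(source_root):
        # Only directories count as increments in this sketch; loose files are ignored.
        if entry.is_dir() and entry.name not in already_prepared:
            found.append(entry.name)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical layout: one folder per monthly increment.
    with tempfile.TemporaryDirectory() as root:
        for name in ("2023-01", "2023-02", "2023-03"):
            os.mkdir(os.path.join(root, name))
        print(find_new_increments(root, {"2023-01"}))
```

Tracking prepared folders by name is only one option; a real implementation might need to detect changes inside an existing folder as well.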

Describe the solution you'd like
Support periodically appending new incremental data to a DataSet for generation and replication.
Support re-indexing and retrieval of the expanded DataSet.

Describe alternatives you've considered
Creating a new DataSet for each incremental source. The indexes would then be separate, making retrieval cumbersome.

anjor commented

@liuziba I was thinking about this too, and this is another use case for unixfs-cat. Basically, every time we prepare a new dataset, we also write a block that is the concatenation of the old root CID of the already onboarded dataset and the root CID of the newly prepared dataset. This should be fairly straightforward with the new generate-ipld-car stuff.
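
A toy sketch of the idea, for illustration only: each time an increment is onboarded, a small block linking the previous root and the new dataset's root is written, and its identifier becomes the new root. This does not use unixfs-cat or generate-ipld-car, whose interfaces are not shown here. The hex SHA-256 digest stands in for a real CID, the JSON block stands in for a real IPLD node, and `toy_cid`, `make_concat_root`, and `list_parts` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def toy_cid(payload: bytes) -> str:
    # Stand-in for a real CID: hex SHA-256 of the block bytes (an assumption, not the CID format).
    return hashlib.sha256(payload).hexdigest()


def make_concat_root(old_root: str, new_root: str) -> Tuple[str, bytes]:
    """Build a block that links the old root and the new dataset root, in that order."""
    block = json.dumps({"links": [old_root, new_root]}, sort_keys=True).encode()
    return toy_cid(block), block


def list_parts(root: str, store: Dict[str, bytes]) -> List[str]:
    """Walk concat blocks depth-first and return dataset roots in onboarding order."""
    block = store.get(root)
    if block is None:
        # Not a concat block in this toy store, so treat it as a dataset root.
        return [root]
    parts: List[str] = []
    for link in json.loads(block)["links"]:
        parts.extend(list_parts(link, store))
    return parts


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    root = toy_cid(b"dataset-1")  # root of the already onboarded dataset
    for payload in (b"dataset-2", b"dataset-3"):
        root, block = make_concat_root(root, toy_cid(payload))
        store[root] = block
    print(list_parts(root, store))
```

Because every new root links back to the previous one, a single identifier covers the whole expanded DataSet, at the cost of the chain growing one level deeper per increment.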