xetdata/pyxet

Feature request: pyxet commit history

Closed this issue · 0 comments

xdssio commented

Rationale

Our "timestamps" are commits, so being able to explore them easily would make them much more useful.
The function name can be "commit", "commits", or "history".
I prefer "history" over "commit": it has fewer git connotations, it feels more file-oriented (unlike a branch merge, which is also a commit but spans many files), and it is very explicit.

  • If files=True, each returned entry also lists all files changed in that commit. This is a simple way to answer questions like: what were the model card, the metrics, and the database state when model X was uploaded?

Use cases

  • Checking the local data commit for reproducibility at the beginning of an experiment.
  • Checking the model commit at the end of an experiment to mark the connection between model-data-code.
  • Committing a preprocessing script together with the processed data, so that the link between them is tracked.

API suggestion:

fs.history(path, limit: int = 1, files: bool = False)
# path: "xet://user/repo/branch/file-or-folder" or "local-file-or-folder"
[{"hash": ..., "message": ..., "author": ..., "date": ..., "files": [...]}, ...]  # sorted from newest to oldest
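To make the proposed semantics concrete, here is a minimal sketch that runs over a hard-coded, in-memory commit log rather than a real repository. The history function, the Commit record, matching a folder by path prefix, and omitting the "files" key when files=False are all illustrative assumptions. None of this is pyxet's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Commit:
    # Hypothetical commit record; the fields mirror the keys in the proposed output.
    hash: str
    message: str
    author: str
    date: datetime
    files: list = field(default_factory=list)  # paths changed in this commit


def history(commits, path, limit=1, files=False):
    """Hypothetical sketch of the proposed fs.history over an in-memory commit log.

    Keeps commits that changed `path` (an exact file or anything under a folder),
    sorts them newest to oldest, and returns at most `limit` dicts.
    """
    prefix = path.rstrip("/") + "/"

    def touches(commit):
        return any(f == path or f.startswith(prefix) for f in commit.files)

    matching = sorted(
        (c for c in commits if touches(c)), key=lambda c: c.date, reverse=True
    )
    result = []
    for c in matching[:limit]:
        entry = {"hash": c.hash, "message": c.message, "author": c.author, "date": c.date}
        if files:
            # Assumption: the "files" key is only included when files=True.
            entry["files"] = list(c.files)
        result.append(entry)
    return result


# Toy log standing in for a repository's commits (illustrative data only).
log = [
    Commit("a1", "add raw data", "alice", datetime(2023, 5, 1), ["data/raw.csv"]),
    Commit("b2", "preprocess", "bob", datetime(2023, 5, 2),
           ["data/clean.csv", "scripts/prep.py"]),
    Commit("c3", "train model X", "alice", datetime(2023, 5, 3),
           ["models/x.pt", "models/card.md", "metrics.json"]),
]

# History of the data folder, newest first.
print([h["hash"] for h in history(log, "data", limit=5)])  # ['b2', 'a1']

# What else was committed alongside model X?
print(history(log, "models/x.pt", files=True)[0]["files"])
# ['models/x.pt', 'models/card.md', 'metrics.json']
```

The second call covers the "model X" question from the files=True bullet: the single most recent commit touching the model file also reports the model card and metrics committed with it. A real implementation would read the log from the repository instead of a list.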