iterative/scmrepo

fs: support version_aware ?

Opened this issue · 5 comments

efiop commented

Currently, gitfs works on a single revision, but we could totally make a version_aware version (similar to s3fs, gcsfs, adlfs) and support revisions as version_id. The implementation is fairly straightforward (just use the Tree for a particular version_id and the rest is the same). This seems to make a lot of sense in the dvc context of unifying get/import with get-url/import-url and getting rid of DependencyRepo.
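To make the "use the Tree for a particular version_id" idea concrete, here is a minimal sketch. It is not scmrepo's implementation: `ToyVersionAwareGitFS`, `REVISIONS`, and the `cat_file`/`exists` signatures are hypothetical, and a plain dict stands in for the git object store. The only point is that each call picks a tree by `version_id` and then behaves as it would for a single revision.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a git object store: revision -> {path: contents}.
REVISIONS: Dict[str, Dict[str, bytes]] = {
    "rev1": {"data/file.txt": b"old"},
    "rev2": {"data/file.txt": b"new", "data/extra.txt": b"added"},
}


class ToyVersionAwareGitFS:
    """Toy illustration only; the names here are assumptions, not scmrepo API."""

    def __init__(self, revisions: Dict[str, Dict[str, bytes]], default_rev: str):
        self._revisions = revisions
        self._default_rev = default_rev

    def _tree(self, version_id: Optional[str]) -> Dict[str, bytes]:
        # Pick the tree for the requested revision; fall back to the default.
        rev = version_id or self._default_rev
        try:
            return self._revisions[rev]
        except KeyError:
            raise FileNotFoundError(f"unknown revision: {rev}") from None

    def exists(self, path: str, version_id: Optional[str] = None) -> bool:
        return path in self._tree(version_id)

    def cat_file(self, path: str, version_id: Optional[str] = None) -> bytes:
        tree = self._tree(version_id)
        if path not in tree:
            raise FileNotFoundError(path)
        return tree[path]


fs = ToyVersionAwareGitFS(REVISIONS, default_rev="rev2")
print(fs.cat_file("data/file.txt"))                     # default revision
print(fs.cat_file("data/file.txt", version_id="rev1"))  # explicit revision
```

Everything after `_tree()` is the same code path regardless of revision, which is the sense in which "the rest is the same".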

What’s the difference between opening a new filesystem and this?

efiop commented

@skshetry No difference, but it makes it possible to use everything in one place. Similar to how s3fs/etc support it directly and not as a new fs instance (though that obviously wouldn't make any sense for them, since only files can be versioned there and not directories).

Having it all in one fs will allow us to treat it the way we treat versioned filesystems. For example, to check if updates are available, we could do it the same way we do it for version_aware dependencies, through the same filesystem, instead of having to create two.
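A rough sketch of that update check, assuming a filesystem that can report a checksum for a path at a given version. `ToyVersionedFS`, `checksum`, and `has_update` are hypothetical names for illustration and are not dvc or scmrepo API; the dict-backed store is a stand-in for real revisions.

```python
import hashlib
from typing import Dict, Optional


class ToyVersionedFS:
    """Hypothetical version-aware filesystem backed by plain dicts."""

    def __init__(self, revisions: Dict[str, Dict[str, bytes]]):
        self._revisions = revisions

    def checksum(self, path: str, version_id: str) -> Optional[str]:
        # Returns None when the path does not exist in that revision.
        data = self._revisions[version_id].get(path)
        return None if data is None else hashlib.sha1(data).hexdigest()


def has_update(fs: ToyVersionedFS, path: str, locked: str, latest: str) -> bool:
    # One filesystem instance answers for both versions, so no second
    # filesystem has to be created just to compare them.
    return fs.checksum(path, locked) != fs.checksum(path, latest)


fs = ToyVersionedFS(
    {
        "v1": {"model.pkl": b"weights-1", "README": b"docs"},
        "v2": {"model.pkl": b"weights-2", "README": b"docs"},
    }
)
print(has_update(fs, "model.pkl", "v1", "v2"))
print(has_update(fs, "README", "v1", "v2"))
```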

efiop commented

> Similar to how s3fs/etc supports it directly and not as a new fs instance

Obviously this is kinda awkward when you start dealing with it initially, but it makes more sense the more you use it. Since we have s3fs/gcsfs/adlfs already and have to deal with those, it makes sense to consider the same for gitfs since they have a lot of similarities.

Maybe it could be done with two filesystems: GitTreeFileSystem (like our current one) and GitFileSystem (the one that works on the whole git repo, so version_aware makes sense for it). We were also talking about GitIndexFileSystem before, but that's a whole other story 🙂
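One way the two-filesystem split could look, as a sketch only: a per-revision tree filesystem, plus a repo-level filesystem that hands each call to the tree filesystem for the requested version_id. `ToyGitTreeFS` and `ToyGitFS` are hypothetical stand-ins for the GitTreeFileSystem/GitFileSystem names floated above, and dicts replace real git trees.

```python
from typing import Dict, List, Optional


class ToyGitTreeFS:
    """Single-revision view, analogous to the current gitfs behaviour."""

    def __init__(self, tree: Dict[str, bytes]):
        self._tree = tree

    def cat_file(self, path: str) -> bytes:
        try:
            return self._tree[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def ls(self) -> List[str]:
        return sorted(self._tree)


class ToyGitFS:
    """Whole-repo view: delegates to one ToyGitTreeFS per revision."""

    def __init__(self, revisions: Dict[str, Dict[str, bytes]], default_rev: str):
        self._revisions = revisions
        self._default_rev = default_rev
        self._tree_fs_cache: Dict[str, ToyGitTreeFS] = {}

    def _tree_fs(self, version_id: Optional[str]) -> ToyGitTreeFS:
        rev = version_id or self._default_rev
        if rev not in self._tree_fs_cache:
            if rev not in self._revisions:
                raise FileNotFoundError(f"unknown revision: {rev}")
            # Build the per-revision filesystem lazily and reuse it afterwards.
            self._tree_fs_cache[rev] = ToyGitTreeFS(self._revisions[rev])
        return self._tree_fs_cache[rev]

    def cat_file(self, path: str, version_id: Optional[str] = None) -> bytes:
        return self._tree_fs(version_id).cat_file(path)

    def ls(self, version_id: Optional[str] = None) -> List[str]:
        return self._tree_fs(version_id).ls()


repo_fs = ToyGitFS(
    {"main": {"a.txt": b"A2", "b.txt": b"B"}, "v1.0": {"a.txt": b"A1"}},
    default_rev="main",
)
print(repo_fs.ls())
print(repo_fs.cat_file("a.txt", version_id="v1.0"))
```

In this layout the tree filesystem stays unaware of versions, and only the repo-level wrapper knows about version_id.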

With version_aware in gitfs, we are effectively talking about a completely new revision/instance.

> Maybe it could be done with two filesystems: GitTreeFileSystem (like our current one) and GitFileSystem (the one that works on the whole git repo, so version_aware makes sense for it).

I have been thinking about something similar on the dvcfs side: TheOneFileSystem that can traverse between multiple revisions. But I just want to avoid filesystems in general. It has already gone too far. :)