clone: seems to be pulling orphaned revisions unlike `git clone`
efiop opened this issue · 6 comments
One of our users (https://iterativeai.slack.com/archives/C03JS2V4MQU/p1689332412460989?thread_ts=1689272591.583879&cid=C03JS2V4MQU) found that when using dvcfs, we were downloading large files that they had removed from history, whereas running `git clone` did not download those files.
We need to check whether one of our git backends is accidentally cloning more than intended.
It's probably that those files aren't orphaned, but are referenced by exp refs which are still pushed to the repo. `git clone` does not fetch exp refs, but `git clone --mirror` (and DVC's clone implementation) do.
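To make the difference concrete, here is a simplified model of how a fetch refspec selects remote refs. It assumes the default clone refspec `+refs/heads/*:refs/remotes/origin/*` and the `+refs/*:refs/*` refspec that `git clone --mirror` configures, and it ignores tag handling. The experiment ref name is hypothetical, and `map_ref`/`fetched_refs` are illustrative helpers, not Git or DVC APIs. Any object reachable only from a matching ref (here, the exp ref) gets downloaded too, which is how files "removed from history" can still show up.

```python
from __future__ import annotations

# Simplified model of Git fetch refspecs (single '*' wildcard only).
# Assumption: DVC experiment refs live under refs/exps/; the name below is made up.


def parse_refspec(refspec: str) -> tuple[str, str]:
    """Split '+refs/heads/*:refs/remotes/origin/*' into (src, dst), dropping the force '+'."""
    src, dst = refspec.lstrip("+").split(":", 1)
    return src, dst


def map_ref(refspec: str, ref: str) -> str | None:
    """Return the local name `ref` would be stored under, or None if the refspec skips it."""
    src, dst = parse_refspec(refspec)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if ref.startswith(prefix) and ref.endswith(suffix) and len(ref) >= len(prefix) + len(suffix):
        middle = ref[len(prefix): len(ref) - len(suffix)]
        return dst.replace("*", middle, 1)
    return None


def fetched_refs(refspecs: list[str], remote_refs: list[str]) -> dict[str, str]:
    """Map each remote ref selected by any refspec to its local ref name."""
    result: dict[str, str] = {}
    for ref in remote_refs:
        for spec in refspecs:
            local = map_ref(spec, ref)
            if local is not None:
                result[ref] = local
                break
    return result


DEFAULT_CLONE = ["+refs/heads/*:refs/remotes/origin/*"]  # branches (tags not modeled)
MIRROR_CLONE = ["+refs/*:refs/*"]  # everything under refs/, including refs/exps/

remote = [
    "refs/heads/main",
    "refs/exps/a1/b2c3d4/my-exp",  # hypothetical experiment ref
]

print(fetched_refs(DEFAULT_CLONE, remote))  # only refs/heads/main
print(fetched_refs(MIRROR_CLONE, remote))   # main and the exp ref
```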
@pmrowla Indeed, that explains it. Thank you!
> (and DVC's clone implementation) do

Is that intentional, though?
It's intentional, so that you can `dvc import` or `dvc get` from named DVC experiments the same way you can import from a branch or tag name. We could try to make that lazier and only search and fetch exp refs if we fail to resolve a name on import, but that would be a DVC issue and not scmrepo.
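A minimal sketch of the lazier lookup suggested above, assuming the clone has already fetched branches and tags: the name is resolved against those first, and experiment refs are fetched only on a miss. `resolve_rev`, `fetch_exp_refs`, and the rule of matching an experiment by the last path component of its ref are hypothetical, not DVC's actual code.

```python
from typing import Callable, Dict, Optional

# Hypothetical inputs: `local_refs` maps ref names to SHAs already present after a
# plain clone; `fetch_exp_refs` stands in for whatever would fetch refs/exps/*.


def resolve_rev(
    name: str,
    local_refs: Dict[str, str],
    fetch_exp_refs: Callable[[], Dict[str, str]],
) -> Optional[str]:
    """Resolve `name` as a branch or tag first; fetch experiment refs only as a fallback."""
    for prefix in ("refs/heads/", "refs/tags/"):
        sha = local_refs.get(prefix + name)
        if sha is not None:
            return sha  # resolved without fetching any experiment refs
    # Name not found among branches/tags: only now pay for fetching exp refs.
    for ref, sha in fetch_exp_refs().items():
        # Assumption: the experiment name is the last path component of the ref.
        if ref.startswith("refs/exps/") and ref.rsplit("/", 1)[-1] == name:
            return sha
    return None
```

With this ordering, importing from `main` or a tag never triggers the extra fetch; only an unknown name does.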
@pmrowla Makes sense. So you mean that at least the vanilla `clone` implemented in scmrepo should not clone exp refs by default, right? We should have some kind of flag, so that the default behaviour is closer to `git clone`. Or am I misunderstanding something?
Vanilla `Git.clone` in scmrepo doesn't do anything with exp refs; the exp ref behavior is kept in the DVC erepo code (it's technically an additional fetch after the default clone finishes):
https://github.com/iterative/dvc/blob/f1764bdc772916d40f824531705fffdfc462793e/dvc/repo/open_repo.py#L217C20-L217C20
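For illustration, the two-step behavior described above (a plain clone, then a separate fetch of experiment refs) could look roughly like this with the Git CLI. The refspec `+refs/exps/*:refs/exps/*` and the helper `clone_then_fetch_exps` are assumptions for this sketch, not taken from the linked DVC code; `run` is injectable so the commands can be inspected without network access.

```python
import subprocess
from typing import Callable, List

# Assumption: experiment refs live under refs/exps/; this refspec is illustrative.
EXP_REFSPEC = "+refs/exps/*:refs/exps/*"


def clone_then_fetch_exps(
    url: str, path: str, run: Callable = subprocess.run
) -> List[List[str]]:
    """Ordinary `git clone`, followed by a separate fetch of experiment refs."""
    commands = [
        ["git", "clone", url, path],  # plain clone: refs/exps/* not in its default refspec
        ["git", "-C", path, "fetch", "origin", EXP_REFSPEC],  # the additional fetch
    ]
    for cmd in commands:
        run(cmd, check=True)  # raise if either git command fails
    return commands
```

Because the exp-ref fetch is a separate step layered on top of the clone, making it optional (or lazy) would not require any change to the clone itself.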