dvc has problems when there is a git "insteadOf" configuration in place that transforms "https://" urls to "ssh://" urls
larsks opened this issue · 2 comments
I'm running this in a clean environment (an Ubuntu 23.04 container into which I've installed git, python3, etc., and no explicit git configuration other than user.name and user.email).
I start with an empty repository:
git init dvctest
cd dvctest
echo 'dvc example' > README.md
git add README.md
git commit -m 'Initial commit'
And then install dvc into a virtual environment:
python3 -m venv .venv
. .venv/bin/activate
pip install dvc
And initialize dvc in the directory:
dvc init
Now, if I try the dvc get command from the "Get Started" document, it works as expected:
dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
But with this Git configuration in place:
# git config --global url.ssh://git@github.com/.insteadof https://github.com/
The same command fails:
# dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
ERROR: failed to get 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - Git failed to fetch ref from 'https://github.com/iterative/dataset-registry'
Running with -v, it looks as if remote.ls_remotes() is throwing an authentication error:
Traceback (most recent call last):
  File "/dvctest/.venv/lib/python3.11/site-packages/funcy/flow.py", line 84, in reraise
    yield
  File "/dvctest/.venv/lib/python3.11/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 704, in fetch_refspecs
    for head in remote.ls_remotes(callbacks=cb)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dvctest/.venv/lib/python3.11/site-packages/pygit2/remote.py", line 164, in ls_remotes
    self.connect(callbacks=callbacks, proxy=proxy)
  File "/dvctest/.venv/lib/python3.11/site-packages/pygit2/remote.py", line 112, in connect
    payload.check_error(err)
  File "/dvctest/.venv/lib/python3.11/site-packages/pygit2/callbacks.py", line 98, in check_error
    check_error(error_code)
  File "/dvctest/.venv/lib/python3.11/site-packages/pygit2/errors.py", line 65, in check_error
    raise GitError(message)
_pygit2.GitError: authentication required but no callback set
But that doesn't make sense, because cloning the remote repository works just fine:
# git clone https://github.com/iterative/dataset-registry
Cloning into 'dataset-registry'...
remote: Enumerating objects: 296, done.
remote: Counting objects: 100% (91/91), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 296 (delta 52), reused 43 (delta 37), pack-reused 205
Receiving objects: 100% (296/296), 45.06 KiB | 1.22 MiB/s, done.
Resolving deltas: 100% (84/84), done.
You can see that git has replaced the https:// url with an ssh:// url:
# git -C dataset-registry remote -v
origin ssh://git@github.com/iterative/dataset-registry (fetch)
origin ssh://git@github.com/iterative/dataset-registry (push)
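For reference, here is a minimal Python sketch of the prefix rewrite that git applies for url.<base>.insteadOf settings: a URL that starts with a configured insteadOf value has that prefix replaced by <base>, and the longest matching prefix wins. The helper names parse_insteadof_rules and rewrite_url are hypothetical and exist only for this illustration; this is not how dvc, scmrepo, or pygit2 handle it internally. The input format assumed is the "key value" lines printed by git config --get-regexp.

```python
def parse_insteadof_rules(config_text):
    """Parse 'url.<base>.insteadof <prefix>' lines into (prefix, base) pairs.

    Assumes input shaped like the output of:
        git config --global --get-regexp '^url\\..*\\.insteadof$'
    """
    rules = []
    for line in config_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, prefix = parts
        if key.startswith("url.") and key.lower().endswith(".insteadof"):
            # The subsection between 'url.' and '.insteadof' is the base URL.
            base = key[len("url."):-len(".insteadof")]
            rules.append((prefix, base))
    return rules


def rewrite_url(url, rules):
    """Apply the rule with the longest matching prefix, or return url unchanged."""
    best = None
    for prefix, base in rules:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, base)
    if best is None:
        return url
    prefix, base = best
    return base + url[len(prefix):]


rules = parse_insteadof_rules(
    "url.ssh://git@github.com/.insteadof https://github.com/\n"
)
rewritten = rewrite_url("https://github.com/iterative/dataset-registry", rules)
```

With the configuration from this report, rewritten should be the same ssh:// url that git remote -v shows above, while a URL on any other host is returned unchanged.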
And we can run git ls-remote without a problem:
# git -C dataset-registry ls-remote
From ssh://git@github.com/iterative/dataset-registry
0f1b2967161751e1bc6b117952588bcfca123d89 HEAD
6672e265ea03930dba33146b0533942dcb6c5f30 refs/heads/artifact
e9769688078894f478b5051039a576c7e793e187 refs/heads/docs-dvc-remote
.
.
.
If I explicitly use an ssh url in the dvc get command, like this:
# dvc get -v ssh://git@github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
Then it works fine:
2024-01-26 15:44:49,135 DEBUG: v3.42.0 (pip), CPython 3.11.4 on Linux-6.6.12-100.fc38.x86_64-x86_64-with-glibc2.37
2024-01-26 15:44:49,135 DEBUG: command: /dvctest/.venv/bin/dvc get -v ssh://git@github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-01-26 15:44:49,213 DEBUG: Creating external repo ssh://git@github.com/iterative/dataset-registry@None
2024-01-26 15:44:49,213 DEBUG: erepo: git clone 'ssh://git@github.com/iterative/dataset-registry' to a temporary dir
2024-01-26 15:44:52,362 DEBUG: Analytics is enabled.
2024-01-26 15:44:52,379 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpyvk2pa46', '-v']
2024-01-26 15:44:52,384 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpyvk2pa46', '-v'] with pid 5207
2024-01-26 15:44:52,384 DEBUG: Removing '/tmp/tmpz4ybeh7wdvc-clone'
2024-01-26 15:44:52,386 DEBUG: Removing '/tmp/tmp6aot93__dvc-cache'
This should be fixed in scmrepo==2.1.1, which was just released.
With the updated scmrepo I was getting a new error...
ERROR: unexpected error - [Errno 2] No storage files available: 'get-started/data.xml'
...but it turns out that's because requests, to my surprise, parses ~/.netrc by default and was picking up some credentials it should not have been using. With that file out of the way, I am able to successfully dvc get.
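In case it helps anyone hitting the same thing, here is a small sketch for checking whether a netrc file contains an entry that would apply to a given host. It uses Python's standard netrc module rather than requests itself, so treat it as an approximation of what requests would pick up, not a statement of its exact lookup logic. The function name netrc_login_for is hypothetical.

```python
import netrc
from pathlib import Path


def netrc_login_for(host, netrc_path="~/.netrc"):
    """Return the login name a netrc file holds for host, or None.

    Note: authenticators() falls back to a 'default' entry if the file
    has one, so a non-None result does not always mean a host-specific match.
    """
    path = Path(netrc_path).expanduser()
    if not path.exists():
        return None  # no netrc file, so nothing to pick up
    entry = netrc.netrc(str(path)).authenticators(host)
    if entry is None:
        return None
    login, _account, _password = entry
    return login
```

Calling netrc_login_for("github.com") before running dvc get shows whether stale credentials are lurking there. Moving the file out of the way, as I did, is the blunt fix; removing just the offending machine entry should also work.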