iterative/scmrepo

ssh: `User` overridden when using SSH host alias

Opened this issue · 3 comments

Bug Report

Description

As I have multiple GitHub accounts (work and personal) on a single machine, I use SSH aliases to easily switch between the accounts when using various git commands. Until now this has worked perfectly, including since we started using DVC in our repositories.

The main thing I use the aliases for is cloning without needing credentials. For example, to clone a work GitHub repository I would run:

git clone work:WorkAccount/repo.git

This works fine, and I can carry on working on the code as normal, with all git commands and DVC commands functioning as expected, all except for dvc exp pull.

If a git repository has been set up with SSH using an alias, then dvc exp pull origin -A (or any experiment name/origin name) crashes with the following output (where the SSH alias in this example is github):

ERROR: unexpected error - Git failed to fetch ref from 'origin': failed to resolve address for github: nodename nor servname provided, or not known  

This behavior is specific to dvc exp pull. Both dvc exp push and dvc exp list work as expected when the repository's remote uses an SSH alias, for example when .git/config has the following:

[remote "origin"]

url = github:GitUser/repo.git

and .ssh/config contains for example:

Host github
  AddKeysToAgent yes
  UseKeychain yes
  HostName GitHub.com
  User git
  IdentityFile ~/.ssh/github

and ~/.ssh/github has been set up correctly for ssh access to GitHub.

If the .git/config file is edited as follows:

[remote "origin"]

url = git@github.com:GitUser/repo.git

then dvc exp pull works as expected
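
To make the intended equivalence of the two remote URLs concrete, here is a minimal sketch of how an alias could be expanded using the SSH config shown above. It is only an illustration: parse_ssh_config and expand_alias are hypothetical helpers, not part of DVC, scmrepo, or dulwich, and the parser ignores Host patterns, Match blocks, Include directives, and key=value syntax that real OpenSSH supports.

```python
def parse_ssh_config(text):
    """Return {alias: {option_lowercase: value}} for simple `Host` blocks.

    Hypothetical helper: handles only plain `Key value` lines.
    """
    hosts = {}
    current = []  # aliases named by the most recent `Host` line
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "host":
            current = value.split()
            for alias in current:
                hosts.setdefault(alias, {})
        else:
            for alias in current:
                # First value wins, as in OpenSSH.
                hosts[alias].setdefault(key, value)
    return hosts


def expand_alias(url, hosts):
    """Rewrite `alias:path` to `user@hostname:path` when alias is configured."""
    host, sep, path = url.partition(":")
    if not sep or "/" in host or "@" in host or host not in hosts:
        return url  # not a bare alias; leave untouched
    opts = hosts[host]
    hostname = opts.get("hostname", host)
    user = opts.get("user")
    return f"{user}@{hostname}:{path}" if user else f"{hostname}:{path}"


SSH_CONFIG = """\
Host github
  AddKeysToAgent yes
  UseKeychain yes
  HostName GitHub.com
  User git
  IdentityFile ~/.ssh/github
"""

hosts = parse_ssh_config(SSH_CONFIG)
print(expand_alias("github:GitUser/repo.git", hosts))
```

With the config from this report, the alias URL expands to the git user on the configured HostName, which is the same target as the explicit URL that works (host names are case-insensitive).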

Reproduce

  1. Set up SSH to work with GitHub (GitHub guide here and working with multiple GitHub accounts guide here).
  2. Clone a DVC repo using git clone alias:iterative/example-get-started.git, where alias is the host alias configured in ~/.ssh/config in step 1 (github in the description above).
  3. cd example-get-started
  4. dvc exp pull origin -A
  5. DVC crashes with the following error:
ERROR: unexpected error - Git failed to fetch ref from 'origin': failed to resolve address for github: nodename nor servname provided, or not known
  6. Edit .git/config and change
[remote "origin"]
        url = alias:iterative/example-get-started

to

[remote "origin"]
        url = git@github.com:iterative/example-get-started
  7. Rerun dvc exp pull origin -A; it pulls all experiments as expected.

Expected

DVC pulls all experiments from the remote repository without error when the remote URL uses an SSH alias.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 3.2.3 (pip)
------------------------
Platform: Python 3.10.10 on macOS-13.1-arm64-arm-64bit
Subprojects:
	dvc_data = 2.3.1
	dvc_objects = 0.23.0
	dvc_render = 0.3.1
	dvc_task = 0.3.0
	scmrepo = 1.0.4
Supports:
	http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2023.6.0, boto3 = 1.26.161)
Config:
	Global: /Users/georged/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/32fab68d5a6e2090fffcc1a2bb65b88b

Additional Information (if any):

  • Also tested with homebrew installed DVC
  • Also tested on older versions of DVC (> 3) and latest DVC version
  • As mentioned above dvc exp push and dvc exp list both work as expected with an SSH alias

Workaround

When adding the relevant reproducibility notes to this issue, I noticed that in the gist I linked for setting up SSH for multiple accounts, they clone with:

git clone git@alias:GitAccount/repo.git

This adds a git@ prefix to the URL, and when I tried it, it did fix the issue I was facing. However, while this works around the problem, I'm still posting the issue in case there is an actual bug, as the git@ prefix was not required for any other Git or DVC command to function properly.

Also faced the same issue

Setting the user in the git URL shouldn't be required, since you are using the User field in your ssh config host section. There's a bug somewhere, but given that there's a simple workaround for now, we probably won't prioritize this issue right away.

I think the issue here is actually that the dulwich protocol check looks for an explicit (http|git|ssh):// scheme or a git@ prefix to determine which client to use. In this case you can configure an ssh host that doesn't use the ssh:// or git@ prefix at all and just uses github:... (where github is the ssh config host alias). We will have to look at what CLI git actually does here to determine whether github:... should be treated as a local path or as a URL.
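
For reference, the git documentation says the scp-like syntax ([user@]host:path) is only recognized when there are no slashes before the first colon. The sketch below contrasts that rule with a prefix-only check of the kind described above. It is an assumption-laden illustration, not dulwich's or scmrepo's actual code: prefix_only_check and classify_remote are hypothetical names, and Windows drive letters such as C:\repo are not handled.

```python
import re

# Any explicit scheme such as ssh://, git://, https://
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def prefix_only_check(url):
    """Narrow check: treat a remote as SSH only with ssh:// or git@ prefix."""
    return url.startswith(("ssh://", "git@"))


def classify_remote(url):
    """Classify a remote following git's documented scp-like rule."""
    if _SCHEME_RE.match(url):
        return "url"
    colon = url.find(":")
    # scp-like only if a colon exists and no slash precedes it.
    if colon > 0 and "/" not in url[:colon]:
        return "scp-like ssh"
    return "local path"


for remote in (
    "github:GitUser/repo.git",
    "git@github.com:GitUser/repo.git",
    "ssh://git@github.com/GitUser/repo.git",
    "./dir:name/repo",
):
    print(remote, "|", classify_remote(remote), "|", prefix_only_check(remote))
```

Under this rule github:GitUser/repo.git is classified as scp-like SSH even though it has no git@ prefix, while a path such as ./dir:name/repo stays a local path because a slash precedes the colon.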