isaacs/github

Support git-archive protocol

fenollp opened this issue · 16 comments

git-archive is used to download a zip or tarball of the repo at a specific commit.

  • Bitbucket does it
  • It is a part of Git

The HTTP API is fine, so why not support the git-archive protocol as well?

EDIT: FYI here's how to use the HTTP API

curl --fail --silent --show-error --location \
  https://codeload.github.com/<user>/<repo>/tar.gz/<branch or tag but not SHA>
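For scripting, the same download can be wrapped in two small helpers. This is only a sketch: the function names `codeload_url` and `fetch_snapshot` are made up for illustration, and the URL layout is simply the one shown above.

```shell
# Build the codeload URL for a tarball of a branch or tag.
# $1 = user, $2 = repo, $3 = branch or tag
codeload_url() {
  printf 'https://codeload.github.com/%s/%s/tar.gz/%s\n' "$1" "$2" "$3"
}

# Download the tarball to a file, with the same curl flags as above.
# $1 = user, $2 = repo, $3 = branch or tag, $4 = output file
fetch_snapshot() {
  curl --fail --silent --show-error --location \
    --output "$4" "$(codeload_url "$1" "$2" "$3")"
}

# Example (needs network access):
#   fetch_snapshot octocat Hello-World master hello.tar.gz
```

Because of `--fail`, `fetch_snapshot` returns a non-zero status on an HTTP error, so a calling script can check the exit code instead of ending up with an error page saved as an archive.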

c0b commented

+1 for this feature. The remote archive protocol is much more flexible; for example, this command fetches the latest localedata from glibc and lists it:

➸ git archive --remote git://sourceware.org/git/glibc.git HEAD \
      localedata/locales |tar -tvv

whereas GitHub currently just fails on git archive:

➸ git archive --remote git://github.com/gcc-mirror/gcc HEAD localedata/locales
fatal: The remote end hung up unexpectedly

+1000

+1

Does anyone know of an update regarding this issue?

I tried the following without success:
git archive --format=zip --remote git://github.com/<account>/<repo>.git <tag/branch> <file-path> > <file-path>.zip
Getting fatal: The remote end hung up unexpectedly.

  • Tried several variants such as github.com:<account>, git@github.com, and ssh://, but got similar or other errors.

BTW, clone works:
git clone git@github.com:<account>/<repo>.git

Found a similar question on SE.

+1 Yes can we please get this?

Updated 1st comment (#554 (comment)) with a workaround.

I agree this would be useful, and wonder why it's not already there.
(And for private repos, the git protocol can use the ssh-based deploy key, which is scoped per repository and can be made read-only (for organizations). It looks like the simple https-based credential solutions usually end up giving write access to more than just one repo.)

But maybe the following could be a work-around, for some situations:

git clone --depth 1 --branch <some-tag> git@github.com:<account>/<repo>.git

This would give you the code without any history except for the last commit of the snapshot you want. That is, the .git folder would not be so big (almost none of the older objects are there). This has the bonus of still implicitly storing information about the exact "version" of the software you have.

Unfortunately, this solution does not seem to work with a commit ID.
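One possible way around that limitation is to fetch the commit directly instead of cloning a branch or tag. The sketch below assumes the server lets clients request a commit by its full SHA (I believe GitHub does, but treat that as an assumption); `fetch_commit` is a hypothetical helper name.

```shell
# Shallow-fetch a single commit into a fresh directory.
# $1 = repository URL, $2 = full commit SHA, $3 = target directory
# Assumption: the server accepts a "want" for this SHA.
fetch_commit() {
  git -c init.defaultBranch=main init -q "$3" &&
  git -C "$3" remote add origin "$1" &&
  git -C "$3" fetch -q --depth 1 origin "$2" &&
  git -C "$3" -c advice.detachedHead=false checkout -q FETCH_HEAD
}

# Example (needs network access and a real SHA):
#   fetch_commit git@github.com:<account>/<repo>.git <full-sha> snapshot
```

The result is a detached HEAD at the requested commit with no further history, much like the `--depth 1 --branch` clone above.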

Why don't you simply use

https://github.com/<user>/<repo>/archive/<tag-name>.tar.gz

You only have to add tags to your repo, and you automatically get an archive download for your repo at that point.

mirh commented

AFAIK git archive would even let you just download a specific subfolder.
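Since GitHub rejects the request, the subfolder behaviour can only be tried against a remote that does serve archives. A minimal sketch using a throwaway local repository as the "remote" (the file names and the `Demo` identity are made up for illustration):

```shell
# Create a throwaway repository with two top-level folders.
tmp=$(mktemp -d)
repo="$tmp/repo"
git -c init.defaultBranch=main init -q "$repo"
mkdir "$repo/docs" "$repo/src"
echo "hello" > "$repo/docs/readme.txt"
echo "code"  > "$repo/src/main.txt"
git -C "$repo" add .
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
  commit -q -m "initial"

# Ask the "remote" for an archive of the docs folder only and list its contents.
listing=$(git archive --remote="$repo" HEAD docs | tar -tf -)
printf '%s\n' "$listing"

rm -rf "$tmp"
```

Only paths under docs/ appear in the listing; src/ is never transferred, which is the flexibility being asked for here.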

I don't have time to check this out right now, but I wonder if any of this (under "SECURITY") is relevant. It is referenced in the "--remote=<repo>" section of the git archive --help

> git-upload-archive --help

GIT-UPLOAD-ARCHIVE(1)                                                             Git Manual                                                            GIT-UPLOAD-ARCHIVE(1)

NAME
       git-upload-archive - Send archive back to git-archive

SYNOPSIS
       git upload-archive <directory>

DESCRIPTION
       Invoked by git archive --remote and sends a generated archive to the other end over the Git protocol.

       This command is usually not invoked directly by the end user. The UI for the protocol is on the git archive side, and the program pair is meant to be used to get an
       archive from a remote repository.

SECURITY
       In order to protect the privacy of objects that have been removed from history but may not yet have been pruned, git-upload-archive avoids serving archives for
       commits and trees that are not reachable from the repository’s refs. However, because calculating object reachability is computationally expensive, git-upload-archive
       implements a stricter but easier-to-check set of rules:

        1. Clients may request a commit or tree that is pointed to directly by a ref. E.g., git archive --remote=origin v1.0.

        2. Clients may request a sub-tree within a commit or tree using the ref:path syntax. E.g., git archive --remote=origin v1.0:Documentation.

        3. Clients may not use other sha1 expressions, even if the end result is reachable. E.g., neither a relative commit like master^ nor a literal sha1 like abcd1234 is
           allowed, even if the result is reachable from the refs.

       Note that rule 3 disallows many cases that do not have any privacy implications. These rules are subject to change in future versions of git, and the server accessed
       by git archive --remote may or may not follow these exact rules.

       If the config option uploadArchive.allowUnreachable is true, these rules are ignored, and clients may use arbitrary sha1 expressions. This is useful if you do not
       care about the privacy of unreachable objects, or if your object database is already publicly available for access via non-smart-http.

OPTIONS
       <directory>
           The repository to get a tar archive from.

GIT
       Part of the git(1) suite

Git 2.28.0                                                                        2020-07-28                                                            GIT-UPLOAD-ARCHIVE(1)

That is, I wonder whether setting uploadArchive.allowUnreachable to true on the client could make it work? Or maybe that is a server-side setting, which GitHub has set to false?

So maybe try . . .

git config --global --bool --add uploadArchive.allowUnreachable 1

Nope. Not allowed (as the OP probably already knew).

> git archive --remote=git@github.com:copasi/COPASI.git HEAD Tools
Invalid command: 'git-upload-archive 'copasi/COPASI.git''
  You appear to be using ssh to clone a git:// URL.
  Make sure your core.gitProxy config option and the
  GIT_PROXY_COMMAND environment variable are NOT set.

https://twitter.com/GitHubHelp/status/322818593748303873

Do I understand correctly that GitHub doesn't provide any way to download a tarball from a private repository using a read-only key? As far as I see GitHub's HTTPS doesn't provide read-only keys and GitHub's Git doesn't provide tarballs.

Wouldn’t it be nice if GitHub were open source and the community itself could submit a PR and implement this?

In a way, archive is part of the git protocol, and many other providers implement it. Yet it’s been stuck on a backlog for over 3 years.

For private repos you can use tarball/zipball links:

curl -L -H "Authorization: token $TOKEN" https://api.github.com/repos/octocat/Hello-World/zipball/master --output hello.zip

Issue opened in 2016?? lol