andir/npins

Support for Gitea and other forges

piegamesde opened this issue · 3 comments

I somehow just remembered that there are ways to host your software other than GitHub and GitLab. Of course we support all generic Git repositories; nevertheless, there are more forges that could benefit from dedicated support.

Potential forges:

  • Gitea
  • ...?

Questions that need answering for each one:

  • Is it self-hostable or fixed domain?
  • Does it have user/repo structure for all repositories like GitHub or is it more free-form like GitLab?
  • Does it host tarball artifacts for all commits?
    • Given a ref, what URL can its tarball be downloaded from?
    • Given a tag, what URL can its tarball be downloaded from?
  • In anticipation of #44, given a release with uploaded artifacts, how to download them?

To generally keep the scope down, the current restrictions will also be required for new candidates:

  • Finding appropriate versions is done through the generic Git API (git ls-remote etc.)
  • Downloading files must be as simple as filling in a URL template and calling curl. Things that require adding a library for the specific API implementation are out of scope.
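
To make the two restrictions above concrete, here is a rough Python sketch of what "generic Git API plus URL template" could look like for one forge. The template follows the Gitea/Forgejo-style archive path and is an assumption, as are the names `ARCHIVE_TEMPLATE`, `parse_ls_remote` and `archive_url`; other forges may lay out their archive URLs differently.

```python
from __future__ import annotations

from string import Template

# Assumed Gitea/Forgejo-style archive path; not verified against every forge.
ARCHIVE_TEMPLATE = Template("https://$host/$owner/$repo/archive/$ref.tar.gz")


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map each ref name to its commit hash, given `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line has the form "<hash>\t<ref name>".
        sha, _, name = line.partition("\t")
        refs[name] = sha
    return refs


def archive_url(host: str, owner: str, repo: str, ref: str) -> str:
    """Fill in the (assumed) URL template; the result can be handed to curl."""
    return ARCHIVE_TEMPLATE.substitute(host=host, owner=owner, repo=repo, ref=ref)
```

The output of `git ls-remote <repo>` would be passed to `parse_ls_remote`, and the hash of the chosen ref to `archive_url`. No forge-specific API client is involved at any point.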

lf- commented

I think the cleanest way to implement this in a maximally generic way is to implement Nix's immutable tarball protocol, as used by FlakeHub and, as of yesterday, Forgejo. It takes a trivial amount of code, which could possibly just be upstreamed into all the forges.

With this you can just give https://some-forgejo/user/repo/archive/main.tar.gz, which is a URL that does not need to be inspected at all.

Here is some Python I hastily wrote today to implement it:

import subprocess
import tempfile
from pathlib import Path
import re
import dataclasses
from typing import Literal
import urllib.parse
import json


@dataclasses.dataclass
class PinSerialized:
    kind: str
    rev: str | None
    nar_hash: str


@dataclasses.dataclass
class TarballPinSerialized(PinSerialized):
    kind: Literal['tarball']
    locked_url: str
    url: str


class PinSpec:

    def do_pin(self) -> PinSerialized:
        # Subclasses must override this with their own pinning logic.
        raise NotImplementedError('unimplemented')


@dataclasses.dataclass
class TarballPinSpec(PinSpec):
    url: str

    def do_pin(self) -> TarballPinSerialized:
        return lock_tarball(self.url)


@dataclasses.dataclass
class LinkHeader:
    url: str
    rev: str | None


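# Matches a link whose rel="immutable" directly follows the URL. [^>]* keeps
# the match inside a single <...> pair when the header carries several links.
LINK_HEADER_RE = re.compile(r'<(?P<url>[^>]*)>\s*;\s*rel="immutable"')


def parse_link_header(header: str) -> LinkHeader | None:
    # search() rather than match(): the immutable link need not come first.
    matched = LINK_HEADER_RE.search(header)
    if not matched:
        return None

    url = matched.group('url')
    parsed_url = urllib.parse.urlparse(url)
    parsed_qs = urllib.parse.parse_qs(parsed_url.query)

    # The revision, if any, is carried in the locked URL's ?rev= parameter.
    return LinkHeader(url=url, rev=next(iter(parsed_qs.get('rev', [])), None))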


def lock_tarball(url) -> TarballPinSerialized:
    """
    Prefetches a tarball using the Nix immutable tarball protocol
    """
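    import requests
    # stream=True lets iter_content() feed tar chunk by chunk instead of
    # buffering the whole tarball in memory first.
    resp = requests.get(url, stream=True)
    # Fail early on HTTP errors rather than piping an error page into tar.
    resp.raise_for_status()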
    with tempfile.TemporaryDirectory() as td:
        td = Path(td)
        proc = subprocess.Popen(["tar", "-C", td, "-xvzf", "-"],
                                stdin=subprocess.PIPE)
        assert proc.stdin
        for chunk in resp.iter_content(64 * 1024):
            proc.stdin.write(chunk)
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("untarring failed")

        children = list(td.iterdir())
        # FIXME: allow different tarball structures
        assert len(children) == 1

        child = children[0].rename(children[0].parent.joinpath('source'))
        sri_hash = subprocess.check_output(
            ["nix-hash", "--type", "sha256", "--sri", child]).decode().strip()
        path = subprocess.check_output(
            ["nix-store", "--add-fixed", "--recursive", "sha256",
             child]).decode().strip()

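    # The Link header is optional; without it, fall back to the unlocked URL.
    link_header = resp.headers.get('Link')
    link_info = parse_link_header(link_header) if link_header else None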

    print(sri_hash, path)
    return TarballPinSerialized(kind='tarball',
                                nar_hash=sri_hash,
                                locked_url=link_info.url if link_info else url,
                                rev=link_info.rev if link_info else None,
                                url=url)

I'd not be opposed to supporting this. Do we know who else supports this in this way? Also, it is worth keeping in mind that a Link header field can contain multiple URLs (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link#specifying_multiple_links).
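
To illustrate the multiple-links concern, here is a minimal Python sketch that walks every comma-separated link in a header value and picks the one with rel="immutable". The names `LINK_VALUE_RE` and `find_immutable_link` are made up for this example, and the parsing is deliberately simplified: it assumes parameter values do not themselves contain commas or semicolons.

```python
from __future__ import annotations

import re

# One link-value: <target> followed by zero or more ";param" pieces.
# Simplification: parameter values must not contain "," or ";".
LINK_VALUE_RE = re.compile(r'<(?P<url>[^>]*)>(?P<params>(?:\s*;\s*[^,;]+)*)')


def find_immutable_link(header: str) -> str | None:
    """Return the target of the first link whose rel includes "immutable"."""
    for m in LINK_VALUE_RE.finditer(header):
        for param in m.group("params").split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types.
            if name.lower() == "rel" and "immutable" in value.strip('"').split():
                return m.group("url")
    return None
```

A header such as `<https://example.org/x>; rel="preconnect", <https://forge.example/u/r/archive/abc.tar.gz?rev=abc>; rel="immutable"` would then yield the second URL, regardless of where the immutable link sits in the list.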

lf- commented

I'm not sure if there are any client implementations besides Nix and the software I wrote. Server-wise, the ones I know of are Forgejo and FlakeHub.