tazjin/nixery

bugs & improvement ideas


This is a collection of issues I encountered and improvement ideas that came to mind while deploying Nixery, backed by GCS, to Google Cloud Run.

Ideas

  • GCS URL signing: detect the currently available permissions/capabilities, and use GOOGLE_APPLICATION_CREDENTIALS only as a last-resort fallback for backwards compatibility
  • allow overriding/mixing in nixery-image arguments
  • expose nixery-launch-script for use in custom container image builds
  • implement pre/post build hooks:
    • this would allow uploading build results to a binary cache such as Attic
  • (not sure this makes sense) use a binary cache as a storage backend? Probably not, because it uses a different format
  • support a local filesystem path (nix store?) for the popularity count file
  • popcount: generate it from the local package set; I think the old shell script did that
  • return either an empty [] or ["latest"] (possibly implemented by Backend) from the registry's tags list endpoint
    • currently, skopeo inspect without --no-tags fails with FATA[0002] Error determining repository tags: fetching tags list: StatusCode: 404, ""
  • 2024-03-05: allow specifying the default branch (the convention is now main, not master)
  • 2024-03-05: accommodate short commit hashes (currently a commit hash must be exactly 40 characters)
  • 2024-03-06: move all (or at least most) environment variables from the launch script to config.Env, like in https://github.com/NixOS/nixpkgs/blob/937261e3e0f40832e05125159531bd06bd625585/pkgs/build-support/docker/default.nix#L1237-L1237

Bugs / required refactors

  • The first build after Nixery starts in a container (my setup) always fails with error: unexpected EOF reading a line while evaluating build-output.json; see below for details
  • 2024-03-06: replace contents with a proper invocation of copyToRoot; see nixpkgs

Full `build-output.json` error on first start:
2024-02-28 12:38:26.794 [nix] copying path '/nix/store/aa4cw3vy0vvaxi7jy8i6qp89vv10w15v-jq-1.7.1-dev' from 'https://cache.nixos.org'...
2024-02-28 12:38:26.819 [nix] /nix/store/m2bqpb4ll2lysz13vmkkdgxsn1kxr8ii-iana-etc-20231227/nix-support:
2024-02-28 12:38:26.819 [nix] setup-hook: /nix/store/lgr9b20c3r66aj0r36rnv128b4xl2vya-nss-cacert-3.95/nix-support/setup-hook
2024-02-28 12:38:27.014 [nix] error:
2024-02-28 12:38:27.014 [nix]        … while calling the 'derivationStrict' builtin
2024-02-28 12:38:27.014 [nix] 
2024-02-28 12:38:27.014 [nix]          at /builtin/derivation.nix:9:12: (source not available)
2024-02-28 12:38:27.014 [nix] 
2024-02-28 12:38:27.014 [nix]        … while evaluating derivation 'build-output.json'
2024-02-28 12:38:27.014 [nix]          whose name attribute is located at /nix/store/gzf4zwcakda1nykn6h0avh45xhjhvsz4-source/pkgs/stdenv/generic/make-derivation.nix:353:7
2024-02-28 12:38:27.014 [nix] 
2024-02-28 12:38:27.014 [nix]        … while evaluating attribute 'text' of derivation 'build-output.json'
2024-02-28 12:38:27.014 [nix] 
2024-02-28 12:38:27.014 [nix]          at /nix/store/gzf4zwcakda1nykn6h0avh45xhjhvsz4-source/pkgs/build-support/trivial-builders/default.nix:162:16:
2024-02-28 12:38:27.014 [nix] 
2024-02-28 12:38:27.014 [nix]           161|       ({
2024-02-28 12:38:27.014 [nix]           162|         inherit text executable checkPhase allowSubstitutes preferLocalBuild;
2024-02-28 12:38:27.014 [nix]              |                ^
2024-02-28 12:38:27.014 [nix]           163|         passAsFile = [ "text" ]
2024-02-28 12:38:27.014 [nix] 
2024-02-28 12:38:27.014 [nix]        error: unexpected EOF reading a line

Adding a second container in Terraform that triggers a single image build helped me work around that bug:

containers {
  name = "pull-once"

  depends_on = ["nixery"]

  image = "gcr.io/google.com/cloudsdktool/google-cloud-cli:alpine"
  # poll the local Nixery instance until the first manifest request succeeds
  command = [
    "bash",
    "-c",
    <<-EOT
    set -x
    until curl --fail --silent --show-error --no-progress-meter "http://localhost:8080/v2/shell/manifests/latest" ; do
      sleep $((RANDOM % 10))
    done
    echo "finished"
    EOT
  ]

  resources {
    # throttle outside requests
    cpu_idle = true
  }
}

I think the EOF error should be fixed by NixOS/nix#9804, which I found through DeterminateSystems/magic-nix-cache#32.