bazel-contrib/SIG-rules-authors

Should rulesets distribute a pre-built artifact rather than rely on GitHub source/release archives?

Closed this issue · 101 comments

Rules ought to distribute an artifact that doesn't contain references to development-time dependencies, and omits testing code and examples.

A consequence is that the distribution artifact can be broken if files are accidentally left out of it.

In addition, rules ought to integration-test against all supported Bazel versions. So there should be some bazel-in-bazel test that consumes the HEAD distribution artifact and verifies that the examples work.

Right now there are a few approaches: rules_nodejs and rules_python have a built-in integration test runner, and rules_go has a special go_bazel_test rule.

See https://docs.google.com/document/d/1s_8AihGXbYujNWU_VjKNYKQb8_QGGGq7iKwAVQgjbn0/edit?usp=sharing for discussion around the requirements for testing against multiple Bazel versions.
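
As a rough sketch of what such a bazel-in-bazel smoke test could look like (the //distro:ruleset.tar.gz target, the my_ruleset repository name, and the examples/basic directory are all hypothetical; --override_repository is a standard Bazel flag):

    # Build the HEAD distribution artifact and unpack it.
    bazel build //distro:ruleset.tar.gz
    tmp=$(mktemp -d)
    tar -xzf bazel-bin/distro/ruleset.tar.gz -C "$tmp"

    # Run the examples against the unpacked artifact instead of the source tree.
    cd examples/basic
    bazel test //... --override_repository=my_ruleset="$tmp"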

Rules ought to distribute an artifact that doesn't contain references to development-time dependencies, and omits testing code and examples.

Could you motivate this? It is not clear to me why this should be mandated.

If the motivation is that users of a rule set should not depend on dev-dependencies of that rule set, then this can be achieved without a dedicated distribution artifact. E.g. in rules_haskell, dev-dependencies are pulled in only by rules_haskell's own WORKSPACE file, while regular dependencies are pulled in by the rules_haskell_dependencies macro that users are meant to call as well. Also, the upcoming Bazel modules mechanism has a notion of dev-dependencies, IIRC.

I think it is a plus that Bazel rule sets can be imported directly from their source repository at any commit without needing to generate a distribution artifact first. This makes it very easy to pull in a non-release commit of a rule set that contains a needed fix. If rule sets are only intended to be used from distribution artifacts, then this use-case is no longer necessarily supported, as a rule set may depend on generated files that are only included in the distribution artifact.

Either way, I don't think this should be mandated without the required tooling being available. See below.


Regarding bazel-in-bazel tests. I agree that this would be useful to have. We have looked into this for rules_haskell and in this context looked into a Gazelle extension to generate filegroup targets capturing all files required to run the rule set. (The same would be useful for generating distribution artifacts.)

We based our efforts on Gazelle's test_filegroup. However, we found it to be lacking for our use-case. Issues that come to mind: it does not respect .gitignore or .bazelignore files, leading to invalid inclusions of e.g. embedded workspaces for integration testing or user-local configuration files like .bazelrc.local; and it assumes that every directory is a Bazel package, which is not a valid assumption and breaks labels like //my/pkg:src/some/source/file.

It would be great to have general purpose versions of test_filegroup and go_bazel_test available for any rule set to use. I'd view this as a prerequisite for this recommendation.

Mostly the pre-built distribution artifact is required to get a stable checksum. If you rely on GitHub's generated .tgz source archives, you get a breakage when GitHub makes OS updates on their servers that create those archives.
It's also handy to avoid someone building @some_ruleset//... and breaking because there's a load statement there from a dev dependency.
I agree that it's desirable that the distro artifact is same-shaped as the source repo (generally just a subset of files) so that it's easy to opt-in to a HEAD dependency. We made that mistake with rules_nodejs and are working to undo that.

Hi πŸ‘‹πŸ½,
A few more thoughts on my end:

  1. Having bazel-in-bazel tests is super valuable for internal rules authors too. I've had this need a few times.
  2. Just in case someone doesn't know: there's the bazel integration testing repo. I've failed to keep it alive, but I think a lot of the concepts there are valuable.
  3. +1 for being able to use commits from GitHub. In practice we had only one checksum issue in 5-6 years, across more builds than I can count.

@ittaiz what do you think about the SIG contributing or owning the current integration test repo in bazelbuild org?

Be happy to add contributors and even hand ownership over if you feel that's important

Mostly the pre-built distribution artifact is required to get a stable checksum. If you rely on GitHub's generated .tgz source archives, you get a breakage when GitHub makes OS updates on their servers that create those archives.

Is this still true? I haven't found official GitHub documentation stating that the archives are reproducible, but I have found this reproducible-builds thread pointing out that Github uses git archive and that git archive is designed to be reproducible.

Just as a quick test I compared the GH archive to a git archive created locally on rules_haskell.

$ curl -L https://github.com/tweag/rules_haskell/archive/455d9e6e8212f0bb73cd6e5437b0f5ce093e44be.tar.gz|sha256sum -
6841e554566d0c326beac84442dd776c49fac7d6059fef4728e75ae37c8e92cc  -
$ git clone https://github.com/tweag/rules_haskell; cd rules_haskell; git archive --format=tar --prefix=rules_haskell-455d9e6e8212f0bb73cd6e5437b0f5ce093e44be/ 455d9e6e8212f0bb73cd6e5437b0f5ce093e44be | gzip > tarball.tgz; sha256sum tarball.tgz
6841e554566d0c326beac84442dd776c49fac7d6059fef4728e75ae37c8e92cc  tarball.tgz

As you can see, the SHA256 is identical. This suggests that the archive is generated reproducibly.

Anecdotally, the only instance where I encountered issues with a changing commit hash in the last couple years was kubernetes/kubernetes#99376. In this case the change was due to a problematic .gitattributes configuration.

@aherrmann I've followed this guidance ever since Jay Conrod made a big deal out of it in rules_go and bazel_gazelle. bazelbuild/rules_go#2340 suggests maybe some GitHub URLs are reliable and some are not?

There is yet another reason I think rules should build their own distribution archive, which is that you can calculate your own checksum to produce the WORKSPACE snippet in the release process before shipping the commits to GitHub.
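
For illustration, a minimal sketch of that release step (the target, name, and tag are hypothetical):

    # Build the distribution artifact and hash it locally, before anything
    # is tagged or pushed to GitHub.
    TAG="1.0.0"
    bazel build //distro:rules_foo.tar.gz
    SHA=$(sha256sum bazel-bin/distro/rules_foo.tar.gz | cut -d' ' -f1)

    # Emit the checksum for the http_archive snippet in the release notes.
    echo "sha256 = \"${SHA}\"  # rules_foo ${TAG}"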

@aherrmann I've followed this guidance ever since Jay Conrod made a big deal out of it in rules_go and bazel_gazelle. bazelbuild/rules_go#2340 suggests maybe some GitHub URLs are reliable and some are not?

Thanks for the pointer, I dug into this a little. I've attached the details at the end; in short: I don't think this was a case of the GitHub-generated source archive changing. Instead, it looks to me as though this was a mix-up between the SHA for the GitHub-generated source archive and the SHA for the release artifact. So, I don't think this is evidence to support the claim that GitHub source archives are non-reproducible.

There is yet another reason I think rules should build their own distribution archive, which is that you can calculate your own checksum to produce the WORKSPACE snippet in the release process before shipping the commits to GitHub.

The same can be achieved using git archive --format=tar.gz --prefix=$NAME-$TAG/ $TAG | sha256sum when using source archives.

To be clear, I'm not saying one should not use release artifacts. But, I am saying that I don't see why it should be mandated that everyone use them without a good technical reason to motivate that mandate. I haven't seen such a reason, yet. As mentioned above, there are upsides to the source archive approach and costs to the release artifact approach.


Details:

If we take a look at the changes in the PR we see

--- a/multirun/deps.bzl
+++ b/multirun/deps.bzl
@@ -4,7 +4,7 @@ def multirun_dependencies():
     _maybe(
         http_archive,
         name = "bazel_skylib",
-        sha256 = "2ef429f5d7ce7111263289644d233707dba35e39696377ebab8b0bc701f7818e",
+        sha256 = "2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a",
         strip_prefix = "bazel-skylib-0.8.0",
         urls = ["https://github.com/bazelbuild/bazel-skylib/archive/0.8.0.tar.gz"],
     )

The 0.8.0 release has a release artifact and of course the generated source archive. If we look at the SHAs of each of these we find

$ curl -L https://github.com/bazelbuild/bazel-skylib/releases/download/0.8.0/bazel-skylib.0.8.0.tar.gz|sha256sum -
2ef429f5d7ce7111263289644d233707dba35e39696377ebab8b0bc701f7818e  -
$ curl -L https://github.com/bazelbuild/bazel-skylib/archive/refs/tags/0.8.0.tar.gz|sha256sum -
2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a  -

I.e. the old hash was the hash of the release artifact and the new hash is the hash of the generated source archive.

If we compare the contents of these two archives we find

$ curl -L https://github.com/bazelbuild/bazel-skylib/releases/download/0.8.0/bazel-skylib.0.8.0.tar.gz|tar ztv|head -n 5
...
drwxrwxr-x root/root         0 2019-03-20 18:13 .bazelci/
-rw-rw-r-- root/root      2348 2019-03-20 18:13 .bazelci/presubmit.yml
-rw-rw-r-- root/root         9 2019-03-20 18:13 .gitignore
-rw-rw-r-- root/root       308 2019-03-20 18:13 AUTHORS
-rw-rw-r-- root/root      1002 2019-03-20 18:13 BUILD

$ curl -L https://github.com/bazelbuild/bazel-skylib/archive/refs/tags/0.8.0.tar.gz|tar ztv|head -n 5
...
drwxrwxr-x root/root         0 2019-03-20 18:13 bazel-skylib-0.8.0/
drwxrwxr-x root/root         0 2019-03-20 18:13 bazel-skylib-0.8.0/.bazelci/
-rw-rw-r-- root/root      2348 2019-03-20 18:13 bazel-skylib-0.8.0/.bazelci/presubmit.yml
-rw-rw-r-- root/root         9 2019-03-20 18:13 bazel-skylib-0.8.0/.gitignore
-rw-rw-r-- root/root       308 2019-03-20 18:13 bazel-skylib-0.8.0/AUTHORS

I.e. the release artifact has no prefix, while the generated source archive does have the standard <repo>-<rev> prefix.

The change is from Jan 2020, I'm pretty sure Github generated source archives had the <repo>-<rev> prefixes at that time as well. So, it looks like the old hash was never that of a Github generated source archive, but that of the release artifact. It seems the issue here was most likely not that the generated source archive changed, but that the wrong hash was written into multirun/deps.bzl before.

For reference, I can produce an equivalent to the Github generated source archive with the same hash on my machine today:

$ git archive --format=tar.gz --prefix=bazel-skylib-0.8.0/ 0.8.0 | sha256sum
2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a  -

If I try to reproduce the release artifact, I get a different hash than that of the release artifact uploaded on GitHub:

$ git archive --format=tar.gz 0.8.0 | sha256sum
a04a79bca280f759ec2339c035e19d1f249616c38a352f9fdb8837a7c0ea2f7c  -

But, comparing this generated prefix-less tarball to the release tarball I find

$ curl -L https://github.com/bazelbuild/bazel-skylib/releases/download/0.8.0/bazel-skylib.0.8.0.tar.gz > released.tar.gz
$ git archive --format=tar.gz 0.8.0 > generated.tar.gz
$ diffoscope released.tar.gz generated.tar.gz
--- released.tar.gz
+++ generated.tar.gz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, last modified: Wed Mar 20 18:02:49 2019, max compression
│ +gzip compressed data, from Unix
│   --- released.tar
├── +++ generated.tar
│ ├── filetype from file(1)
│ │ @@ -1 +1 @@
│ │ -POSIX tar archive (GNU)
│ │ +POSIX tar archive

So, the difference comes down to the release artifact containing slightly different headers including a timestamp.

Great discussion. I think this issue ended up conflating two things. We agree that we need bazel-in-bazel integration testing of rules; let's move that to a new issue, since the bulk of the discussion here was about the release archive, and that's just one motivation for bazel-in-bazel testing.

I've updated all my repos, as well as the rules-template, to reflect that GitHub produces a stable SHA for the artifacts it serves.

fmeum commented

Sorry to revive this closed issue, but I just encountered a situation in which the SHA of a GitHub-provided archive changed over time and thus ended up breaking the build.

Over at https://github.com/CodeIntelligenceTesting/jazzer, we use the following dependency on abseil-cpp:

    maybe(
        http_archive,
        name = "com_google_absl",
        sha256 = "5e1cbf25bf501f8e37866000a6052d02dbdd7b19a5b592251c59a4c9aa5c71ae",
        strip_prefix = "abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4",
        url = "https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip",
    )

An hour ago, CI runs started to fail with this error:

ERROR: /home/runner/work/jazzer/jazzer/driver/BUILD.bazel:21:11: //driver:fuzzed_data_provider depends on @com_google_absl//absl/strings:str_format in repository @com_google_absl which failed to fetch. no such package '@com_google_absl//absl/strings': java.io.IOException: Error downloading [https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip] to /home/runner/.cache/bazel/_bazel_runner/6bc610921f14939de4c55cf170d55a62/external/com_google_absl/temp17765729958342005876/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip: Checksum was 70203fec1c4823d4fe689f1c413bc7a0e6b4556dbd55b5ac40fc8862bacc0dcb but wanted 5e1cbf25bf501f8e37866000a6052d02dbdd7b19a5b592251c59a4c9aa5c71ae

I attached both the ZIP file that can currently be obtained from https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip (abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4.github-new.zip) and the ZIP file that was previously generated by GitHub and that I obtained from my local repository cache (abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4.github-old.zip).

Running diffoscope on these files shows that the hour of the mtimes changed:

...
│┄ Archive contents identical but files differ, possibly due to different compression levels. Falling back to binary comparison.
├── zipinfo -v {}
│ @@ -28,15 +28,15 @@
│    file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
│    version of encoding software:                   0.0
│    minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
│    minimum software version required to extract:   1.0
│    compression method:                             none (stored)
│    file security status:                           not encrypted
│    extended local header:                          no
│ -  file last modified on (DOS date/time):          2021 Nov 11 00:09:50
│ +  file last modified on (DOS date/time):          2021 Nov 11 08:09:50
│    file last modified on (UT extra field modtime): 2021 Nov 11 08:09:50 local
│    file last modified on (UT extra field modtime): 2021 Nov 11 08:09:50 UTC
│    32-bit CRC value (hex):                         00000000
│    compressed size:                                0 bytes
│    uncompressed size:                              0 bytes
│    length of filename:                             52 characters
│    length of extra field:                          9 bytes
...

@aherrmann Do you have an idea how this could happen and whether tar.gz would not have been prone to this?

fmeum commented

Looks like the change has been rolled back, so this might have been an honest bug.

And they said that they would ensure the checksum doesn't change in the future. So I think this might even strengthen the case that we can rely on the checksum.

fmeum commented

@brentleyjones That's great to know. Could you point me to the place where they confirmed that?

So not as strong a guarantee as I originally read it to be, but it seems the rollback was related to the checksum change: https://twitter.com/tgummerer/status/1488493440103030787

fmeum commented

There is https://twitter.com/tgummerer/status/1488493481874055173 though, so depending on archives for individual commits is unsafe.

Yikes 😕

I think we have to push hard and escalate (like Ulf did) to point out that GH is running a package repo and the world relies on it for supply-chain safety...

@aherrmann Do you have an idea how this could happen and whether tar.gz would not have been prone to this?

We've seen the same issue on some zip dependencies but not on tar.gz dependencies. It would be good to get clarification on this from GitHub as was requested here.

fmeum commented

After a fruitful exchange with GitHub support staff, I was able to confirm the following (quoting with their permission):

I checked with our team and they confirmed that we can expect the checksums for repository release archives, found at /archive/refs/tags/$tag, to be stable going forward. That cannot be said, however, for repository code download archives found at archive/v6.0.4.

It's totally understandable that users have come to expect a stable and consistent checksum value for these archives, which would be the case most of the time. However, it is not meant to be reliable or a way to distribute software releases, and nothing in the software stack is made to try to produce consistent archives. This is no different from creating a tarball locally and trying to verify it with the hash of the tarball someone created on their own machine.

If you had only a tag with no associated release, you should still expect to have a consistent checksum for the archives at /archive/refs/tags/$tag.

In summary: It is safe to reference archives of any kind via the /refs/tags endpoint, everything else enjoys no guarantees.
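
Concretely, the distinction is between these two URL shapes (repository and tag are hypothetical):

    # Said to have a stable checksum going forward:
    curl -L https://github.com/example/repo/archive/refs/tags/v1.0.0.tar.gz | sha256sum

    # No guarantee, per the statement above:
    curl -L https://github.com/example/repo/archive/v1.0.0.tar.gz | sha256sum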

@fmeum Thank you for getting in touch with GitHub support and sharing the outcome here.

In summary: It is safe to reference archives of any kind via the /refs/tags endpoint, everything else enjoys no guarantees.

That's great to hear! I think in terms of this issue this means that it can be closed again. The artifacts under /archive/refs/tags/$tag are generated by GitHub and don't have to be pre-built.

That cannot be said, however, for repository code download archives found at archive/v6.0.4.

It's good to know that this distinction exists, I assume the same holds for /archive/$commit.

@aherrmann @fmeum we have a new problem with GitHub release archives - they don't give any metrics.

https://github.com/bazelbuild/bazel_metrics was just posted, but e.g. when rules_python changed to follow this guidance in 0.6, the usage numbers went to zero, as you can see from this third-party analyzer:
https://hanadigital.github.io/grev/?user=bazelbuild&repo=rules_python

Any ideas?

fmeum commented

I didn't know about this feature, but I can understand why auto-generated release archives (which are naturally source archives) are exempt from this.

It seems that rulesets will have to choose between a very simple release setup and one with statistics. Maybe the recommendation could be to upload the auto-generated archive as a release artifact? That way, there would only ever be one hash regardless of how users choose to reference the artifact.
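
As a sketch of that recommendation (assuming the gh CLI and a hypothetical rules_foo ruleset; the git archive invocation is the one shown earlier in this thread to reproduce GitHub's hash):

    TAG="1.0.0"
    # Recreate the archive GitHub auto-generates for the tag...
    git archive --format=tar.gz --prefix="rules_foo-${TAG}/" "${TAG}" > "rules_foo-${TAG}.tar.gz"

    # ...and attach it to the release, so the uploaded asset and the
    # auto-generated source archive carry the same hash.
    gh release create "${TAG}" "rules_foo-${TAG}.tar.gz"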

It seems that rulesets will have to choose between a very simple release setup and one with statistics.

Yes, that seems to be the choice right now.

I don't know what GitHub's reasons or constraints are around this feature. But it would certainly be very useful to rule authors to provide metrics for the auto-generated archive, at least for tagged releases. Arguably, the lack thereof, and the fact that auto-generated release archives exist automatically for every release tag, make metrics obtained from equivalent dedicated release artifacts generally unreliable: how could we as rule authors know whether the majority of our users choose the auto-generated archive or the dedicated one?

Github does have traffic numbers, e.g. for rules_python
[screenshot: GitHub traffic graph for rules_python, 2022-06-15]
We should figure out whether downloading the source archive counts as a "clone" (I'd hope so)

I think those would be just as useful as artifact downloads. Either way the absolute numbers can't be trusted, but they're comparable across rulesets.

Guess GitHub's guarantee didn't mean much? https://twitter.com/thesayynn/status/1620129657977987073

I think the only safe way from here on out is to attach your own release archives to a release.

Hey,

I'm one of the engineers in the Git Systems org at GitHub. I think there's been a misinterpretation of what we guarantee as far as stability.

If you generate a release for a particular tag, and you upload your own assets, such as a tarball or binaries, we'll guarantee those don't change. However, the automated "Source code (tar.gz)" and "Source code (zip)" links, as well as any automated archives we generate, aren't guaranteed to be stable. That's because Git doesn't guarantee stability here and we rely on Git to generate those archives on the fly, so as we upgrade, things may change.

If you need a stable source code archive, please generate a release and upload your own archive as part of this process, and then you can reference those with stable hashes.

To give you an example as to what's stable and what's not, if you look at the latest Git LFS release at https://github.com/git-lfs/git-lfs/releases/tag/v3.3.0, all of the Assets entries except the two "Source code" links at the bottom are guaranteed to be stable (since those two are autogenerated). You'll notice we ship our own stable tarball and signed hashes as part of the assets, and that works.

I apologize for the confusion here, and hopefully this clarifies things.

@bk2204 Okay, but the whole build system world is broken right now. Bazel, Homebrew, anything that does checksum hashing. This needs to be reverted and proper comms and a deprecation period needs to be communicated so all of these systems can fix their "broken assumptions".

@bk2204 This seems to be a change in policy from what engineers/support staff at GitHub have previously communicated:

#11 (comment)

Are you saying this policy has changed, and we can no longer rely on checksum stability for /archive/refs/tags/$tag URLs?

@bk2204 I'm sorry, but this is unacceptable. Please realize that whatever upgrade you did internally is a backward-incompatible change for your end users. Please quote one official document where GitHub clearly communicated its checksum guarantee. A whole ecosystem of build systems is broken because of this change.

vcpkg and conan are also probably broken

Are you saying this policy has changed, and we can no longer rely on checksum stability for /archive/refs/tags/$tag URLs?

I'm saying that policy has never been correct and we've never guaranteed stable checksums for archives, just like Git has never guaranteed that. I apologize that things are broken here and that there hasn't been clearer communication in the past on this, but our policy hasn't changed in over 4 years.

@bk2204 Your position is completely clear and, in isolation, totally reasonable. But, practically speaking, an enormous number of builds are broken right now, and an enormous number of historical commits will never be capable of building again, unless the hashes go back to the way they were. Is there any possibility that GitHub could admit defeat by Hyrum's Law here?

I don't believe this is going to be reverted at the moment, and a Changelog post has just shown up. Again, my apologies for this communication not showing up sooner.

I will mention that the thing that has changed is the compression, since Git has switched from using gzip to an internal call to zlib. Thus, in the interim, if you can depend on the checksum of the uncompressed tarball, that will not have changed here. Of course, that's not a good idea long term (since, again, they're not guaranteed), but it may help you fix the immediate problem temporarily until you can change to using release assets.
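
In other words, an interim check could hash the decompressed stream instead of the .tar.gz bytes (URL hypothetical):

    # Only the compression changed (gzip to zlib), so the checksum of the
    # decompressed tar stream is unaffected.
    curl -L https://github.com/example/repo/archive/refs/tags/v1.0.0.tar.gz | gunzip | sha256sum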

Indeed Conan is broken because of this. +1 on reverting the change if possible

FWIW, Spack is also likely to have a problem here.

facing the same issue, our production docker builds are failing

I don't believe this is going to be reverted at the moment, and a Changelog post has just shown up. Again, my apologies for this communication not showing up sooner.

Doubling down after breaking like every build in the world isn't a good look. You should probably get your manager to come do some PR work here.

Again, my apologies for this communication not showing up sooner.

This is the main issue, and why it should be reverted IMO.
Literally no one informed anyone of this change (regardless of whether it was guaranteed or not). The communication on this has been pretty confusing to begin with, seeing as the same issue happened last year.

Everyone who depended on this is completely broken with no simple way to move forward and unblock builds.

I don't believe this is going to be reverted at the moment, and a Changelog post has just shown up. Again, my apologies for this communication not showing up sooner.

Is Homebrew going to need to update the brew formulas that can no longer build from source, touching thousands of formula source files? (There seem to be several thousand that rely on hashes of GitHub-hosted tar.gz archives.)

Homebrew is not the only one. Other package managers' package templates frequently rely on GitHub archives. I agree that GitHub's position here makes sense, but on the other hand, releasing the blog post after ruining everyone's builds is not acceptable IMO. Prior warning would have been ideal.

releasing the blog post after ruining everyone's builds is not acceptable IMO. Prior warning would have been ideal.

It has always been this way. They broke things back in the early 2010s too (more than once) for a similar reason: git changed how git archive works by default.

The collective amount of human effort it will take to break glass, recover broken build systems that are impacted by this change, and republish artifacts across entire software ecosystems could probably cure cancer. Please consider reverting this change as soon as possible. It's fine to announce and plan a migration with a hard deadline, but the disruption this change has caused is massive.

Looking at the git change, it could probably be a git -c … archive change. However, all of this reliance on /archive/ means that the default can probably never change…

I think the response here strongly indicates that it needs to be a product requirement that all URLs linked on release pages have stable contents. For supply chain security, there must be a way to guarantee that release artifacts have not been tampered with. That includes both manually-uploaded and autogenerated artifacts. If the underlying implementation (i.e. git) doesn't guarantee stability, you need to put a caching layer in front that does.

I think it's unreasonable to say that you can expect release assets to remain consistent and then list archives as if they were releases assets while handling them entirely differently. This is misleading.

@bk2204 Millions of dollars of damage is being done here. It's understandable that you didn't expect people to depend on hashes being stable, but now that you know and understand how widely this assumption is being relied upon, the prudent thing to do would be to roll back and re-evaluate. You might just have to keep the old code around in a container that runs whenever older commits are requested -- you can use the new code on commits with newer timestamps.

Every build relying on Bazel is broken as well, and we cannot even fix the build on our own, since our dependencies have to be fixed first.

While we are updating our SHAs, we should probably just migrate to Gitlab, right?

Anecdotal, and I cannot point to anything specific currently, but I have seen some "compression changed on generated tarballs" on GitLab too (between some updates). I don't think any of these systems are designed with a perfect guarantee for this ("this" being hash-stable tarballs); we just got here by circumstance.

While we are updating our SHAs, we should probably just migrate to Gitlab, right? Is that the current recommendation for people who want more reliability?

fmeum commented

@bk2204 https://support.github.com/ticket/personal/0/1485189 (not visible publicly) made a clear commitment that the /archive/refs/tags/$tag endpoint would provide archives with stable hashes and should be relied upon for that purpose. I specifically asked for confirmation of this twice and received it. Happy to share the full conversation I had with support.

Hey folks. I'm the product manager for Git at GitHub. We're sorry for the breakage, we're reverting the change, and we'll communicate better about such changes in the future (including timelines).

did GitHub remove the staff tag from profiles? there's allegedly like 4 staff members in this thread and no one has the badge

We updated our Git version which made this change for the reasons explained. At the time we didn't foresee the impact. We're quickly rolling back the change now, as it's clear we need to look at this more closely to see if we can make the changes in a less disruptive way. Thanks for letting us know.

Also re: Staff badge, here's what I see in this thread:

[screenshot: the thread as seen by GitHub staff, with Staff badges visible]

Meta, @vtbassmatt this is what we normal users see.

[screenshot: the thread as seen by a non-staff user, without Staff badges]

this is what normal users see

Huh! I don't work on frontend stuff, so that's a mystery to me 😅

@vtbassmatt awesome thank you kindly ❤️

Will GitHub provide stability guarantees around the non-release tarball/zip URLs going forward?

Thank you, @vtbassmatt. May I suggest a regression test for this?

@vtbassmatt For those of us who dutifully updated our checksums in response to the original change, can you give us a timeline for the rollback so we can try to time rolling back our updates? I totally understand we are in the minority and rolling back the change is the right move, but of course the new interface was live, Hyrum's law and all that.

@jerrymarino too soon to commit to anything in particular (except "we will DEFINITELY communicate better"). There are good reasons on both sides.

@jmacdonald-Ocient the rollback is imminent, just winding its way through automated testing. I don't know for sure how long it will take to show up, I'm sorry.

Thanks GitHub staff for the quick response, looking forward to the follow-up communication. Idea on better communication going forward - could you please add a hint in the UI right by the download links that links to docs on what is/isn't stable and possibly best practices on getting stable artifacts and checksumming them? e.g. a help icon you can hover over with a tooltip or that links to the docs.

IME this form of contextual help/communication is really beneficial for customers that may not follow the blog, think to search the docs, etc., as it's right at the point of use.

If the checksum isn't stable, after the community is migrated, I would recommend that a random value is injected every time to really drive this point home so that no one reacquires an incorrect dependency. Hyrum's Law shows that documenting an interface as unstable is insufficient if in practice it's largely stable.

Hey folks. I'm the product manager for Git at GitHub. We're sorry for the breakage, we're reverting the change, and we'll communicate better about such changes in the future (including timelines).

Oh wow, what a wild ride, I'm so glad you are reverting it 🙏

Please note that not everything can be migrated to pointing at a release package easily; a lot of the checksum errors I've experienced were in third-party plugin/rule code we have no direct control over.

Some point to on-the-fly generated tar.gz source archives of specific historical revisions and such. Obviously we'd do our part in migrating away and upgrading such dependencies, but this is still quite a genuine threat, and one that is hard to validate.

Please take such concerns into consideration when rolling out a solution.

Those files are generated new each time (with some caching - an hour I think). We told Git to use the old settings instead of its new default, so they’ll start getting generated with the old hashes again. I’m told the roll-out is complete, modulo resetting those caches.

FWIW, we're seeing more checksum mismatches in the last ~20 minutes than at any other time today.

Is it possible you invalidated a cache that was preventing some portion of artifacts from being regenerated, and now they are being regenerated before the rollback was complete?

@jfirebaugh that is indeed possible, we’ll look into it. Is it abating for you or still ongoing?

Ongoing

Here too -- all of our bazel builds died about four hours ago. I've been trying to band-aid it by updating hashes / moving repos to git_repository() from http_archive() but we are still seeing lots of issues since this incorrect use of http_archive() is pervasive in bazel-land

fishy commented

FWIW, we're seeing more checksum mismatches in the last ~20 minutes than at any other time today.

Is it possible you invalidated a cache that was preventing some portion of artifacts from being regenerated, and now they are being regenerated before the rollback was complete?

Echoing this, we also see things getting worse in the past ~20min.

MkJAS commented

Still having issues with vcpkg and installing libs such as boost and yaml-cpp. Believe this was a related issue.

Think we found a bug in the rollback. Being fixed now.

A status page would have been easier to follow than updates on a GitHub issue.

In our case, we had a rules_python mismatch, which was fixed, and everything worked for a while.
But now we are getting a rules_java mismatch and everything stops working.

/root/.cache/bazel/_bazel_root/2d3430b48bd77b69b91ab356ef9daf21/external/rules_java/temp5602810203644143040/981f06c3d2bd10225e85209904090eb7b5fb26bd.tar.gz: Checksum was 01581e873735f018321d84d2e03403f37f4b05cf19d9c8036f14466e584a46b9 but wanted f5a3e477e579231fca27bf202bb0e8fbe4fc6339d63b38ccb87c2760b533d1c3

Think we found a bug in the rollback. Being fixed now.

Any update on this fix?

I have been thinking about this problem for a while, "safe and comfortable in the knowledge that it will never break". 🤣

So one good output of this actually occurring and then being reverted: I have gone ahead and actually posted an email to the git mailing list about the possible solution I've been thinking of for a while now: https://public-inbox.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/T/

I live in hope that we'll eventually see a world where the manpage for git-archive says "git archive is reproducible and here is why", and then no one ever has to have this debate again.

Any update on this fix?

Should be deployed now. I spoke too soon. It’s in progress but not fully out.

Should be deployed now.

I'm still seeing the broken checksum values. Does this rollout also require waiting an hour for the caches to reset?

EDIT:

I spoke too soon. It’s in progress but not fully out.

Clicked respond right before I saw the edit. Thanks for the update!

I agree with the person above saying a status page would be better than comment updates, but I think it's important to note that it's still appreciated regardless - it's vastly more helpful than radio silence, which a lot of companies and teams would be giving in a similar position right now. Thanks for keeping us updated!

Sorry for the false starts above, and I appreciate everyone’s patience with me. You should start seeing the old checksums now.

@vtbassmatt How does the rollback work? Do you need to literally re-generate all the affected releases which would take a long time to finish?

You should start seeing the old checksums now.

@vtbassmatt do we have to wait for a cache eviction? Still seeing bad hashes here

$ curl -sL https://github.com/madler/zlib/archive/v1.2.11.tar.gz | sha256sum
9917faf62bc29a578c79a05c07ca949cfda6e50a1c8a02db6ac30c5ea0aba7c0  -

(Bazel thinks this is supposed to be 629380c90a77b964d896ed37163f5c3a34f6e6d897311f1df2a7016355c45eff)

gumho commented

Doesn't look like the rollback is complete. For example, https://github.com/bazelbuild/rules_python/archive/refs/tags/0.16.1.tar.gz (https://github.com/bazelbuild/rules_python/releases/tag/0.16.1) still has the wrong (newer) checksum.

Thanks for the ping. This is unexpected and folks are looking at it immediately. I’ve got to step out of the thread now, but we do intend to revert to previous behavior.

@vtbassmatt I am having similar issues with GitHub Actions builds that are using npm to grab resources from GitHub. This is from about 1m ago

#14 7.808 npm WARN tarball tarball data for http2@https://github.com/node-apn/node-http2/archive/apn-2.1.4.tar.gz (sha512-ad4u4I88X9AcUgxCRW3RLnbh7xHWQ1f5HbrXa7gEy2x4Xgq+rq+auGx5I+nUDE2YYuqteGIlbxrwQXkIaYTfnQ==) seems to be corrupted. Trying again.
#14 7.913 npm ERR! code EINTEGRITY
#14 7.919 npm ERR! sha512-ad4u4I88X9AcUgxCRW3RLnbh7xHWQ1f5HbrXa7gEy2x4Xgq+rq+auGx5I+nUDE2YYuqteGIlbxrwQXkIaYTfnQ== integrity checksum failed when using sha512: wanted sha512-ad4u4I88X9AcUgxCRW3RLnbh7xHWQ1f5HbrXa7gEy2x4Xgq+rq+auGx5I+nUDE2YYuqteGIlbxrwQXkIaYTfnQ== but got sha512-GWBlkDNYgpkQElS+zGyIe1CN/XJxdEFuguLHOEGLZOIoDiH4cC9chggBwZsPK/Ls9nPikTzMuRDWfLzoGlKiRw==. (72989 bytes)

@mdouglass It's affecting anything that pins dependencies on Github by checksum

@mdouglass It's affecting anything that pins dependencies on Github by checksum

Yep, my point was more that it was still happening after the supposed rollback

Yeah. https://github.com/bazelbuild/rules_foreign_cc/archive/0.8.0.tar.gz was 6041f1374ff32ba711564374ad8e007aef77f71561a7ce784123b9b4b88614fc but it's still generating an archive that matches the same changed hash as earlier today (2fe52e77e11dc51b26e0af5834ac490699cfe6654c7c22ded55e092f0dd5fe57).

Will this issue continue to be used for status updates on the rollback?

fishy commented

I still don't see these 2 examples (there are a lot more) going back to what bazel rules expect:

curl -L https://github.com/bazelbuild/rules_python/archive/refs/tags/0.8.0.tar.gz | sha256sum
curl -L https://github.com/google/go-containerregistry/archive/v0.5.1.tar.gz | sha256sum

bazel rules are expecting 9fcf91dbcc31fde6d1edb15f117246d912c33c36f44cf681976bd886538deba6 & c3e28d8820056e7cc870dbb5f18b4f7f7cbd4e1b14633a6317cef895fdb35203, but we are still getting 5c619c918959d209abd203a63e4b89d26dea8f75091e26f33f719ab52097ef68 & 3f56ff9d903d76e760620669949ddaee8760e51093f9c2913786c85242075fda

Seeing at least one correct hash now

$ curl -sL https://github.com/madler/zlib/archive/v1.2.11.tar.gz | sha256sum
629380c90a77b964d896ed37163f5c3a34f6e6d897311f1df2a7016355c45eff  -

@fishy yours seem correct now too

$ curl -sL https://github.com/bazelbuild/rules_python/archive/refs/tags/0.8.0.tar.gz | sha256sum; curl -sL https://github.com/google/go-containerregistry/archive/v0.5.1.tar.gz | sha256sum
9fcf91dbcc31fde6d1edb15f117246d912c33c36f44cf681976bd886538deba6  -
c3e28d8820056e7cc870dbb5f18b4f7f7cbd4e1b14633a6317cef895fdb35203  -

I think the rollback is live now, some of my conan recipes started to work again 🥳

~/git/mesonbuild/wrapdb] $ wget https://github.com/abseil/abseil-cpp/archive/20220623.0.tar.gz
~/git/mesonbuild/wrapdb] $ sha256sum abseil-cpp-20220623.0.tar.gz subprojects/packagecache/abseil-cpp-20220623.0.tar.gz 
4208129b49006089ba1d6710845a45e31c59b0ab6bff9e5788a87f55c5abd602  abseil-cpp-20220623.0.tar.gz
4208129b49006089ba1d6710845a45e31c59b0ab6bff9e5788a87f55c5abd602  subprojects/packagecache/abseil-cpp-20220623.0.tar.gz

My original testcase started working too (the subprojects/packagecache/ directory is my local copy from August 2022 of the archive that a contributor posted a ticket about at mesonbuild/wrapdb#884).

I'm seeing the expected hash on my rdkafka download:

curl -sL https://github.com/confluentinc/librdkafka/archive/v1.8.2.tar.gz | sha256sum
6a747d293a7a4613bd2897e28e8791476fbe1ae7361f2530a876e0fd483482a6  -

@vtbassmatt Thank you for handling this issue and reverting the change.

Could you clarify whether the commitment to stable release artifacts, as mentioned here and here, will be upheld going forward after this revert, or whether it may still change in the future?

You need to upload the tarball during release creation. bazel-contrib/rules_cuda#56 (comment) We also experienced a checksum change somehow.

The template for rules has been changed to publish a tarball into the release instead of relying on GitHub to provide a stable interface. Ref bazel-contrib/rules-template#44.

FYI there's an independent motivation to upload artifacts: bazelbuild/bazel_metrics#4