emscripten-core/emsdk

(bazel) Unable to fetch wasm-binaries.tar.xz for emsdk 3.1.47+


When executing this block in WORKSPACE, I am unable to use anything higher than emsdk 3.1.46. On emsdk 3.1.47 and above, I keep getting a 404 from storage.googleapis.com.

http_archive(
    name = "emsdk",
    sha256 = "5dd94e557b720800a60387ec078bf3b3a527cbd916ad74a696fe399f1544474f",
    strip_prefix = "emsdk-3.1.46/bazel",
    url = "https://github.com/emscripten-core/emsdk/archive/refs/tags/3.1.46.tar.gz",
)

load("@emsdk//:deps.bzl", emsdk_deps = "deps")
emsdk_deps()

load("@emsdk//:emscripten_deps.bzl", emsdk_emscripten_deps = "emscripten_deps")
emsdk_emscripten_deps(emscripten_version = "3.1.45")

load("@emsdk//:toolchains.bzl", "register_emscripten_toolchains")
register_emscripten_toolchains()

Just to clarify, emsdk downloads without a problem for the newer versions, but the referenced wasm-binaries archive in the rule emscripten_bin_linux seems to have incorrect URLs.

The wasm-binaries archive switched from tar.gz to tar.xz back in #1281, which seems like it must be related.

Are you saying that emscripten_bin_linux is incorrect for older versions or newer versions? I would have thought it simply would not work for older versions since older versions use .tar.gz.

Perhaps we could find a way to make emscripten_bin_linux aware of the version?

Is there some reason you need/want to install those old versions?

Would it work if you used the same version of emsdk as emscripten_version? i.e. a version of emsdk prior to #1281?

Are you saying that emscripten_bin_linux is incorrect for older versions or newer versions? I would have thought it simply would not work for older versions since older versions use .tar.gz.

What I am seeing is that recent versions of emsdk (3.1.47+) do not work with older versions of Emscripten (e.g., 3.1.45). For example, if I try to load Emscripten into my workspace like this:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "emsdk",
    sha256 = "47515d522229a103b7d9f34eacc1d88ac355b22fd754d13417a2191fd9d77d5f",
    strip_prefix = "emsdk-3.1.59/bazel",
    url = "https://github.com/emscripten-core/emsdk/archive/refs/tags/3.1.59.tar.gz",
)

load("@emsdk//:deps.bzl", emsdk_deps = "deps")
emsdk_deps()

load("@emsdk//:emscripten_deps.bzl", emsdk_emscripten_deps = "emscripten_deps")
emsdk_emscripten_deps(emscripten_version = "3.1.45")

load("@emsdk//:toolchains.bzl", "register_emscripten_toolchains")
register_emscripten_toolchains()

Emsdk tries to fetch this invalid URL (404): https://storage.googleapis.com/webassembly/emscripten-releases-builds/linux/2b7c5fb8ffeac3315deb1f82ab7bf8da544f84a1/wasm-binaries.tar.xz

Note however that changing the extension manually to wasm-binaries.tar.gz also results in a 404.

Is there some reason you need/want to install those old versions?

I have a custom runtime, and since the ABI is not stable and I use Pyodide in my project, I pin Emscripten to the same version as the latest Pyodide release.

I just tried emsdk 3.1.59 with Emscripten 3.1.46. The archive downloads, but the checksum for wasm-binaries.tar.xz doesn't match what emsdk expects.

Expected: 75cbf14629b06e417b597d3f897ad7d881c53762380aca2f0dd85f1b15891511
Got: 8346da51c82fdd67369a4f31b4bc9dcb8734ace945725124edf4289714c5a80d

If you always match the emsdk version with the emscripten version does that work? That seems like your safest bet.

It does work, although I was hoping that I could use a version of emsdk with 90d2168 and Emscripten 3.1.46.
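For reference, the matched-version setup that is confirmed to work above amounts to deriving the emsdk archive fields and emscripten_version from a single version string. Below is a minimal Python sketch of that idea. The function name `matched_emsdk_archive` is hypothetical; the URL and strip_prefix patterns are copied from the WORKSPACE snippets in this thread, and the sha256 for each tag still has to be supplied separately.

```python
def matched_emsdk_archive(version):
    """Return http_archive-style fields for the emsdk tag matching `version`.

    Hypothetical helper: the patterns mirror the WORKSPACE snippets in this
    thread. The sha256 of the tag archive is not derivable and is omitted.
    """
    return {
        "strip_prefix": "emsdk-{}/bazel".format(version),
        "url": "https://github.com/emscripten-core/emsdk/archive/refs/tags/{}.tar.gz".format(version),
        # Use the same version for the toolchain binaries as for emsdk itself.
        "emscripten_version": version,
    }
```

Calling `matched_emsdk_archive("3.1.46")` yields the same url and strip_prefix as the first WORKSPACE block in this issue, with emscripten_version pinned to 3.1.46 as well.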

In order to make that work we would need to teach the bazel toolchains the point at which the filenames got renamed from tar.gz to tar.xz (for linux) and .tbz2 to .tar.bz (for mac).

This might be possible?

I believe the transition in filenames happened between 3.1.46 and 3.1.47.

The reason you are seeing the wrong sha hash is that for 3.1.46 it looks like we uploaded both the .tar.xz and the .tbz2 file.

$ wget https://storage.googleapis.com/webassembly/emscripten-releases-builds/linux/21644188d5c473e92f1d7df2f9f60c758a78a486/wasm-binaries.tar.xz
$ wget https://storage.googleapis.com/webassembly/emscripten-releases-builds/linux/21644188d5c473e92f1d7df2f9f60c758a78a486/wasm-binaries.tbz2
$ sha256sum wasm-binaries.t*
8346da51c82fdd67369a4f31b4bc9dcb8734ace945725124edf4289714c5a80d  wasm-binaries.tar.xz
75cbf14629b06e417b597d3f897ad7d881c53762380aca2f0dd85f1b15891511  wasm-binaries.tbz2
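The same comparison can be scripted. The following is a minimal Python sketch, assuming the two archives have already been downloaded locally: it hashes a file with the standard-library `hashlib` module and looks the digest up among the two values shown above. The names `KNOWN_DIGESTS`, `sha256_of_file`, and `identify_archive` are hypothetical and not part of emsdk.

```python
import hashlib

# Digests reported in this thread for the 3.1.46 Linux archives.
KNOWN_DIGESTS = {
    "8346da51c82fdd67369a4f31b4bc9dcb8734ace945725124edf4289714c5a80d": "wasm-binaries.tar.xz",
    "75cbf14629b06e417b597d3f897ad7d881c53762380aca2f0dd85f1b15891511": "wasm-binaries.tbz2",
}

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def identify_archive(path):
    """Return the archive name whose known digest matches the file, or None."""
    return KNOWN_DIGESTS.get(sha256_of_file(path))
```

The "Expected" value in the Bazel error is the .tbz2 digest and the "Got" value is the .tar.xz digest, which is consistent with the recorded checksum having been computed against the older archive name.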

It looks like it should be possible to modify bazel/emscripten_deps.bzl such that it makes the file extension depend on the version:

if version == "latest":
    version = reversed(sorted(EMSCRIPTEN_TAGS.keys(), key = _parse_version))[0]
if version not in EMSCRIPTEN_TAGS.keys():
    error_msg = "Emscripten version {} not found.".format(version)
    error_msg += " Look at @emsdk//:revisions.bzl for the list "
    error_msg += "of currently supported versions."
    fail(error_msg)
revision = EMSCRIPTEN_TAGS[version]
emscripten_url = "https://storage.googleapis.com/webassembly/emscripten-releases-builds/{}/{}/wasm-binaries{}.{}"
# This could potentially backfire for projects with multiple emscripten
# dependencies that use different emscripten versions
excludes = native.existing_rules().keys()
if "nodejs_toolchains" not in excludes:
    # Node 16 is the first version that supports darwin_arm64
    node_repositories(
        node_version = "16.6.2",
    )
if "emscripten_bin_linux" not in excludes:
    http_archive(
        name = "emscripten_bin_linux",
        strip_prefix = "install",
        url = emscripten_url.format("linux", revision.hash, "", "tar.xz"),
        sha256 = revision.sha_linux,
        build_file_content = BUILD_FILE_CONTENT_TEMPLATE.format(bin_extension = ""),
        type = "tar.xz",
    )

If we did that then you could use emsdk main to install older versions.
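As a rough illustration of what "make the file extension depend on the version" could look like, here is a Python sketch of the selection logic for the Linux archive only. It is not the actual emsdk change. It assumes, based on the comments in this thread, that 3.1.47 is the first release published only as .tar.xz and that earlier releases should be fetched as .tbz2. The names `FIRST_TAR_XZ_VERSION`, `parse_version`, `linux_archive_extension`, and `linux_archive_url` are hypothetical, and a real fix would have to be written in Starlark inside bazel/emscripten_deps.bzl.

```python
# Assumption taken from this thread: the rename happened between 3.1.46 and 3.1.47.
FIRST_TAR_XZ_VERSION = "3.1.47"

# Same URL template as in bazel/emscripten_deps.bzl, quoted above.
EMSCRIPTEN_URL = "https://storage.googleapis.com/webassembly/emscripten-releases-builds/{}/{}/wasm-binaries{}.{}"

def parse_version(version):
    """Turn "3.1.46" into [3, 1, 46] so versions compare numerically."""
    return [int(part) for part in version.split(".")]

def linux_archive_extension(version):
    """Pick the archive extension for a given Emscripten version (Linux only)."""
    if parse_version(version) >= parse_version(FIRST_TAR_XZ_VERSION):
        return "tar.xz"
    return "tbz2"

def linux_archive_url(version, revision_hash):
    """Build the download URL using the version-dependent extension."""
    return EMSCRIPTEN_URL.format("linux", revision_hash, "", linux_archive_extension(version))
```

With this sketch, 3.1.46 resolves to the .tbz2 archive, which is the file the recorded sha256 matches. The `type` attribute passed to http_archive would presumably need to follow the chosen extension as well.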

I believe the transition in filenames happened between 3.1.46 and 3.1.47.

The reason you are seeing the wrong sha hash is that for 3.1.46 it looks like we uploaded both the .tar.xz and the .tbz2 file.

This only applies to 3.1.46, right? So the hash just needs to be adjusted or does fixing this still mean making bazel/emscripten_deps.bzl aware of the relationship between the version and the extension?

I think the only real fix is to make bazel/emscripten_deps.bzl aware of the relationship between the version and the extension.

For the record, we noticed this too with https://github.com/mymindstorm/setup-emsdk in GitHub Actions, which started raising the same error when attempting to set up emscripten 3.1.39.

It seems that action always downloads the latest main branch from emsdk, regardless of the version specified for emscripten. I'll file an issue there to suggest changing that behavior to check out the same branch for emsdk as the target emscripten version.
Edit: Done: mymindstorm/setup-emsdk#45

I'm not sure why things started breaking today though, when the latest emsdk main commit was a week ago.

Also, the transition of filenames happened months ago.