google-deepmind/lab2d

Error on M1 Mac (aarch64)

RacingTadpole opened this issue · 21 comments

Hi - I've just bought a new MacBook Air with the M1 chip, and after much effort getting numpy and pygame to install with pip, I've now run into a new problem with lab2d / pygame...

$ bazel run --config=lua5_2 -c opt dmlab2d/python:random_agent -- --level_name=clean_up

INFO: Build completed successfully, 2 total actions
pygame 2.0.1 (SDL 2.0.14, Python 3.9.1)
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "/private/var/tmp/.../execroot/org_deepmind_lab2d/bazel-out/darwin-opt/bin/dmlab2d/python/random_agent.runfiles/org_deepmind_lab2d/dmlab2d/python/random_agent.py", line 27, in <module>
    from dmlab2d.python import dmlab2d
ImportError: dlopen(/private/var/tmp/.../execroot/org_deepmind_lab2d/bazel-out/darwin-opt/bin/dmlab2d/python/random_agent.runfiles/org_deepmind_lab2d/dmlab2d/python/dmlab2d.so, 2):
  Symbol not found: _png_init_filter_functions_neon
  Referenced from: /private/var/tmp/.../execroot/org_deepmind_lab2d/bazel-out/darwin-opt/bin/dmlab2d/python/random_agent.runfiles/org_deepmind_lab2d/dmlab2d/python/dmlab2d.so
  Expected in: flat namespace
 in /private/var/tmp/.../execroot/org_deepmind_lab2d/bazel-out/darwin-opt/bin/dmlab2d/python/random_agent.runfiles/org_deepmind_lab2d/dmlab2d/python/dmlab2d.so

I got several of the pygame examples running OK (e.g. chimp and liquid), although "aliens" just prints "Sorry, extended image module required". I've googled for a way to add the extended image module, but I can't find anything. I'm not sure whether that's related to the lab2d problem above.

Also, I'm using Bazel 4.0.0, and I now use pyenv to manage Python versions, so the include path in python.BUILD is /Users/<me>/.pyenv/versions/3.9.1/include/python3.9.

Sorry to raise another problem but any advice would be much appreciated! Thanks!

I just updated libpng from 1.6.34 to 1.6.37, could you please try again?

That said, since we're building libpng from source, you will most likely need to edit bazel/png.BUILD: it looks like the prebuilt config default isn't good for ARM+Neon. We either need to provide a different configuration, or disable the ARM Neon optimizations (e.g. by defining PNG_ARM_NEON_OPT=0?), or you could try switching to a locally installed system version of libpng. I'd try the first approach, providing an alternative config file. You'll probably also need to update the sources for the library target, which don't seem to include any of the intrinsics code at the moment.
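A minimal sketch of the second option, assuming the `png` target in bazel/png.BUILD is a plain `cc_library`: libpng's private header pngpriv.h only picks a Neon default when `PNG_ARM_NEON_OPT` is undefined, and documents `-DPNG_ARM_NEON_OPT=0` in CPPFLAGS as the way to turn the Neon code off. The elided attributes are whatever the real file already contains; if the target already sets `copts`, append the flag to that list instead.

```starlark
# Sketch only: the existing "png" target in bazel/png.BUILD with one added flag.
cc_library(
    name = "png",
    # ... existing srcs, hdrs, includes, deps, etc. left unchanged ...

    # Compile libpng with its ARM Neon optimizations disabled, so the core
    # sources never reference the *_neon functions that live in arm/.
    copts = ["-DPNG_ARM_NEON_OPT=0"],
)
```

This trades the Neon speed-ups for a build that needs no extra source files.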

If you want to play with the build rules for libpng, you can find the extracted archive in bazel-dmlab2d/external/png_archive (until the next bazel clean --expunge), where you will also find all the available source files that aren't currently getting compiled, as well as the preconfigured config file.

Thanks! As you say, the upgrade to 1.6.37 didn't fix it - although the error message refers to Symbol not found: _png_do_expand_palette_rgb8_neon now, not _png_init_filter_functions_neon.

I'm afraid I haven't used Bazel before, nor do I have much experience compiling libraries like this, so I'll have to ask some basic questions... Can you give me some advice on what changes to the configuration in png.BUILD are needed, and how to update the sources for the library target? (Also, I'm not sure which file is the preconfigured config file in bazel-lab2d/external/png_archive/: Makefile.am, Makefile.in, config.h.in?)

I found this link https://www.ridgesolutions.ie/index.php/2014/02/05/cross-compiling-libpng-for-arm-linux-with-neon-and-zlib/ which suggests using

./configure --host=arm-linux-gnueabi CC=arm-linux-gnueabi-gcc \
    AR=arm-linux-gnueabi-ar STRIP=arm-linux-gnueabi-strip RANLIB=arm-linux-gnueabi-ranlib \
    CPPFLAGS="-mfpu=neon -I/path/to/zlib/include/files" LDFLAGS="-L/path/to/zlib/lib/files" \
    --prefix=/path/to/dir/for/output/files

but I'm not sure where these sorts of flags fit into the Bazel pipeline.

Again, sorry for all the questions and thanks for helping me out!

If you check https://github.com/deepmind/lab2d/blob/main/bazel/png.BUILD#L6-L11, you'll see where the config file pnglibconf.h is copied from (namely scripts/pnglibconf.h.prebuilt). I'd start by editing that file, e.g. to enable the Neon optimizations. If you have a functioning local build, you can just compare it against your own generated config.h.

Then, after you've made the edits, you will presumably start seeing new linker errors because you're missing relevant source files. So you need to update the srcs array of the "png" library (https://github.com/deepmind/lab2d/blob/main/bazel/png.BUILD#L15-L36) to include the missing files. At the very least I expect you'll want the files from the arm subdirectory.

This is all pretty hacky, of course, since your edits will be lost as soon as Bazel redownloads the source archive (so keep a copy of your edited config file!), but once you've got something that works, we can ship that in a more permanent way.

@RacingTadpole: could you perhaps try something very simple and just add these three entries to the srcs in png.BUILD:

    srcs = [
        # ... existing libpng sources ...
        "arm/arm_init.c",
        "arm/filter_neon_intrinsics.c",
        "arm/palette_neon_intrinsics.c",
    ],

Edit: I tried this on a Linux-on-aarch64, and that seemed to do the trick there.

Awesome - that did it!!!

Great! Could you please see if the tip of the https://github.com/deepmind/lab2d/commits/beta branch works for you out of the box (except perhaps for Python path adjustments)? Then I'll commit that to the main branch.

Hmm, that doesn't work (with my bazel/python.BUILD changes).
(And it does work if I edit png.BUILD back to unconditionally include those three arm files.)
I guess my Mac isn't recognised as @platforms//cpu:arm64?

Aha, thanks! Could you by any chance try to find a condition that works? You can run bazel query @platforms//... to list what's available under @platforms; any of those constraint values can then be used in the config_setting that we use in the select.
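For reference, a minimal sketch of the shape of such a condition, assuming the png target appends the three arm/ sources through a `select()`. The setting name `cpu_arm64` is hypothetical, and this is not necessarily how the beta branch spells it. Trying another value from the `bazel query` output only means changing the `constraint_values` entry.

```starlark
# Hypothetical config_setting: matches when the target platform has this CPU value.
config_setting(
    name = "cpu_arm64",
    constraint_values = ["@platforms//cpu:arm64"],
)

cc_library(
    name = "png",
    srcs = [
        # ... existing libpng sources and headers ...
    ] + select({
        # 64-bit ARM: also compile libpng's Neon implementation.
        ":cpu_arm64": [
            "arm/arm_init.c",
            "arm/filter_neon_intrinsics.c",
            "arm/palette_neon_intrinsics.c",
        ],
        # Any other CPU: no extra sources.
        "//conditions:default": [],
    }),
    # ... remaining attributes unchanged ...
)
```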

The fact that the condition works in Linux means that we're probably missing something Mac-specific!

OK, I tried all of these:

@platforms//cpu:s390x
@platforms//cpu:ppc
@platforms//cpu:i386
@platforms//cpu:armv7k
@platforms//cpu:armv7
@platforms//cpu:arm64e
@platforms//cpu:arm64_32
@platforms//cpu:arm64
@platforms//cpu:arm
@platforms//cpu:aarch64

and of them, only the last one worked, @platforms//cpu:aarch64.
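Going by that result, the condition for the M1 Mac on Bazel 4.0.0 would be a `config_setting` on `@platforms//cpu:aarch64` instead (sketch below; the setting name is hypothetical). Since newer versions of bazelbuild/platforms make `aarch64` an alias of `arm64`, this label should presumably keep matching after an upgrade, though that is untested here.

```starlark
# Hypothetical name; :aarch64 was the only CPU constraint value that matched
# on the M1 Mac with Bazel 4.0.0 in the experiment above.
config_setting(
    name = "cpu_aarch64",
    constraint_values = ["@platforms//cpu:aarch64"],
)
```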

Thanks, that's so weird, because I see that as a flat-out alias: https://github.com/bazelbuild/platforms/blob/master/cpu/BUILD#L17-L20

I don't see how that could behave any differently from :arm64!

That changed only recently: bazelbuild/platforms@e20c932

Are you on a tip-of-trunk Bazel? E.g. via https://github.com/bazelbuild/bazelisk?

I'm using bazel version 4.0.0

$ bazel --version
bazel 4.0.0-homebrew

Ah, Bazel 4.0.0 is from Nov 11, and the change to the constraint value is from Dec 10. Perhaps this is still a work in progress?

@hlopko -- any idea on how to detect a Mac M1 in Bazel 4.0.0, and at tip of trunk?

I see this comment too bazelbuild/platforms@e20c932#commitcomment-46429542 (from 7 hours ago)

Could you try to use a nightly-built bazel binary perhaps? I think the binaries are completely self-contained, but you can use https://github.com/bazelbuild/bazelisk to simplify this even more: USE_BAZEL_VERSION=nightly bazelisk build //... Then we can see if this has already been fixed at head.

(Another related discussion: bazelbuild/bazel#12900)

Could you perhaps try this again? It looks like there has been a new Bazel release (though it's still called 4.0.0?).

Hi, sorry, I got distracted by other things. I'll give this a go soon.

Let me know if there's anything else to discuss!