RetroPie/RetroPie-Setup

dosbox-staging is failing to build on armv6

joolswills opened this issue · 15 comments

Looks like a similar or the same issue as before with libspeexdsp

git clone --recursive --depth 1 --shallow-submodules --branch v0.80.1 "https://github.com/dosbox-staging/dosbox-staging.git" "/home/pi/RetroPie-Setup/tmp/build/dosbox-staging"
[17/398] Compiling C object subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/resample.c.o
FAILED: subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/resample.c.o 
cc -Isubprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p -Isubprojects/speexdsp-1.2.1/libspeexdsp -I../subprojects/speexdsp-1.2.1/libspeexdsp -Isubprojects/speexdsp-1.2.1 -I../subprojects/speexdsp-1.2.1 -Isubprojects/speexdsp-1.2.1/include -I../subprojects/speexdsp-1.2.1/include -Isubprojects/speexdsp-1.2.1/include/speex -I../subprojects/speexdsp-1.2.1/include/speex -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -O3 -DHAVE_CONFIG_H -march=armv7-a -mfpu=neon -mcpu=arm1176jzf-s -mfpu=vfp -O2 -MD -MQ subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/resample.c.o -MF subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/resample.c.o.d -o subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/resample.c.o -c ../subprojects/speexdsp-1.2.1/libspeexdsp/resample.c
cc1: warning: switch -mcpu=arm1176jzf-s conflicts with -march=armv7-a switch
/tmp/ccpBz8x4.s: Assembler messages:
/tmp/ccpBz8x4.s:116: Error: selected processor does not support `vld1.32 {q4},[r1]!' in ARM mode
/tmp/ccpBz8x4.s:117: Error: selected processor does not support `vld1.32 {q8},[r3]!' in ARM mode
/tmp/ccpBz8x4.s:119: Error: selected FPU does not support instruction -- `vmul.f32 q0,q4,q8'
/tmp/ccpBz8x4.s:122: Error: selected processor does not support `vld1.32 {q4,q5},[r1]!' in ARM mode
/tmp/ccpBz8x4.s:123: Error: selected processor does not support `vld1.32 {q8,q9},[r3]!' in ARM mode
/tmp/ccpBz8x4.s:124: Error: selected processor does not support `vld1.32 {q6,q7},[r1]!' in ARM mode
/tmp/ccpBz8x4.s:125: Error: selected processor does not support `vld1.32 {q10,q11},[r3]!' in ARM mode
/tmp/ccpBz8x4.s:127: Error: selected FPU does not support instruction -- `vmul.f32 q0,q4,q8'
/tmp/ccpBz8x4.s:128: Error: selected FPU does not support instruction -- `vmul.f32 q1,q5,q9'
/tmp/ccpBz8x4.s:129: Error: selected FPU does not support instruction -- `vmul.f32 q2,q6,q10'
/tmp/ccpBz8x4.s:130: Error: selected FPU does not support instruction -- `vmul.f32 q3,q7,q11'
/tmp/ccpBz8x4.s:132: Error: selected processor does not support `vld1.32 {q4,q5},[r1]!' in ARM mode
/tmp/ccpBz8x4.s:133: Error: selected processor does not support `vld1.32 {q8,q9},[r3]!' in ARM mode
/tmp/ccpBz8x4.s:134: Error: selected processor does not support `vld1.32 {q6,q7},[r1]!' in ARM mode
/tmp/ccpBz8x4.s:135: Error: selected processor does not support `vld1.32 {q10,q11},[r3]!' in ARM mode
/tmp/ccpBz8x4.s:137: Error: selected FPU does not support instruction -- `vmla.f32 q0,q4,q8'
/tmp/ccpBz8x4.s:138: Error: selected FPU does not support instruction -- `vmla.f32 q1,q5,q9'
/tmp/ccpBz8x4.s:139: Error: selected FPU does not support instruction -- `vmla.f32 q2,q6,q10'
/tmp/ccpBz8x4.s:140: Error: selected FPU does not support instruction -- `vmla.f32 q3,q7,q11'
/tmp/ccpBz8x4.s:142: Error: selected FPU does not support instruction -- `vadd.f32 q4,q0,q1'
/tmp/ccpBz8x4.s:143: Error: selected FPU does not support instruction -- `vadd.f32 q5,q2,q3'
/tmp/ccpBz8x4.s:145: Error: selected FPU does not support instruction -- `vadd.f32 q0,q4,q5'
/tmp/ccpBz8x4.s:147: Error: selected processor does not support `vld1.32 {q6},[r1]!' in ARM mode
/tmp/ccpBz8x4.s:148: Error: selected processor does not support `vld1.32 {q10},[r3]!' in ARM mode
/tmp/ccpBz8x4.s:150: Error: selected FPU does not support instruction -- `vmla.f32 q0,q6,q10'
/tmp/ccpBz8x4.s:152: Error: selected FPU does not support instruction -- `vadd.f32 d0,d0,d1'
/tmp/ccpBz8x4.s:153: Error: selected processor does not support `vpadd.f32 d0,d0,d0' in ARM mode
/tmp/ccpBz8x4.s:1645: Error: selected FPU does not support instruction -- `vcvt.s32.f32 d0,d0,#15'
/tmp/ccpBz8x4.s:1646: Error: selected processor does not support `vqrshrn.s32 d0,q0,#15' in ARM mode
/tmp/ccpBz8x4.s:1647: Error: selected FPU does not support instruction -- `vmov.s16 r3,d0[0]'
[18/398] Compiling C object subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/preprocess.c.o
cc1: warning: switch -mcpu=arm1176jzf-s conflicts with -march=armv7-a switch
[19/398] Compiling C object subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/mdf.c.o
cc1: warning: switch -mcpu=arm1176jzf-s conflicts with -march=armv7-a switch
[20/398] Compiling C object subprojects/speexdsp-1.2.1/libspeexdsp/libspeexdsp.a.p/smallft.c.o
cc1: warning: switch -mcpu=arm1176jzf-s conflicts with -march=armv7-a switch
ninja: build stopped: subcommand failed.
speexdsp 1.2.1

  Library Summary
    Optimization level: 3
    SIMD instructions : NEON (armv7-a)
    Numerical type    : floating-point
    FFT library       : OggVorbis FFT (built-in)

Will need to look into it further. This is building on a rpi4 with a rpi1 target.
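
To make the flag clash in the log easier to spot, here is a minimal sketch of a check for it. The function name `classify_arm_flags` is hypothetical (it is not part of RetroPie-Setup or Meson), and it only looks for the exact flags that appear in the failing command above.

```shell
# Hypothetical helper (not part of RetroPie-Setup or Meson): prints "conflict"
# when a compiler command line selects the ARM1176 (ARMv6) CPU while also
# carrying ARMv7/NEON flags, and "ok" otherwise.
classify_arm_flags() {
    # No ARMv6 CPU selected: nothing to clash with.
    case " $* " in
        *" -mcpu=arm1176jzf-s "*) ;;
        *) echo "ok"; return 0 ;;
    esac
    # ARMv6 CPU selected: look for ARMv7/NEON flags alongside it.
    case " $* " in
        *" -march=armv7-a "*|*" -mfpu=neon "*) echo "conflict" ;;
        *) echo "ok" ;;
    esac
}

# Flags copied from the failing resample.c command in the log above.
classify_arm_flags -O3 -march=armv7-a -mfpu=neon -mcpu=arm1176jzf-s -mfpu=vfp -O2
```

Piping a full build log through something like this could help confirm whether only the speexdsp subproject picks up the extra flags.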

I can confirm 0.80.0 built fine -

pkg_origin="source"
pkg_date="2022-12-24T14:56:07+00:00"
pkg_repo_type="git"
pkg_repo_url="https://github.com/dosbox-staging/dosbox-staging.git"
pkg_repo_branch="v0.80.0"
pkg_repo_commit="dcb4408229992328fe1326a084ca2106469898aa"
pkg_repo_date="2022-12-20T19:01:22-08:00"
pkg_repo_extra=""
nemo93 commented

Adding @kcgen as I certainly don't want to speak for the team.

@joolswills thanks for reporting. I believe supporting anything slower/older than the RPi3 might be out of scope, as it would add too many constraints. The Staging team has already made (big) efforts and compromises to support our beloved Pi devices.

Could we have the script available only for RPi3 and 4 (and x86/64), perhaps by adding !armv6 !armv7? For Pi0/1 and 2, DOSBox 'SVN' might be a better fit.

Thanks.

I was going to look at this myself if needed. It works fine on the rpi2. It was working on the rpi1 though, so something must have changed. I noticed it, so I filed this report so I don't forget.

kcgen commented

Thanks @joolswills and @nemo93,

DOSBox Staging uses the Meson build system to find the SpeexDSP library on the host, test it, and (if good) use it.

Otherwise, it builds its own SpeexDSP from source using Meson's wrap system.
You can see Meson's SpeexDSP wrap-file (similar to a cmake or autoconf build recipe) here:

https://github.com/mesonbuild/wrapdb/blob/master/subprojects/packagefiles/speexdsp/meson.build

You'll see that it compiles a tiny NEON test program and runs it; and /should/ only enable and use NEON and armV7 flags if that passes.

So something's broken there :-)

@joolswills, can you run these on your rpi1 to help debug it?

cd /dev/shm/
git clone --depth 1 https://github.com/mesonbuild/wrapdb
cd wrapdb/
meson setup build -Dwraps=speexdsp
meson compile -C build
grep '^#define' build/subprojects/speexdsp-1.2.1/config.h

If you can paste all that output (starting w/ meson setup .. through to the grep output), hopefully it will reveal what's going on.

After that, feel free to blow away /dev/shm/wrapdb!

I think it's probably something like the host (which is an armv8) supports NEON, but the -mcpu target doesn't. Maybe our CPU flags are lost or something (the log suggests an armv7 flag is added which overrides ours). I will test on a rpi1 and rpi4 building for armv6 target and get back to you. I probably won't have time to do that today though.

kcgen commented

the host (which is an armv8) supports NEON

That's definitely it!

  • does the compiler accept NEON flags? ✔️
  • can the compiler build a NEON test prog? ✔️
  • does the test program run and give expected output? ✔️

It's a GO for NEON :)!

I think the dilemma is that the armv8 build space isn't really running a cross-compiler environment (ie: with only armv6 compilers, armv6 libs, etc) and instead the compiler is the standard armv8 GCC that supports lots of -mcpu= flags.
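
As a rough illustration of why a run-time check reflects the build host rather than the target, here is a sketch that reads /proc/cpuinfo-style text and reports whether NEON is advertised. The helper name `cpuinfo_has_neon` is hypothetical, and the sample Features lines are illustrative, not captured from the machines in this thread.

```shell
# Hypothetical helper: reads /proc/cpuinfo-style text on stdin and prints
# "yes" if the Features line advertises NEON ("neon" on 32-bit ARM kernels,
# "asimd" on 64-bit ones), otherwise "no".
cpuinfo_has_neon() {
    if grep -i '^features' | grep -Ew 'neon|asimd' >/dev/null; then
        echo "yes"
    else
        echo "no"
    fi
}

# Illustrative Features lines (assumed typical for an ARM1176 and a Cortex-A72).
printf 'Features\t: half thumb fastmult vfp edsp java tls\n' | cpuinfo_has_neon
printf 'Features\t: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4\n' | cpuinfo_has_neon

# On a real machine: cpuinfo_has_neon < /proc/cpuinfo
```

Run on the rpi4 build host, a check like this says "yes" no matter which -mcpu= the binaries are eventually meant for.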

The good news is that Meson has amazing support for cross compiler environments, and even extremely exotic ones:

The most complicated case is when you cross-compile a cross compiler. As an example you can, on a Linux machine, generate a cross compiler that runs on Windows but produces binaries on MIPS Linux. In this case build machine is x86 Linux, host machine is x86 Windows and target machine is MIPS Linux. This setup is known as the Canadian Cross.

https://mesonbuild.com/Cross-compilation.html

More good news is that there are armv6 cross-compilers that run on armv8 hosts (that all unpack under a single self-contained directory), here: https://github.com/abhiTronix/raspberry-pi-cross-compilers

I'll poke around with this on my pi4. This approach is the gold-standard; however failing this I think we can pass a flag (in the pi1 case) just to knock out all the SIMD checks in the SpeexDSP build.

kcgen commented

Update: I got partway down this path, for anyone wanting to dig into this:

sudo apt install libgcc-8-dev-armhf-cross libstdc++-8-dev-armhf-cross libatomic1-armhf-cross

Create arm-linux-gnueabihf.txt content:

[binaries]
c = 'arm-linux-gnueabihf-gcc-8'
cpp = 'arm-linux-gnueabihf-g++-8'

ar = 'arm-linux-gnueabihf-gcc-ar-8'
ranlib = 'arm-linux-gnueabihf-gcc-ranlib-8'
nm = 'arm-linux-gnueabihf-gcc-nm-8'

ld = 'arm-linux-gnueabihf-ld.gold'
strip = 'arm-linux-gnueabihf-strip'
pkgconfig = 'arm-linux-gnueabihf-pkg-config'
readelf = 'arm-linux-gnueabihf-readelf'
strings = 'arm-linux-gnueabihf-strings'

[target_machine]
system = 'linux'
cpu_family = 'arm'
cpu = 'armv6zk'
endian = 'little'

Use it:

meson setup --cross-file arm-linux-gnueabihf.txt build

However the cross-compiler still permitted the neon flags and the test executable still ran and passed (because I'm running on an armv8) :(
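
One possible factor, offered as an assumption rather than a confirmed diagnosis: Meson describes the machine that will run the built binaries in a [host_machine] section, while [target_machine] only matters when the thing being built is itself a compiler. Meson's cross-file format also has a `needs_exe_wrapper` property that marks host binaries as not directly runnable on the build machine. The untested sketch below writes such a variant of the cross file; the function name `write_armv6_cross_file` is hypothetical, and whether the speexdsp wrap copes gracefully with a test program it cannot run has not been verified here.

```shell
# Hypothetical helper: writes a variant of the cross file above to the path
# given as $1, using [host_machine] and needs_exe_wrapper (untested sketch).
write_armv6_cross_file() {
    cat > "$1" <<'EOF'
[binaries]
c = 'arm-linux-gnueabihf-gcc-8'
cpp = 'arm-linux-gnueabihf-g++-8'
strip = 'arm-linux-gnueabihf-strip'

[host_machine]
system = 'linux'
cpu_family = 'arm'
cpu = 'armv6zk'
endian = 'little'

[properties]
# Assumption: treat host binaries as not runnable on the build machine.
needs_exe_wrapper = true
EOF
}

# Example: write_armv6_cross_file arm-linux-gnueabihf.txt
```

Even with this, the compiler itself would still accept the NEON flags, so this addresses at most the "test executable ran and passed" half of the problem.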

kcgen commented

Unfortunately the cross-compiler solution is made more difficult because we want to do even more than just cross compile: we want to run the speexdsp test executable on the target platform to verify its floating point output is good.

So I think the best options are either:

  1. Build on the rpi1 itself, or
  2. Disable NEON explicitly like this:

     meson setup -Dspeexdsp:simd=false build

We are not actually cross-compiling in that sense, as you can build armv6 binaries on an armv8 with the standard GCC just by changing, for example, the -mcpu= flag.

If the CFLAGS are preserved it should work. I will take a look also when I get a chance.

Thanks for looking into this.

meson setup -Dspeexdsp:simd=false

This looks like a good solution. Thanks.

cmitu commented

If the CFLAGS are preserved it should work. I will take a look also when I get a chance.

They are preserved during build, but not when the test for NEON is compiled:

https://github.com/mesonbuild/wrapdb/blob/7202e4cf3931d9af5e66aa67c891a620cb69864e/subprojects/packagefiles/speexdsp/meson.build#L100-L113

kcgen commented

(just mentioning it, but the NEON test is excluded and not run when passing the -Dspeexdsp:simd=false flag, so hopefully your tests work as expected, @joolswills 🚀 )

@cmitu I think we should probably handle it in RetroPie-Setup. It's possible there would be cases that removing the armv7/neon flags would disable neon by default on a neon supporting system.

At least the gcc on Raspberry Pi OS used to default to generating non neon binaries on armv7+ as the same distro was used for the RPI1. I may be out of date with this.

I think if we can put the logic in RetroPie-Setup instead, this would be a simple fix.

I'll have a look.
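
A minimal sketch of what that logic could look like, assuming the build script knows its target architecture as a string. The helper `speexdsp_meson_opts` and the variable `target_arch` are hypothetical names for illustration; this is not the actual RetroPie-Setup change. Only the `-Dspeexdsp:simd=false` option comes from the discussion above.

```shell
# Hypothetical helper: prints the extra Meson option for ARMv6 targets and
# nothing for any other architecture.
speexdsp_meson_opts() {
    case "$1" in
        armv6*) echo "-Dspeexdsp:simd=false" ;;
        *) ;;
    esac
}

# Assumption: target_arch holds whatever name the build script uses for its target.
target_arch="armv6"

# Left unquoted on purpose so an empty result adds no argument.
# echo is used here to show the command rather than run it.
echo meson setup $(speexdsp_meson_opts "$target_arch") build
```

Keeping the decision on the RetroPie-Setup side leaves NEON detection untouched for armv7 and newer targets.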

(just mentioning it, but the NEON test is excluded and not run when passing the -Dspeexdsp:simd=false flag, so hopefully your tests work as expected, @joolswills 🚀 )

Thanks for your help with this.

cmitu commented

It's possible there would be cases that removing the armv7/neon flags would disable neon by default on a neon supporting system.

On Raspberry Pi OS it's still like this (GCC is built with --with-fpu=vfp): without the neon flag, the NEON paths are not available.

cc -v
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/10/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 10.2.1-6+rpi1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf

$ gcc -dM -E - </dev/null | grep -i neon
# .. crickets
$ gcc -mfpu=neon -dM -E - </dev/null | grep -i neon
#define __ARM_NEON 1
#define __ARM_NEON_FP 4
#define __ARM_NEON__ 1
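
The same macro check can be wrapped in a small function, sketched here so it can be reused with different flag sets. The name `macro_dump_has_neon` is hypothetical; it only inspects a macro dump passed on stdin and does not invoke the compiler itself.

```shell
# Hypothetical helper: reads a `gcc -dM -E` macro dump on stdin and prints
# "yes" if __ARM_NEON or __ARM_NEON__ is defined, otherwise "no".
macro_dump_has_neon() {
    if grep -E '^#define __ARM_NEON(__)? ' >/dev/null; then
        echo "yes"
    else
        echo "no"
    fi
}

# Usage with a real compiler (CC and the flags are placeholders):
#   "${CC:-gcc}" -dM -E - </dev/null | macro_dump_has_neon
#   "${CC:-gcc}" -mfpu=neon -dM -E - </dev/null | macro_dump_has_neon
```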

Easiest check - retroarch doesn't build with --enable-neon without the -mfpu=neon parameter:

./retroarch --enable-neon
Checking operating system ... Linux
Checking for suitable working C compiler ... /usr/bin/gcc works
Checking for suitable working C++ compiler ... /usr/bin/g++ works
...
Checking existence of -lSPIRV-Tools-opt ... yes
Checking existence of -lSPIRV-Tools ... yes
Checking presence of predefined macro __ARM_NEON__ ... no
Build assumed that __ARM_NEON__ is defined, but it's not. Exiting ...