foss-for-synopsys-dwc-arc-processors/toolchain

ARC HS6X GCC corrupting store operation addresses when address is a constant whose value exceeds 32 bits

Closed this issue · 11 comments

I’m using the ARC GCC toolchain from https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/releases/download/arc-2023.09-release/arc_gnu_2023.09_prebuilt_arc64_elf_linux_install.tar.bz2 to compile the following code for ARCv3 (aka. ARC 64) for a HS66 processor.

When I compile the attached code with -O1 or higher optimization levels, the store instruction to the completion address gets over optimized to the point where the address it’s storing to is wrong.

int main() {
  bool match = true;
  for (unsigned row = 0; row < 4; row++) {
    auto row_addr = reinterpret_cast<volatile unsigned*>(row * 4);
    for (unsigned col = 0; col < 2; col++) {
      if (row_addr[col] != row) {
        match = false;
        break;
      }
    }
  }

  auto sram64 = reinterpret_cast<volatile unsigned long*>(0x80'0000'0000UL);
  auto completion_addr = &sram64[512];
  if (match) {
    *completion_addr = 0xC001;
  } else {
    *completion_addr = 0xDEAD;
  }
}

Example compile command:

$ arc64-elf-g++ -mcpu=hs6x                        \
                -O1                               \
                -Wall                             \
                -Werror                           \
                -g                                \
                -c app.cc                         \
                -o app.o                          \
                -fno-strict-aliasing              \
                -fwrapv                           \
                -fno-aggressive-loop-optimizations

(the last 3 options are not needed to reproduce the issue but the GCC bug reporting page asked me to add them).

Here’s the assembly output in the relevant section (used objdump -SD on the generated object file to get this) when compiled with -O0:

  auto sram64 = reinterpret_cast<volatile unsigned long*>(0x80'0000'0000UL);
  80:   5c4f 79c0 0000 0000     bsetl   r0,0@s32,0x27
  88:   1bfd b01f               stl.as  r0,[fp,-3]
  auto completion_addr = &sram64[512];
  8c:   13fd be40               ldl.as  r0,[fp,-3]
  90:   5896 0008               add3l   r0,r0,512
  94:   1bfc b01f               stl.as  r0,[fp,-4]
  if (match) {
  98:   13ff b080               ldb     r0,[fp,-1]
  9c:   780b                    tst_s   r0,r0
  9e:   001a 0001               beq     26      ;b6 <main+0xb6>
    *completion_addr = 0xC001;
  a2:   13fc be40               ldl.as  r0,[fp,-4]
  a6:   590a 0f00 0000 c001     movl    r1,49153@s32
  ae:   1800 0047               stl     r1,[r0,0]
  b2:   0017 0000               b       22      ;c6 <main+0xc6>
  } else {
    *completion_addr = 0xDEAD;
  b6:   13fc be40               ldl.as  r0,[fp,-4]
  ba:   590a 0f00 0000 dead     movl    r1,57005@s32
  c2:   1800 0047               stl     r1,[r0,0]
  }

As you can set, it’s including a bsetl instruction to generate the sram64 address, adding 512 * 8 to it, then doing a stl command with [r0, 0] as the address.

When compiled with -O1 or greater, here’s the relevant assembly:

  auto sram64 = reinterpret_cast<volatile unsigned long*>(0x80'0000'0000UL);
  auto completion_addr = &sram64[512];
  if (match) {
  30:   7b6b                    tst_s   r3,r3
    *completion_addr = 0xC001;
  32:   58ca 0f02 0000 c001     movl.ne r0,49153@s32
  } else {
    *completion_addr = 0xDEAD;
  3a:   58ca 0f01 0000 dead     movl.eq r0,57005@s32
  42:   1e00 7007 0000 1000     stl     r0,[0x1000]

In this case, it’s making a store to the address 0x1000, which is not correct.

If I change the sram64 address to be a 32 bit number instead, the generated code in -O1 and higher is correct. Thus, I suspect that whatever code is responsible for the optimization to combine the base address and offset constants is using 32 bit math incorrectly and, as a result, drops the upper half of the store address.

As requested, here’s the gcc -v -save-temps output logs and file contents:

Using built-in specs.
COLLECT_GCC=external/_main~arc64_none~arc64_gcc_toolchain_linux_x86_64/bin/arc64-elf-gcc
Target: arc64-snps-elf
Configured with: /SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/.build/arc64-snps-elf/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=arc64-snps-elf --prefix=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/arc64-unknown-elf --exec_prefix=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/arc64-unknown-elf --with-local-prefix=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/arc64-unknown-elf/arc64-snps-elf --with-headers=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/arc64-unknown-elf/arc64-snps-elf/include --with-newlib --enable-threads=no --disable-shared --with-pkgversion='ARCv3 elf toolchain - build 5771' --enable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp --disable-libquadmath --disable-libquadmath-support --disable-libstdcxx-verbose --with-gmp=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/.build/arc64-snps-elf/buildtools --with-mpfr=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/.build/arc64-snps-elf/buildtools --with-mpc=/SCRATCH/arcjenkins2/slaves/us01-odc-custom-arcoss2/workspace/arcoss_verification/arc_gnu_toolchain_verification/build_toolchain-6/crosstool-ng/.build/arc64-snps-elf/buildtools --with-isl=no --with-cloog=no --enable-lto --disable-nls --disable-tls --enable-multiarch --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.1 20230516 (ARCv3 elf toolchain - build 5771)
COLLECT_GCC_OPTIONS='-mcpu=hs6x' '-O1' '-Wall' '-Werror' '-g' '-c' '-o' 'app.o' '-v' '-save-temps'
 /tmp/bazel-source-roots/0/bin/../libexec/gcc/arc64-snps-elf/13.1.1/cc1plus -E -quiet -v -iprefix /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/ app.cc -mcpu=hs6x -Wall -Werror -g -fworking-directory -O1 -fpch-preprocess -o app.ii
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include/c++/13.1.1"
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include/c++/13.1.1/arc64-snps-elf"
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include/c++/13.1.1/backward"
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/include"
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/include-fixed"
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/sys-include"
ignoring duplicate directory "/tmp/bazel-source-roots/0/bin/../lib/gcc/../../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include"
#include "..." search starts here:
#include <...> search starts here:
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include/c++/13.1.1
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include/c++/13.1.1/arc64-snps-elf
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include/c++/13.1.1/backward
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/include
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/include-fixed
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/sys-include
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/include
End of search list.
COLLECT_GCC_OPTIONS='-mcpu=hs6x' '-O1' '-Wall' '-Werror' '-g' '-c' '-o' 'app.o' '-v' '-save-temps'
 /tmp/bazel-source-roots/0/bin/../libexec/gcc/arc64-snps-elf/13.1.1/cc1plus -fpreprocessed app.ii -quiet -dumpbase app.cc -dumpbase-ext .cc -mcpu=hs6x -g -O1 -Wall -Werror -version -o app.s
GNU C++17 (ARCv3 elf toolchain - build 5771) version 13.1.1 20230516 (arc64-snps-elf)
        compiled by GNU C version 9.3.1 20200408 (Red Hat 9.3.1-2), GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: ac08013798b630fa198166467dd39e11
COLLECT_GCC_OPTIONS='-mcpu=hs6x' '-O1' '-Wall' '-Werror' '-g' '-c' '-o' 'app.o' '-v' '-save-temps'
 /tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/bin/as -v -mcpu=hs6x -o app.o app.s
GNU assembler version 2.40.90 (arc64-snps-elf) using BFD version (ARCv3 elf toolchain - build 5771) 2.40.90.20230703
COMPILER_PATH=/tmp/bazel-source-roots/0/bin/../libexec/gcc/arc64-snps-elf/13.1.1/:/tmp/bazel-source-roots/0/bin/../libexec/gcc/:/tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/bin/
LIBRARY_PATH=/tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/:/tmp/bazel-source-roots/0/bin/../lib/gcc/:/tmp/bazel-source-roots/0/bin/../lib/gcc/arc64-snps-elf/13.1.1/../../../../arc64-snps-elf/lib/
COLLECT_GCC_OPTIONS='-mcpu=hs6x' '-O1' '-Wall' '-Werror' '-g' '-c' '-o' 'app.o' '-v' '-save-temps' '-dumpdir' 'app.'
# 0 "app.cc"
# 1 "/home/marklh/.cache/bazel/_bazel_marklh/1691c390fc7d37e259959f08cf818c5d/sandbox/linux-sandbox/961/execroot/_main//"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "app.cc"
int main() {
  bool match = true;
  for (unsigned row = 0; row < 4; row++) {
    auto row_addr = reinterpret_cast<volatile unsigned*>(row * 4);
    for (unsigned col = 0; col < 2; col++) {
      if (row_addr[col] != row) {
        match = false;
        break;
      }
    }
  }

  auto sram64 = reinterpret_cast<volatile unsigned long*>(0x80'0000'0000UL);
  auto completion_addr = &sram64[512];
  if (match) {
    *completion_addr = 0xC001;
  } else {
    *completion_addr = 0xDEAD;
  }
}

If we separate the compilation steps into cc1plus and as, we see that:

$ cat app.cc
  int main()
  {
    volatile bool match = true;
    auto sram64 = reinterpret_cast<volatile unsigned long*>(0x80'0000'0000UL);
    auto completion_addr = &sram64[512];
    if (match)
    {
      *completion_addr = 0xC001;
    }
    else
    {
      *completion_addr = 0xDEAD;
    }

    return 0;
  }

$ /path/to/libexec/gcc/arc64-elf/13.1.1/cc1plus -mcpu=hs6x -quiet -O1 app.cc -o app.s
$ cat app.s
  ...
  extb.f  0,r0
  movl.ne r0,49153
  movl.eq r0,57005
  stl     r0,[549755817984]         # <---- 0x80_0000_1000
  mov_s   r0,0
  ...

$ arc64-elf-as -mcpu=hs6x app.s -o app.o
$ arc64-elf-objdump -d app.o
  ...
   c:   262f f007               extb.f  0,r0
  10:   58ca 0f02 0000 c001     movl.ne r0,49153@s32
  18:   58ca 0f01 0000 dead     movl.eq r0,57005@s32
  20:   1e00 7007 0000 1000     stl     r0,[0x1000]
  28:   700c                    mov_s   r0,0
  ...

cc1plus puts the whole address in one ago. It must handle it with better care. The arc64_movdi needs to be examined for this.

#(insn 21 20 28 2 (set (mem/v:DI (const_int 549755817984 [0x8000001000]) [2  S8 A64])
#        (reg:DI 0 r0 [112])) "app.cc":12:22 16 {*arc64_movdi}
#     (expr_list:REG_DEAD (reg:DI 0 r0 [112])
#        (nil)))
	stl	r0,[549755817984]	# 21	[c=4 l=8]  *arc64_movdi/13

The fix arc64: Saner acceptance of immediates by arc64_legitimate_address_p() is pushed onto arc64 branch.

Recording the commit message here:

Before this change, cc1(plus) could emit "store" insturctions such as:

stl 0x1234, [0x80_0000_1000]

which would become "stl 0x1234, [0x1000]" by the "as"sembler.

This happened, because "arc64_legitimate_address_1_p()" was returning
"true" for any sort of "const_int"s.  With this change, it returns
"true" only if the number can fit in 32-bit, and "false" otherwise.

The address preparation as performed by "arc64_prepare_move_operands()"
during the "movdi expand" has already taken care of the rest:

(define_predicate "splittable_const_int_operand"
  (match_code "const_int")
{
  ...
  /* Check if the constant can be loaded in a single bsetl/bclrl insn. */
  if ((SINGLE_BIT_MASK_OPERAND (zext_hwi (INTVAL (op) >> 32, 32))
       && UNSIGNED_INT32 (zext_hwi (INTVAL (op), 32)))
      || (SINGLE_BIT_MASK_OPERAND (zext_hwi ((~INTVAL (op)) >> 32, 32))
          && (sext_hwi (INTVAL (op), 32) < 0)))
    return false;
  ...
}

Because of this logic, there's no need for a split and a mere
"bsetl r0, 0x1000, 39" will get the job done.

@claziss once again, thanks for your support.

@marklh0 Thank you for your report! To use this fix, you have to either wait for the next release or (re)build gcc. If you opt for the latter, here's the patch or you can use the arc64 branch.

Thank you very much for the quick fix! When will the next release be ready? If it's more than a week away I think I'll try my hand at building gcc

@marklh0 we plan to release new version of ARC GNU tools later this year, but definitely it will not be in coming weeks. Thus please try to build the toolchian yourself. Please follow instructions from https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/blob/arc-releases/README.md.

You'll need to use Crosstool-NG from https://github.com/foss-for-synopsys-dwc-arc-processors/crosstool-ng/tree/arc-2023.09 and snps-arc64-unknown-elf sample for configuration. To use the fix suggested by @shahab-vahedi you need to set CT_GCC_DEVEL_REVISION (https://github.com/foss-for-synopsys-dwc-arc-processors/crosstool-ng/blob/arc-2023.09/samples/snps-arc64-unknown-elf/arc64-unknown-elf-defconfig#L28C1-L28C22) pointing to the fix' commit hash 905facf0fe503f01f1ca8a4e6cf801fdcdd37296.

Let me know if it works for you or more help is needed.

It's definitely weeks away if not months. You can look into this arc-gnu-toolchain to build the tools manually. As an example, this could be your starting point for building the baremetal tool:

$ cd /src

$ git clone https://github.com/foss-for-synopsys-dwc-arc-processors/arc-gnu-toolchain
$ git clone --depth 10 --single-branch --branch arc-2023.09 https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb
$ git clone --depth 10 --single-branch --branch arc-2023.09 https://github.com/foss-for-synopsys-dwc-arc-processors/gcc
$ git clone --depth 10 --single-branch --branch arc-2023.09 https://github.com/foss-for-synopsys-dwc-arc-processors/newlib

$ cd ./gcc
$ git am fix.patch
$ cd ..

$ cd ./arc-gnu-toolchain
$ autoconf
$ ln -s ../binutils-gdb
$ ln -s ../gcc
$ ln -s ../newlib
$ cd ..

$ cd /bld/arc64_toolchain
$ /src/arc-gnu-toolchain/configure   \
      --target=arc64                 \
      --prefix=/inst/arc64_toolchain \
      --enable-debug-info            \
      --disable-werror

$ make -j $(nproc)

I'll keep debugging this on my own, but hopefully one of you might know the answer. I get this when trying to build using @abrodkin 's instructions:

[INFO ]  Installing core C gcc compiler
[EXTRA]    Configuring core C gcc compiler
[EXTRA]    Building gcc
[EXTRA]    Installing gcc
[EXTRA]    Housekeeping for core gcc compiler
[EXTRA]       '' --> lib (gcc)   lib (os)
[EXTRA]       ' -mfpu=fpus' --> lib/fpus (gcc)   lib/fpus (os)
[EXTRA]       ' -mfpu=fpud' --> lib/fpud (gcc)   lib/fpud (os)
[EXTRA]       ' -m128' --> lib/m128 (gcc)   lib/m128 (os)
[EXTRA]       ' -mcpu=hs5x' --> lib/hs5x (gcc)   lib/hs5x (os)
[EXTRA]       ' -mcpu=hs58' --> lib/hs58 (gcc)   lib/hs58 (os)
[INFO ]  Installing core C gcc compiler: done in 102.46s (at 06:18)
[INFO ]  =================================================================
[INFO ]  Installing C library
[EXTRA]    Configuring C library
[EXTRA]    Building C library
[ERROR]    configure: error: bad value no for newlib-global-atexit option
[ERROR]    checking host system type... make[2]: *** [Makefile:8448: configure-target-newlib] Error 1
[ERROR]    make[2]: *** Waiting for unfinished jobs....
[ERROR]    make[1]: *** [Makefile:879: all] Error 2
[ERROR]
[ERROR]  >>
[ERROR]  >>  Build failed in step 'Installing C library'
[ERROR]  >>        called in step '(top-level)'
[ERROR]  >>
[ERROR]  >>  Error happened in: CT_DoExecLog[scripts/functions@377]
[ERROR]  >>        called from: newlib_main[scripts/build/libc/newlib.sh@113]
[ERROR]  >>        called from: do_libc_main[scripts/build/libc.sh@33]
[ERROR]  >>        called from: main[scripts/crosstool-NG.sh@697]
[ERROR]  >>
[ERROR]  >>  For more info on this error, look at the file: 'build.log'
[ERROR]  >>  There is a list of known issues, some with workarounds, in:
[ERROR]  >>      https://crosstool-ng.github.io/docs/known-issues/
[ERROR]  >>
[ERROR]  >> NOTE: Your configuration includes features marked EXPERIMENTAL.
[ERROR]  >> Before submitting a bug report, try to reproduce it without enabling
[ERROR]  >> any experimental features. Otherwise, you'll need to debug it
[ERROR]  >> and present an explanation why it is a bug in crosstool-NG - or
[ERROR]  >> preferably, a fix.
[ERROR]  >>
[ERROR]  >>  If you feel this is a bug in crosstool-NG, report it at:
[ERROR]  >>      https://github.com/crosstool-ng/crosstool-ng/issues/
[ERROR]  >>
[ERROR]  >>  Make sure your report includes all the information pertinent to this issue.
[ERROR]  >>  Read the bug reporting guidelines here:
[ERROR]  >>      http://crosstool-ng.github.io/support/
[ERROR]
[ERROR]  (elapsed: 6:33.23)
gmake: *** [ct-ng:261: build] Error 2

I got it to build, just had to change the bad setting in menuconfig. Trying the toolchain out today and will report back

The fix and build worked, thank you both very much!

I got it to build, just had to change the bad setting in menuconfig. Trying the toolchain out today and will report back

Just out of curiosity, what was that "bad setting" in the menuconfig?

In the C library section, LIBC_NEWLIB_GLOBAL_ATEXIT needed to be changed to true