Single-image crash in ifx libc
Closed this issue · 5 comments
Steps to reproduce the runtime error:
git clone -b use_ifx https://github.com/berkeleylab/matcha
cd matcha
cp templates/fpm.toml-template ./fpm.toml
export FOR_COARRAY_NUM_IMAGES=1
fpm run --compiler ifx --flag "-coarray=shared"
+ mkdir -p build/dependencies
Initialized empty Git repository in /storage/users/rouson/tmp/matcha/build/dependencies/assert/.git/
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 29 (delta 0), reused 16 (delta 0), pack-reused 0
Unpacking objects: 100% (29/29), 13.94 KiB | 528.00 KiB/s, done.
From https://github.com/sourceryinstitute/assert
* branch a3065a9dffaedf085fbd262c6bf31b309aa43a4a -> FETCH_HEAD
distribution_m.f90 done.
input_m.f90 done.
assert_m.F90 done.
characterizable_m.f90 done.
data_partition_m.f90 done.
input_s.f90 done.
t_cell_collection_m.f90 done.
assert_s.F90 done.
intrinsic_array_m.F90 done.
data_partition_s.F90 done.
do_concurrent_m.f90 done.
output_m.f90 done.
t_cell_collection_s.F90 done.
intrinsic_array_s.F90 done.
matcha_m.f90 done.
distribution_s.F90 done.
do_concurrent_s.f90 done.
output_s.f90 done.
matcha_s.F90 done.
main.F90 done.
libmatcha.a done.
matcha done.
[100%] Project compiled successfully.
[jupiter:2521117:0:2521117] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7f6ade4e1498)
==== backtrace (tid:2521117) ====
0 /lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7f6b4c42be74]
1 /lib/libucs.so.0(+0x3008f) [0x7f6b4c42c08f]
2 /lib/libucs.so.0(+0x303c4) [0x7f6b4c42c3c4]
3 /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f6b5104a420]
4 build/ifx_C4BBCE17D21A365D/app/matcha() [0x40b43f]
5 build/ifx_C4BBCE17D21A365D/app/matcha() [0x40a308]
6 build/ifx_C4BBCE17D21A365D/app/matcha() [0x407469]
7 build/ifx_C4BBCE17D21A365D/app/matcha() [0x40552e]
8 build/ifx_C4BBCE17D21A365D/app/matcha() [0x40527d]
9 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f6b50e68083]
10 build/ifx_C4BBCE17D21A365D/app/matcha() [0x40519e]
=================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 2521117 RUNNING AT jupiter
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
<ERROR> Execution failed for object " matcha "
<ERROR>*cmd_run*:stopping due to failed executions
STOP 1
After adding the options "-g -O0 -traceback -coarray" and setting
export I_MPI_FABRICS=shm
the traceback looks like:
[100%] Project compiled successfully.
- build/ifx_97488F9229162D1C/app/matcha
forrtl: severe (174): SIGSEGV, segmentation fault occurred
In coarray image 1
Image PC Routine Line Source
libpthread-2.28.s 00007F5B4D8E8C20 Unknown Unknown Unknown
matcha 000000000040E470 do_concurrent_sam 14 do_concurrent_s.f90
matcha 000000000040C0B3 velocities 57 distribution_s.F90
matcha 0000000000407D5F matcha 51 matcha_s.F90
matcha 0000000000405213 main 18 main.F90
matcha 0000000000404DBD Unknown Unknown Unknown
libc-2.28.so 00007F5B4D330493 __libc_start_main Unknown Unknown
matcha 0000000000404CDE Unknown Unknown Unknown
I don't have a good idea of the cause of the fault so far. Investigation is underway.
Thanks for the update, @greenrongreen !
@greenrongreen does the above traceback mean that the failure is at line 14 in do_concurrent_s.f90? If so, the relevant line is an association with an expression that includes an intrinsic function that is an important workhorse for this application:
13 do concurrent(cell = 1:ncells, step = 1:nsteps)
14 associate(k => findloc(speeds(cell,step) >= cumulative_distribution, value=.false., dim=1)-1)
15 sampled_speeds(cell,step) = vel(k)
16 end associate
17 end do
If line 14 above is the issue, there are some obvious workarounds with varying degrees of cost and complexity. The simplest workaround might be to eliminate the associate
block altogether and instead substitute the corresponding expression inside the array index, which yields:
sampled_speeds(cell,step) = vel(findloc(speeds(cell,step) >= cumulative_distribution, value=.false., dim=1)-1)
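For clarity, the resulting associate-free loop would read as follows (a sketch only; the declarations of ncells, nsteps, speeds, cumulative_distribution, vel, and sampled_speeds are assumed to be as in do_concurrent_s.f90):

```fortran
! Workaround sketch: inline the former associate expression directly
! into the array index so no associate block appears inside do concurrent.
do concurrent(cell = 1:ncells, step = 1:nsteps)
  sampled_speeds(cell,step) = vel(findloc(speeds(cell,step) >= cumulative_distribution, value=.false., dim=1)-1)
end do
```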
@Dominick99 tried this and it gets us past this error and on to the next one. I'll ask him to post the new error message in a comment here, but I do hope Intel will fix the issues with using associate
inside do concurrent.
As we've discussed, it's a pretty important feature to me.
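In case it helps Intel track this down, a minimal, self-contained reproducer in the spirit of our loop might look like the following. This is a hypothetical program, not code from matcha, but it exercises the same pattern: an associate construct whose selector is a findloc expression, inside do concurrent:

```fortran
! Hypothetical minimal reproducer for associate inside do concurrent.
! Not from matcha; it only mimics the pattern at do_concurrent_s.f90 line 14.
program associate_in_do_concurrent
  implicit none
  integer :: i
  real :: x(5), y(5)
  x = [1., 2., 3., 4., 5.]
  do concurrent(i = 1:5)
    ! findloc over a logical expression, bound via associate as in matcha
    associate(k => findloc(x >= x(i), value=.false., dim=1))
      y(i) = real(k)
    end associate
  end do
  print *, y
end program
```

If this small program also faults under ifx with -coarray, it would confirm the problem is the associate-inside-do-concurrent pattern itself rather than anything matcha-specific.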
As an aside, it would also be great if Intel could remove or increase the hard limit on nesting associate
blocks. I have occasionally hit the limit, which, if I recall correctly, is somewhere around 7 nesting levels.
@greenrongreen after replacing associate statements on lines 14 in do_concurrent_s.f90, 65 in distribution_s.F90, 51 and 57 in matcha_s.F90, and 86 in do_concurrent_s.f90, it appears that I got matcha to run using
./build/run-fpm.sh run --compiler ifx --flag "-g -O0 -traceback -coarray"
@rouson should I push these changes to the branch 'use_ifx'?
@greenrongreen by removing associate
statements, @Dominick99 got our application to compile with ifx
and run with and without the GPU-offloading flags inside a virtual machine on his local machine. However, because he's running inside a virtual machine, I suspect that no actual offloading is happening. By contrast, the code crashes when I compile with ifx
Version 2023.0.0 Build 20221201 and run on a system at the University of Oregon with quad-socket 24-core Intel Cooper Lake CPUs (96 cores total) and Intel GPUs. Below are the steps to reproduce the problem:
git clone -b declare-ncells-as-c_int https://github.com/berkeleylab/matcha
cd matcha
fpm run --compiler ifx --flag "-g -O0 -traceback -coarray -fopenmp-target-do-concurrent -fiopenmp -fopenmp-targets=spir64"
which yields
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with
Libomptarget error: LIBOMPTARGET_DEBUG=1 to display basic debug information.
Libomptarget error: LIBOMPTARGET_DEBUG=2 to display calls to the compute runtime.
Libomptarget error: LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
do_concurrent_s.f90:14:14: Libomptarget fatal error 1: failure of target construct while offloading is mandatory
forrtl: error (76): Abort trap signal
In coarray image 1
Image PC Routine Line Source
libpthread-2.31.s 00007F52E9F1B420 Unknown Unknown Unknown
libc-2.31.so 00007F52E9D5200B gsignal Unknown Unknown
libc-2.31.so 00007F52E9D31859 abort Unknown Unknown
libomptarget.so 00007F52EA1B79A4 Unknown Unknown Unknown
libomptarget.so 00007F52EA1B8E02 Unknown Unknown Unknown
libomptarget.so 00007F52EA1B3ADF __tgt_target_kern Unknown Unknown
libomptarget.so 00007F52EA1D0538 __tgt_target_team Unknown Unknown
matcha 000000000041404B do_concurrent_sam 14 do_concurrent_s.f90
matcha 0000000000410F2F velocities 57 distribution_s.F90
matcha 000000000040874E matcha 52 matcha_s.F90
matcha 00000000004058AD main 18 main.F90
matcha 00000000004053FD Unknown Unknown Unknown
libc-2.31.so 00007F52E9D33083 __libc_start_main Unknown Unknown
matcha 000000000040531E Unknown Unknown Unknown
On the same machine, dropping the offload flags works fine: fpm run --compiler ifx --flag "-g -O0 -traceback -coarray".