PETSc error in branch devel
Closed this issue · 4 comments
bbsy789 commented
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[1]PETSC ERROR: No error traceback is available, the problem could be in the main program.
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.17.2, Jun 02, 2022
[1]PETSC ERROR: /thfs1/home/liujinmei/AsFem/bin/asfem on a named cn537 by liujinmei Wed Oct 26 11:44:02 2022
[1]PETSC ERROR: [2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[2]PETSC ERROR: No error traceback is available, the problem could be in the main program.
[2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
bbsy789 commented
@yangbai90 I think this is a memory-related problem.
bbsy789 commented
I modified CMakeLists.txt: in CMAKE_CXX_FLAGS I added -fsanitize=undefined,address, changed -O2 to -Og, and removed -Werror.
I then ran find / -name "libasan.so" to get the location of "libasan.so", and ran asfem with that library preloaded, using a command of the form
LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/5/libasan.so ./asfem --version
The result is as follows:
liujinmei@ln0:~/AsFem/bin$ LD_PRELOAD=/thfs1/home/liujinmei/software/gcc/gcc12.2/lib64/libasan.so ./asfem --version
Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(159):
MPID_Init(509).......:
MPIR_pmi_init(91)....: PMIX_Init returned -25
[ln0:800040:0:800040] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 800040) ====
0 /usr/local/ucx/lib/libucs.so.0(ucs_debug_print_backtrace+0x1c) [0x40003fd9712c]
1 /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x250) [0x40003fd993d0]
2 /usr/local/ucx/lib/libucs.so.0(+0x26530) [0x40003fd99530]
3 /usr/local/ucx/lib/libucs.so.0(+0x268c0) [0x40003fd998c0]
4 linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x40003bf305b8]
5 /thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12(MPIR_Err_return_comm+0x78) [0x40003c9adaa8]
6 /thfs1/home/liujinmei/software/petsc/3.17.2/lib/libpetsc.so.3.17(PetscInitialize+0x19c) [0x40003d511ebc]
7 ./asfem() [0x40acec]
8 /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0x40003fa00090]
9 ./asfem() [0x409e84]
=================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==800040==ERROR: AddressSanitizer: SEGV on unknown address 0x2a47000c3528 (pc 0x40003c9adaa8 bp 0xfffff6bbdf40 sp 0xfffff6bbdf40 T0)
==800040==The signal is caused by a READ memory access.
#0 0x40003c9adaa8 in MPIR_Err_return_comm (/thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12+0x2e7aa8)
#1 0x40003d511eb8 in PetscInitialize (/thfs1/home/liujinmei/software/petsc/3.17.2/lib/libpetsc.so.3.17+0x1d2eb8)
#2 0x40ace8 in main ../src/main.cpp:24
#3 0x40003fa0008c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x2408c)
#4 0x409e80 (/thfs1/home/liujinmei/AsFem/bin/asfem+0x409e80)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12+0x2e7aa8) in MPIR_Err_return_comm
==800040==ABORTING
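The sanitizer run above can be sketched as a small shell helper. This is only a sketch under assumptions: the function names find_libasan and run_with_asan are made up for illustration, and asking the compiler for the library path with -print-file-name is swapped in for the find / search used in this comment. It assumes the binary was already built with -fsanitize=undefined,address.

```shell
#!/bin/sh
# Hypothetical helpers, not part of AsFem or PETSc.

# Ask a GCC-style compiler driver where its libasan.so lives.
# -print-file-name echoes the bare name back when the library is not found,
# so only an absolute path counts as success.
find_libasan() {
    cc="${1:-gcc}"
    path="$("$cc" -print-file-name=libasan.so 2>/dev/null)" || return 1
    case "$path" in
        /*) printf '%s\n' "$path" ;;   # absolute path: library located
        *)  return 1 ;;                # bare name: library not found
    esac
}

# Run a command with the given sanitizer runtime preloaded.
run_with_asan() {
    lib="$1"
    shift
    LD_PRELOAD="$lib" "$@"
}

# Example invocation (compiler name and binary path are assumptions):
#   lib="$(find_libasan gcc)" && run_with_asan "$lib" ./asfem --version
```

Using the same compiler driver that built the binary avoids picking up a libasan.so that belongs to a different GCC installation on the machine.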
bbsy789 commented
cpu: aarch64
system: Ubuntu
compiler: gcc 12.2
mpi: mpich/mpi-x-gcc9.3.0
petsc: 3.17.2
I found this issue in the environment above.
bbsy789 commented
I may have found the cause: the MPI library and PETSc must be compiled with the same compiler version.
This issue has been solved. Thanks for your help!
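One way to check for this kind of compiler mismatch is to compare the GCC version strings that the compiler records in the .comment section of each shared library. The sketch below is an assumption about how one might do that, not what was done in this thread: the function names gcc_versions and report_compilers are hypothetical, readelf comes from binutils, and a stripped library may carry no .comment section at all.

```shell
#!/bin/sh
# Hypothetical helpers, not part of AsFem, PETSc, or MPICH.

# Read `readelf -p .comment` output on stdin and print each distinct
# GCC version found, e.g. "GCC: (GNU) 9.3.0" yields "9.3.0".
gcc_versions() {
    sed -n 's/.*GCC: ([^)]*) \([0-9][0-9.]*\).*/\1/p' | sort -u
}

# For every file given, print its name and the GCC versions recorded in it.
# An empty version list means the section is missing or has no GCC entry.
report_compilers() {
    for f in "$@"; do
        versions="$(readelf -p .comment "$f" 2>/dev/null | gcc_versions | tr '\n' ' ')"
        printf '%s: %s\n' "$f" "$versions"
    done
}

# Example with the library paths that appear in the backtrace:
#   report_compilers \
#       /thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12 \
#       /thfs1/home/liujinmei/software/petsc/3.17.2/lib/libpetsc.so.3.17 \
#       ./asfem
```

If the versions reported for libmpi and libpetsc differ, rebuilding PETSc with the MPI compiler wrappers from that same MPI installation is the usual way to bring them back in line.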