Backtrace on macOS
Describe the issue
When a fatal error occurs, the backtrace gives uninformative info like:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x1458c3b5d
#1 0x1458c2f8d
#2 0x7ff809469dfc
./rn: line 6: 1323 Segmentation fault: 11 ./binary
It would be useful to know what the call sequence was from #0 to #2 to narrow down where the problem lies.
Apparently this info is provided when compiling with the Linux compilers, but not on macOS.
I realize this is not directly MESA-related, but it is MESA SDK-related, which we require as part of using MESA.
I'm unfortunately not familiar enough with the compiler/makefiles etc. to find compiler flags that might solve this issue.
System information
- MESA version: r15140
- MESA SDK version: Mac OS 22.6.2
- Operating system: Mac OS 12.5.1
Output from ./help
MESA Version
15140
uname -a
Darwin tejat 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:25:27 PDT 2022; root:xnu-8020.141.5~2/RELEASE_X86_64 x86_64
gfortran -v
Using built-in specs.
COLLECT_GCC=/Applications/mesasdk_22.6.2/bin/gfortran.exec
COLLECT_LTO_WRAPPER=/Applications/mesasdk_22.6.2/bin/../libexec/gcc/x86_64-apple-darwin14/12.1.0/lto-wrapper
Target: x86_64-apple-darwin14
Configured with: /opt/sdk2-tmp/build/gcc/configure CC=clang CXX=clang++ --build=x86_64-apple-darwin14 --host=x86_64-apple-darwin14 --target=x86_64-apple-darwin14 --prefix=/opt/sdk2-tmp/mesasdk --with-gmp=/opt/sdk2-tmp/mesasdk --with-mpfr=/opt/sdk2-tmp/mesasdk --with-mpc=/opt/sdk2-tmp/mesasdk --enable-languages=c,c++,fortran --disable-multilib --disable-nls --disable-libsanitizer --with-sysroot=/opt/sdk2-tmp/mesasdk/sysroot --without-build-config
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (GCC)
$MESASDK_ROOT
/Applications/mesasdk_22.6.2
MESASDK version
x86_64-macos-22.6.2
$PATH
/Applications/mesasdk_22.6.2/bin:/usr/local/opt/grep/libexec/gnubin:/Applications/mesasdk/bin:/Users/matthiasf/opt/miniconda3/bin:/Users/matthiasf/opt/miniconda3/condabin:/usr/local/opt/grep/libexec/gnubin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/local/munki:/opt/X11/bin:/Library/Apple/usr/bin:/Users/matthiasf/opt/bin:/Applications/Julia-1.7.app/Contents/Resources/julia/bin:/Users/matthiasf/opt/bin:/Applications/Julia-1.7.app/Contents/Resources/julia/bin
$MESA_DIR
/Users/matthiasf/software/mesa-r15140
Hi,
Unfortunately it's a known problem we have with Macs: they give useless backtraces (it's not an SDK issue but a macOS problem).
Your best bet at this point is to try running it under gdb (a version of which is shipped with the SDK):
gdb ./binary -ex r -ex bt
This should run the program and give a more useful backtrace when it crashes. Then type quit, and y to confirm, to exit gdb.
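A variation on the command above: gdb's -batch option makes it exit on its own once the -ex commands have run, so the quit/y step is not needed. This is a minimal sketch, assuming the SDK's gdb is first on $PATH; crash_bt is a hypothetical helper name, not something shipped with the SDK. It does not help if gdb itself hangs at startup.

```shell
# Hypothetical helper (not part of the MESA SDK): run a program under gdb
# non-interactively and print a backtrace if it crashes.
#   -batch : gdb exits by itself after the -ex commands have run, so there is
#            no need to type quit and confirm with y
#   --args : everything after it is the program followed by its own arguments
crash_bt() {
  gdb -batch -ex run -ex bt --args "$@"
}

# Usage, from the binary work directory:
#   crash_bt ./binary
```

If the program exits normally, bt simply reports "No stack." and gdb quits.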
Thanks Rob for the quick suggestion.
However, when running as advertised:
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin14".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./binary...
Starting program: /Users/matthiasf/mesaRuns/energy_transfer/8.4_shrink_test/0.8_ET/binary
[New Thread 0x1e03 of process 3231]
[New Thread 0x2503 of process 3231]
This just hangs (no CPU usage) and doesn't return to the gdb prompt. Killing it is the only option. Reducing threads with export OMP_NUM_THREADS=1 does not help.
Damn Macs.
If you want to send me your inlists, I can give them a go on a Linux machine and see what I find.
That’s very kind, Rob, but no need. I have since found the source of the segfault through some good ol’ write(*, *) debugging.
The general issue still remains, of course, but it seems this goes beyond MESA. If you say this is a macOS problem, does Apple know about this? I presume this is Xcode-related, so could I go and submit bug reports there? Has anyone already tried taking this up with Apple?
the desired backtrace info is not printed to the terminal, but it is available.
save this minimal working example code in a file named segfault1.f90:
program segfault1
  implicit none
  real, dimension(10) :: a
  integer :: i
  a = 0.0
  do i = 1, 12   ! deliberately runs past the upper bound of a (10)
    a(i) = i
    print*,a(i)
  end do
end program segfault1
compile and link (i'm using the latest mesasdk for apple m1) with
% gfortran -g -o segfault1.exe segfault1.f90
ignore any warnings - that's the point here. run the executable:
% ./segfault1.exe
one should see the usual undecipherable "Backtrace for this error:" in the terminal output.
open Console.app on the mac.
in the left menu choose "Crash Reports".
one should see a crash report named "segfault1.exe" with an appropriate date.
clicking on this field shows the crash report in the viewer.
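The reports Console.app lists are also stored as files, so they can be located from the terminal. A sketch for sh/bash, assuming the usual per-user location ~/Library/Logs/DiagnosticReports and file names that begin with the executable name (assumptions, not checked against every macOS release); latest_crash_report is a hypothetical helper name. On macOS 12 and later the files may be in the JSON-based .ips format, which Console.app renders as readable text.

```shell
# Hypothetical helper: print the path of the newest crash report whose file
# name starts with the given executable name. Assumes reports are written to
# ~/Library/Logs/DiagnosticReports (the per-user location used by macOS).
latest_crash_report() {
  # ls -t sorts newest first; errors (e.g. no matching file) are discarded,
  # so nothing is printed when there is no report for that executable.
  ls -t "$HOME/Library/Logs/DiagnosticReports/$1"* 2>/dev/null | head -n 1
}

# Usage:
#   latest_crash_report segfault1.exe
#   less "$(latest_crash_report segfault1.exe)"
```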
one can read about the info contained in a crash report here:
https://developer.apple.com/documentation/xcode/examining-the-fields-in-a-crash-report
we are mainly interested in the Backtrace section of the crash report.
one may find something like
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libgfortran.5.dylib 0x1013422d4 get_float_string + 116
1 libgfortran.5.dylib 0x101344038 _gfortrani_write_real + 216
2 libgfortran.5.dylib 0x101344038 _gfortrani_write_real + 216
3 libgfortran.5.dylib 0x1013443b0 list_formatted_write_scalar + 736
4 libgfortran.5.dylib 0x101345054 _gfortrani_list_formatted_write + 116
5 segfault1.exe 0x100e7beac MAIN__ + 164 (segfault1.f90:8)
6 segfault1.exe 0x100e7bf08 main + 48 (segfault1.f90:10)
7 dyld 0x100fd108c start + 520
which gives the desired information.
fxt
Thank you for this suggestion, Frank; I did not know Console.app did this!
However, on my machine (Intel x86 here), the very same program produces the gigantic list below, with no frame from segfault1 itself (probably because only 512 = 2^9 frames are listed). So in the end, apart from knowing it is something about writing a real, I do not know where in my source code to look.
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libgfortran.5.dylib 0x10c36e300 get_float_string + 528
1 libgfortran.5.dylib 0x10c3706fa _gfortrani_write_real + 122
<snip>
511 libgfortran.5.dylib 0x10c3706fa _gfortrani_write_real + 122
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000010 rbx: 0x00007ff7b4231840 rcx: 0x0000000000000004 rdx: 0x00007ff8b92320fc
rdi: 0x00007ff7b4231ef0 rsi: 0x00007ff7b4231840 rbp: 0x00007ff7b4231ef0 rsp: 0x00007ff7b4231670
r8: 0x0000000000000001 r9: 0x00007ff7b4231880 r10: 0x00000000ffffff00 r11: 0x00007ff84aba7bc0
r12: 0x0000000000000001 r13: 0x000000000000000a r14: 0x0000000000000004 r15: 0x000000000000001d
rip: 0x000000010c36e300 rfl: 0x0000000000000246 cr2: 0x000000010be94430
i'm using macOS 12.5.1, Xcode version 13.4.1 and gcc version 12.1.0 on both M1 and Intel machines.
an m1 laptop with mesasdk-x86_64-macos-20.1.1.pkg confirms what i posted above from my m1 studio.
an intel desktop with mesasdk-x86_64-macos-22.6.2.pkg finds something different.
i agree the Console.app crash report does not contain the same info, but now the terminal does emit:
#7 0x10627de85 in segfault1
at /Users/fxt/Desktop/segfault1.f90:8
#8 0x10627deda in main
at /Users/fxt/Desktop/segfault1.f90:10
Segmentation fault
which may be sufficient for present purposes.
i don't currently know why different chipsets yield different backtrace behaviors.
I can reproduce the behavior for segfault1, so at least we have that in common.
On a dummy project compiled with ./mk, though, where I make the following change in the default binary work directory:
...
implicit none
real(dp), pointer :: test
contains
subroutine extras_binary_controls(binary_id, ierr)
  integer :: binary_id
  integer, intent(out) :: ierr
  type (binary_info), pointer :: b
  ierr = 0
  print*, test ! <- will segfault
  call binary_ptr(binary_id, b, ierr)
  if (ierr /= 0) then
    write(*,*) 'failed in binary_ptr'
    return
  end if
...
I get the trace:
DATE: 2022-09-08
TIME: 10:20:59
read inlist_project
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x1436b2c1e
#1 0x1436b1ddd
#2 0x7ff809469dfc
#3 0x143902181
#4 0x143904751
#5 0x143904a4d
#6 0x1439058d5
#7 0x10b78af5c
#8 0x10b7b711c
#9 0x10b7b94c8
#10 0x10b78b0a0
#11 0x10b78b0bf
#12 0x10beda267
./rn: line 6: 30819 Segmentation fault: 11 ./binary
DATE: 2022-09-08
TIME: 10:20:59
Is this compiler-flag related, then?
I note that with the option -Og, segfault1 runs just fine.
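One possible reading of the -Og observation: writing a(11) and a(12) is out of bounds, which is undefined behaviour, so whether and where the program crashes can change with the optimization level. gfortran's -fcheck=bounds option turns such writes into a deterministic runtime error instead. A sketch under that assumption; build_checked is a hypothetical helper, and the flags are standard gfortran options, not MESA's own build settings.

```shell
# Hypothetical helper: rebuild a single Fortran source file with gfortran's
# run-time array bounds checking switched on.
#   -g             : keep debug information (file names and line numbers)
#   -O0            : no optimization, so the code follows the source closely
#   -fcheck=bounds : check array indices at run time and stop with an error
#                    message instead of silently writing out of bounds
build_checked() {
  src=$1
  exe=${src%.f90}.exe   # segfault1.f90 -> segfault1.exe
  gfortran -g -O0 -fcheck=bounds -o "$exe" "$src"
}

# Usage:
#   build_checked segfault1.f90 && ./segfault1.exe
```

With such a build, segfault1 should stop at the first out-of-range write with a Fortran runtime error naming the array and source line, rather than crashing later inside libgfortran. The check covers array indices only, so it would not be expected to flag the unassociated pointer in the MESA example above.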
I've just seen this thread. FYI, if gdb hangs when you start a program, quit out (Ctrl-C) and re-run it. I don't know what causes this, but gdb on macOS has been a bit of a mess for the past few years.
Can we close this? Or maybe a Mac user could add something to https://github.com/MESAHub/mesa/blob/main/docs/source/developing/debugging.rst about how to debug on a Mac?
i don't see any mac-specific nuances in this guide.
fxt
It seems this is indeed not in our hands; I'll close this for now.