LLNL/UEDGE

Segmentation fault in pyexamples/d3dHsm

On multiple systems, running python runcase.py in pyexamples/d3dHsm produces a segmentation fault when bbb.exmain() is called at this line. Below are some inconclusive debugging results, in case they make more sense to someone else.

Output of script

$ python runcase.py
Forthon edition
 UEDGE $Name: V7_08_03 $
 Wrote file "gridue" with runid:    EFITD    09/07/90      # 66832 ,2384ms

 ***** Grid generation has been completed
  Updating Jacobian, npe =                      1
 iter=    0 fnrm=      9.216531402973144     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
 Interpolants created; mype =                   -1
 Wrote file "gridue" with runid:    EFITD    09/07/90      # 66832 ,2384ms

 ***** Grid generation has been completed
  Updating Jacobian, npe =                      1
 iter=    0 fnrm=      9.216531400941010     nfe=      1
  Updating Jacobian, npe =                      2
Fatal Python error: Segmentation fault

Current thread 0x000000010d8f75c0 (most recent call first):
  File "runcase.py", line 46 in <module>
[1]    92229 segmentation fault  python runcase.py

I modified the script, so ignore the line numbers in the output. The segmentation fault happens at the line linked in the first paragraph.
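
The "Fatal Python error: Segmentation fault" lines above have the form printed by Python's faulthandler module. As a minimal sketch, assuming Python 3 on a POSIX system (faulthandler is in the standard library there; the log above comes from Python 2.7, where it is a separate package), the crash can be detected from a wrapper process. The helper name run_and_check is hypothetical and not part of UEDGE.

```python
import signal
import subprocess
import sys


def run_and_check(args):
    """Run a Python child process with faulthandler enabled.

    Returns (died_from_segv, stderr_text). On POSIX, subprocess reports a
    child killed by signal N as returncode == -N.
    """
    cmd = [sys.executable, "-X", "faulthandler"] + list(args)
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode == -signal.SIGSEGV, proc.stderr
```

From pyexamples/d3dHsm this could be called as run_and_check(["runcase.py"]); the second return value then holds the Python-level traceback that faulthandler wrote before the process died.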

lldb output

$ lldb /usr/local/bin/python
(lldb) target create "/usr/local/bin/python"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
    import weakref
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <module>
    from _weakref import (
ImportError: cannot import name _remove_dead_weakref
Current executable set to '/usr/local/bin/python' (x86_64).
(lldb) target stop-hook add
Enter your stop hook command(s).  Type 'DONE' to end.
> bt
> disassemble --pc
Stop hook #1 added.
(lldb) run runcase.py
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x0000000100005000 dyld`_dyld_start

dyld`_dyld_start:
->  0x100005000 <+0>: popq   %rdi
    0x100005001 <+1>: pushq  $0x0
    0x100005003 <+3>: movq   %rsp, %rbp
    0x100005006 <+6>: andq   $-0x10, %rsp

Process 89747 launched: '/usr/local/bin/python' (x86_64)
* thread #2, stop reason = exec
  * frame #0: 0x0000000100005000 dyld`_dyld_start

dyld`_dyld_start:
->  0x100005000 <+0>: popq   %rdi
    0x100005001 <+1>: pushq  $0x0
    0x100005003 <+3>: movq   %rsp, %rbp
    0x100005006 <+6>: andq   $-0x10, %rsp

Process 89747 stopped
* thread #2, stop reason = exec
    frame #0: 0x0000000100005000 dyld`_dyld_start
dyld`_dyld_start:
->  0x100005000 <+0>: popq   %rdi
    0x100005001 <+1>: pushq  $0x0
    0x100005003 <+3>: movq   %rsp, %rbp
    0x100005006 <+6>: andq   $-0x10, %rsp
Target 0: (Python) stopped.
(lldb) c
Process 89747 resuming
Forthon edition
 UEDGE $Name: V7_08_03 $
 Wrote file "gridue" with runid:    EFITD    09/07/90      # 66832 ,2384ms

 ***** Grid generation has been completed
  Updating Jacobian, npe =                      1
 iter=    0 fnrm=      9.216531402973144     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
 Interpolants created; mype =                   -1
 Wrote file "gridue" with runid:    EFITD    09/07/90      # 66832 ,2384ms

 ***** Grid generation has been completed
  Updating Jacobian, npe =                      1
 iter=    0 fnrm=      9.216531400941010     nfe=      1
  Updating Jacobian, npe =                      2
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x114b09010)
  * frame #0: 0x000000010d84a7ba uedgeC.so`daxpy_ + 298
    frame #1: 0x000000010d84b6dc uedgeC.so`dgbfa_ + 1036
    frame #2: 0x000000010d84cd3f uedgeC.so`dgbco_ + 383
    frame #3: 0x000000010d6812e3 uedgeC.so`jac_lu_decomp_ + 467
    frame #4: 0x000000010d707909 uedgeC.so`psetnk_ + 2153
    frame #5: 0x000000010d8219ec uedgeC.so`model_ + 1580
    frame #6: 0x000000010d822c9b uedgeC.so`nksol_ + 4091
    frame #7: 0x000000010d75107c uedgeC.so`uedriv_ + 13932
    frame #8: 0x000000010d742685 uedgeC.so`exmain_ + 405
    frame #9: 0x000000010d57d437 uedgeC.so`bbb_exmain + 55
    frame #10: 0x0000000100155f66 Python`PyEval_EvalFrameEx + 19200
    frame #11: 0x000000010015124c Python`PyEval_EvalCodeEx + 1540
    frame #12: 0x0000000100150c42 Python`PyEval_EvalCode + 32
    frame #13: 0x0000000100172633 Python`run_mod + 49
    frame #14: 0x00000001001726da Python`PyRun_FileExFlags + 130
    frame #15: 0x0000000100172259 Python`PyRun_SimpleFileExFlags + 719
    frame #16: 0x0000000100183cd4 Python`Py_Main + 3136
    frame #17: 0x00007fff6d9fa3d5 libdyld.dylib`start + 1
    frame #18: 0x00007fff6d9fa3d5 libdyld.dylib`start + 1

uedgeC.so`daxpy_:
->  0x10d84a7ba <+298>: movsd  0x10(%rcx,%rdx), %xmm2    ; xmm2 = mem[0],zero
    0x10d84a7c0 <+304>: movhpd 0x18(%rsi,%rdx), %xmm0    ; xmm0 = xmm0[0],mem[0]
    0x10d84a7c6 <+310>: movsd  (%rcx,%rdx), %xmm3        ; xmm3 = mem[0],zero
    0x10d84a7cb <+315>: mulpd  %xmm1, %xmm0

Process 89747 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x114b09010)
    frame #0: 0x000000010d84a7ba uedgeC.so`daxpy_ + 298
uedgeC.so`daxpy_:
->  0x10d84a7ba <+298>: movsd  0x10(%rcx,%rdx), %xmm2    ; xmm2 = mem[0],zero
    0x10d84a7c0 <+304>: movhpd 0x18(%rsi,%rdx), %xmm0    ; xmm0 = xmm0[0],mem[0]
    0x10d84a7c6 <+310>: movsd  (%rcx,%rdx), %xmm3        ; xmm3 = mem[0],zero
    0x10d84a7cb <+315>: mulpd  %xmm1, %xmm0
Target 0: (Python) stopped.
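
To make the backtrace easier to read, the frames can be reduced to a call chain. The sketch below is illustrative only: the regular expression is written against the frame lines shown above, and call_chain is a hypothetical helper. The innermost symbols carry the names of the BLAS routine DAXPY and the LINPACK band-matrix routines DGBFA and DGBCO, so the bad memory access occurs during the banded LU factorization of the Jacobian. The backtrace alone does not say why.

```python
import re

# Matches lldb "bt" lines such as:
#   frame #3: 0x000000010d6812e3 uedgeC.so`jac_lu_decomp_ + 467
FRAME_RE = re.compile(r"frame #(\d+): 0x[0-9a-fA-F]+ ([^`\s]+)`(\S+)")

# Frames 0-10 copied from the lldb output above.
SAMPLE_BT = r"""
  * frame #0: 0x000000010d84a7ba uedgeC.so`daxpy_ + 298
    frame #1: 0x000000010d84b6dc uedgeC.so`dgbfa_ + 1036
    frame #2: 0x000000010d84cd3f uedgeC.so`dgbco_ + 383
    frame #3: 0x000000010d6812e3 uedgeC.so`jac_lu_decomp_ + 467
    frame #4: 0x000000010d707909 uedgeC.so`psetnk_ + 2153
    frame #5: 0x000000010d8219ec uedgeC.so`model_ + 1580
    frame #6: 0x000000010d822c9b uedgeC.so`nksol_ + 4091
    frame #7: 0x000000010d75107c uedgeC.so`uedriv_ + 13932
    frame #8: 0x000000010d742685 uedgeC.so`exmain_ + 405
    frame #9: 0x000000010d57d437 uedgeC.so`bbb_exmain + 55
    frame #10: 0x0000000100155f66 Python`PyEval_EvalFrameEx + 19200
"""


def call_chain(bt_text, module=None):
    """Return symbol names ordered from outermost caller to crash site.

    If module is given, only frames from that binary are kept. Repeated
    frame numbers (lldb reprints frame #0 on stop) are collapsed.
    """
    frames = {}
    for line in bt_text.splitlines():
        match = FRAME_RE.search(line)
        if match and (module is None or match.group(2) == module):
            frames[int(match.group(1))] = match.group(3)
    return [frames[i] for i in sorted(frames, reverse=True)]


if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE_BT, module="uedgeC.so")))
```

Filtering on uedgeC.so drops the interpreter frames and leaves only the path from bbb_exmain down to daxpy_.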

pudb output

The segfault happens after the line shown in _Forthon.py.

[Screenshot: pudb]

Reproducibility

Mac

  • macOS 10.14
  • Apple LLVM version 10.0.1 (clang-1001.0.46.4)
  • GNU Fortran (GCC) 6.3.0

MIT Engaging cluster

  • CentOS 7
  • GNU Fortran (GCC) 6.2.0

Red Hat virtual machine

  • RHEL 7.6
  • GNU Fortran (GCC) 4.8.5

I tried this configuration because Maxim sees no segfault on Red Hat 7.7 with gfortran 4.8.5. The segfault still appears on the Red Hat VM I tested, so the bug does not seem to depend on the compiler version or operating system. Maybe it depends on certain environment variables being set?
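
One way to test the environment-variable hypothesis would be to dump the environment on a machine that crashes and on one that does not, then compare the two. A minimal sketch follows. The function names and the JSON file format are assumptions made for illustration, and nothing here identifies which variables, if any, actually matter.

```python
import json
import os


def snapshot_env(path):
    """Write the current process environment to a JSON file."""
    with open(path, "w") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)


def diff_env(env_a, env_b):
    """Compare two environment dicts.

    Returns (only_a, only_b, changed): sorted names present only in
    env_a, sorted names present only in env_b, and a dict mapping each
    shared name with differing values to the pair (value_a, value_b).
    """
    only_a = sorted(set(env_a) - set(env_b))
    only_b = sorted(set(env_b) - set(env_a))
    changed = {
        name: (env_a[name], env_b[name])
        for name in set(env_a) & set(env_b)
        if env_a[name] != env_b[name]
    }
    return only_a, only_b, changed
```

Running snapshot_env on each machine and loading both files with json.load gives the two dicts to pass to diff_env.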

Anaconda python

I've been using regular python and pip install --user to install things, but I also tried Anaconda python and encountered the same issue.

Fixed by commit 4dcaefc.