anstmichaels/emopt

MMI_splitter_3D

Opened this issue · 2 comments

I have encountered the following errors and don't know how to solve them. I would be grateful if the author could take some time out of his busy schedule to help. Best wishes.
(base) m3enjoy@m3enjoy-virtual-machine:~/emopt/examples/MMI_splitter_3D$ python mmi_1x2_splitter_3D_fdtd.py
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
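
Following the valgrind suggestion in the PETSc message above, one way to get more detail is to run the failing example directly under memcheck; the flags below are just one reasonable choice, not anything specific to emopt or PETSc:

valgrind --tool=memcheck --track-origins=yes python mmi_1x2_splitter_3D_fdtd.py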

I'm also encountering this problem. It does not appear to be an out-of-memory issue, since I tried it on a system with a few hundred GB of RAM and still had no luck. It affects any example that uses the fdtd module. @anstmichaels Any thoughts on what might be causing this?
Thanks!
Charles
EDIT:
To update this, I ran Valgrind on it, and it turned up this:
==4146956== Invalid read of size 4
==4146956== at 0x57C3D711: fdtd::FDTD::build_pml() (in /home/charles/.local/lib/python3.8/site-packages/emopt-2020.9.21-py3.8.egg/emopt/FDTD.so)
==4146956== by 0x57FEF9DC: ffi_call_unix64 (in /home/charles/miniconda3/lib/libffi.so.7.1.0)
==4146956== by 0x57FEF066: ffi_call_int (in /home/charles/miniconda3/lib/libffi.so.7.1.0)
==4146956== by 0x57FD7979: _call_function_pointer (callproc.c:871)
==4146956== by 0x57FD7979: _ctypes_callproc.cold.48 (callproc.c:1199)
==4146956== by 0x57FD80DA: PyCFuncPtr_call.cold.49 (_ctypes.c:4201)
==4146956== by 0x24550E: _PyObject_MakeTpCall (call.c:159)
==4146956== by 0x2CDD08: _PyObject_Vectorcall (abstract.h:125)
==4146956== by 0x2CDD08: call_function (ceval.c:4963)
==4146956== by 0x2CDD08: _PyEval_EvalFrameDefault (ceval.c:3469)
==4146956== by 0x292A28: _PyEval_EvalCodeWithName (ceval.c:4298)
==4146956== by 0x293642: _PyFunction_Vectorcall (call.c:435)
==4146956== by 0x2941CA: _PyObject_FastCallDict (call.c:104)
==4146956== by 0x2944AD: _PyObject_Call_Prepend (call.c:887)
==4146956== by 0x2945C9: slot_tp_init (typeobject.c:6755)
==4146956== Address 0xffffffff8bd5d6e4 is not stack'd, malloc'd or (recently) free'd
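
The reported address 0xffffffff8bd5d6e4 looks like a 32-bit value sign-extended to 64 bits, which is the classic symptom of a pointer being squeezed through a 32-bit integer somewhere in the ctypes layer. Whether that is actually what happens inside emopt's FDTD wrapper is only a guess; the self-contained sketch below illustrates the general pitfall using libc's malloc, not any emopt function:

import ctypes, ctypes.util

# Load the C library purely for illustration; this is not emopt code.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Buggy pattern: with no explicit restype, ctypes treats the return value as a
# 32-bit int, so the upper half of a 64-bit pointer is dropped and the result
# can later show up as a sign-extended address like 0xffffffff........
ptr_truncated = libc.malloc(16)

# Correct pattern: declare the pointer return type before calling.
libc.malloc.restype = ctypes.c_void_p
ptr_ok = libc.malloc(16)

print(hex(ptr_truncated), hex(ptr_ok))

If something like this is the mechanism, the crash would depend on where the heap happens to be mapped, which could also explain why it reproduces on some machines and not others.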

Oddly, I have never run into this issue myself, and I have run the FDTD solver pretty extensively on CentOS 7, Ubuntu 18.04, and Ubuntu 20.04. If anyone else encounters this issue, please pull master, which includes @CharlesDove's fixes, and give it a try.
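
For anyone trying the fix, the exact update commands depend on how emopt was installed in the first place; assuming a clone of this repository installed with setup.py (as the .egg path in the traceback above suggests), something along these lines should pick up the changes:

cd emopt
git pull origin master
python setup.py install --user

and then re-run the failing example.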