Segmentation Fault on MacOS with pytorch > 2.2.0
connortann opened this issue · 4 comments
EDIT: relevant issue on pytorch: pytorch/pytorch#121101
The test suite recently began failing on MacOS.
Example failing run:
https://github.com/shap/shap/actions/runs/8021717954/job/21914432162
Fatal Python error: Segmentation fault
Thread 0x000070000ca96000 (most recent call first):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 579 in _handle_results
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 982 in run
File Fatal Python error: Segmentation fault
"/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1002 in _bootstrap
When the Python version is pinned to 3.11.7, we seemingly get a different error relating to lightgbm:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py:376: in __init__
self._handle = _dlopen(self._name, mode)
E OSError: dlopen(/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so, 6): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
E Referenced from: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so
E Reason: image not found
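The `dlopen` error above says that `lib_lightgbm.so` could not load the OpenMP runtime from `/usr/local/opt/libomp/lib/libomp.dylib`. As a minimal sketch for confirming this on a runner before importing lightgbm, one could check whether that file exists. The helper name `libomp_present` is hypothetical, and the default path is simply the one quoted in the error; other setups may keep libomp elsewhere.

```python
from pathlib import Path

# Path copied from the dlopen error message above. This is an assumption about
# this particular runner; other installations may use a different prefix.
LIBOMP_PATH = Path("/usr/local/opt/libomp/lib/libomp.dylib")


def libomp_present(path: Path = LIBOMP_PATH) -> bool:
    """Return True if the OpenMP runtime that lightgbm links against exists at `path`."""
    return path.is_file()


if __name__ == "__main__":
    if libomp_present():
        print(f"Found libomp at {LIBOMP_PATH}")
    else:
        print(f"libomp not found at {LIBOMP_PATH}; importing lightgbm will likely fail")
```

If the file is missing, installing libomp (commonly via Homebrew with `brew install libomp`) is the usual remedy, though whether that is the right fix for this CI job is not confirmed in this thread.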
Related:
This may be unrelated, but on the topic of MacOS issues I noticed there was a failure in the mac-os job for the latest proposed release on conda-forge: conda-forge/shap-feedstock#76
Here are a couple notes:
The last successful run of the macos pipeline on master was this: https://github.com/shap/shap/actions/runs/7972563240/job/21765144274.
I debugged the macos pipeline using https://github.com/mxschmitt/action-tmate and found that the segmentation faults happen in the pytorch tests, on the lines where the model is called on data, e.g. here. The successful run used torch version 2.2.0 instead of 2.2.1. I will check whether pinning the version fixes it.
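To make the version comparison from the notes above concrete, here is a small sketch that reports whether a torch version string is newer than 2.2.0, the last version seen in a passing run. The function names `parse_release` and `is_newer_than_last_good` are hypothetical, and the parsing is deliberately simple: it only looks at the leading numeric release part and ignores suffixes such as `+cpu`.

```python
import re


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.2.1+cpu' -> (2, 2, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad to three components so that '2.2' and '2.2.0' compare as equal.
    return parts + (0,) * (3 - len(parts))


def is_newer_than_last_good(version: str, last_good: str = "2.2.0") -> bool:
    """Return True if `version` is strictly newer than the last known-good release."""
    return parse_release(version) > parse_release(last_good)
```

With torch installed, `is_newer_than_last_good(torch.__version__)` would indicate whether the environment is on a version newer than the one from the last passing run. This only helps identify the environment; it does not diagnose the crash itself.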
#3518 fixed the original issue with the tests by pinning pytorch; let's keep this issue open until the full test suite passes with the latest pytorch.
The pytorch issue is documented in: pytorch/pytorch#121101.