Segmentation fault (core dumped) for CUDA12
danboshuiyan opened this issue · 7 comments
I have only installed the environment from af2_binder_design.yml, and I hit the following error when running predict.py:
Segmentation fault (core dumped)
The environment is as follows:
NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2
jax 0.4.23
jaxlib 0.4.23+cuda12.cudnn89
May I ask how I can solve this problem? Thank you very much.
I am having the same problem. My environment passes the import test and reports that it recognizes the GPU, but as soon as I run predict.py I get a "Segmentation fault (core dumped)" error.
I have the same issue.
This is the GPU driver version information:
NVIDIA-SMI 535.113.01 Driver Version: 535.113.01 CUDA Version: 12.2
And here is the list of installed packages in the af2_binder_design conda environment:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 2.0.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.9.1 py311h459d7ec_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
astunparse 1.6.3 pyhd8ed1ab_0 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
biopython 1.81 pypi_0 pypi
blinker 1.7.0 pyhd8ed1ab_0 conda-forge
brotli-python 1.1.0 py311hb755f60_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.24.0 hd590300_0 conda-forge
ca-certificates 2023.11.17 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.3.2 pyhd8ed1ab_0 conda-forge
certifi 2023.11.17 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py311hb3a22ac_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
chex 0.1.85 pypi_0 pypi
click 8.1.7 unix_pyh707e725_0 conda-forge
contextlib2 21.6.0 pyhd8ed1ab_0 conda-forge
cryptography 41.0.7 py311hcb13ee4_1 conda-forge
cuda-version 11.8 h70ddcb2_2 conda-forge
cudatoolkit 11.8.0 h4ba93d1_12 conda-forge
cudnn 8.8.0.121 hcdd5f01_4 conda-forge
dm-haiku 0.0.11 pypi_0 pypi
dm-tree 0.1.8 pypi_0 pypi
etils 1.6.0 pypi_0 pypi
flatbuffers 23.5.26 h59595ed_1 conda-forge
flax 0.7.5 pypi_0 pypi
frozenlist 1.4.1 py311h459d7ec_0 conda-forge
fsspec 2023.12.2 pypi_0 pypi
gast 0.5.4 pyhd8ed1ab_0 conda-forge
giflib 5.2.1 h0b41bf4_3 conda-forge
google-auth 2.26.0 pyhca7485f_0 conda-forge
google-auth-oauthlib 1.0.0 pyhd8ed1ab_1 conda-forge
google-pasta 0.2.0 pyh8c360ce_0 conda-forge
grpcio 1.54.3 py311hcafe171_0 conda-forge
h5py 3.10.0 nompi_py311hebc2b07_101 conda-forge
hdf5 1.14.3 nompi_h4f84152_100 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.6 pyhd8ed1ab_0 conda-forge
importlib-metadata 7.0.1 pyha770c72_0 conda-forge
importlib-resources 6.1.1 pypi_0 pypi
jax 0.4.23 pypi_0 pypi
jaxlib 0.4.23+cuda12.cudnn89 pypi_0 pypi
jmp 0.0.4 pypi_0 pypi
keras 2.13.1 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libabseil 20230125.3 cxx17_h59595ed_0 conda-forge
libaec 1.1.2 h59595ed_1 conda-forge
libblas 3.9.0 20_linux64_openblas conda-forge
libcblas 3.9.0 20_linux64_openblas conda-forge
libcurl 8.5.0 hca28451_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_3 conda-forge
libgfortran-ng 13.2.0 h69a702a_3 conda-forge
libgfortran5 13.2.0 ha4646dd_3 conda-forge
libgomp 13.2.0 h807b86a_3 conda-forge
libgrpc 1.54.3 hb20ce57_0 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 20_linux64_openblas conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.25 pthreads_h413a1c8_0 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libprotobuf 3.21.12 hfc55251_2 conda-forge
libsqlite 3.44.2 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
markdown 3.5.1 pyhd8ed1ab_0 conda-forge
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py311h459d7ec_1 conda-forge
mdurl 0.1.2 pypi_0 pypi
ml-collections 0.1.1 pyhd8ed1ab_0 conda-forge
ml_dtypes 0.3.1 py311h320fe9a_2 conda-forge
mock 5.1.0 pyhd8ed1ab_0 conda-forge
msgpack 1.0.7 pypi_0 pypi
multidict 6.0.4 py311h459d7ec_1 conda-forge
nccl 2.19.4.1 h6103f9b_0 conda-forge
ncurses 6.4 h59595ed_2 conda-forge
nest-asyncio 1.5.8 pypi_0 pypi
numpy 1.26.3 py311h64a7726_0 conda-forge
nvidia-cublas-cu12 12.3.4.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.3.101 pypi_0 pypi
nvidia-cuda-nvcc-cu12 12.3.107 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.3.107 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.3.101 pypi_0 pypi
nvidia-cudnn-cu12 8.9.7.29 pypi_0 pypi
nvidia-cufft-cu12 11.0.12.1 pypi_0 pypi
nvidia-cusolver-cu12 11.5.4.101 pypi_0 pypi
nvidia-cusparse-cu12 12.2.0.103 pypi_0 pypi
nvidia-nccl-cu12 2.19.3 pypi_0 pypi
nvidia-nvjitlink-cu12 12.3.101 pypi_0 pypi
oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge
openssl 3.2.0 hd590300_1 conda-forge
opt_einsum 3.3.0 pyhc1e730c_2 conda-forge
optax 0.1.7 pypi_0 pypi
orbax-checkpoint 0.4.8 pypi_0 pypi
packaging 23.2 pyhd8ed1ab_0 conda-forge
pip 23.3.2 pyhd8ed1ab_0 conda-forge
protobuf 4.21.12 py311hcafe171_0 conda-forge
pyasn1 0.5.1 pyhd8ed1ab_0 conda-forge
pyasn1-modules 0.3.0 pyhd8ed1ab_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygments 2.17.2 pypi_0 pypi
pyjwt 2.8.0 pyhd8ed1ab_0 conda-forge
pyopenssl 23.3.0 pyhd8ed1ab_0 conda-forge
pyrosetta 2023.49+release.9891f2c py311_0 https://conda.graylab.jhu.edu
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.11.7 hab00c5b_1_cpython conda-forge
python-flatbuffers 23.5.26 pyhd8ed1ab_0 conda-forge
python_abi 3.11 4_cp311 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.1 py311h459d7ec_1 conda-forge
re2 2023.03.02 h8c504da_0 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge
rich 13.7.0 pypi_0 pypi
rsa 4.9 pyhd8ed1ab_0 conda-forge
scipy 1.11.4 pypi_0 pypi
setuptools 69.0.3 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.10 h9fff704_0 conda-forge
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.13.0 pyhd8ed1ab_0 conda-forge
tensorboard-data-server 0.7.0 py311h63ff55d_1 conda-forge
tensorflow 2.13.1 cuda118py311h878bca4_1 conda-forge
tensorflow-base 2.13.1 cuda118py311h002e3ce_1 conda-forge
tensorflow-estimator 2.13.1 cuda118py311h4a64c31_1 conda-forge
tensorstore 0.1.51 pypi_0 pypi
termcolor 2.3.0 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
toolz 0.12.0 pypi_0 pypi
typing_extensions 4.5.0 pyha770c72_0 conda-forge
tzdata 2023d h0c530f3_0 conda-forge
urllib3 2.1.0 pyhd8ed1ab_0 conda-forge
werkzeug 3.0.1 pyhd8ed1ab_0 conda-forge
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
wrapt 1.16.0 py311h459d7ec_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.3 py311h459d7ec_0 conda-forge
zipp 3.17.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
Is there any solution now? Thanks!
Best,
Jianxiang
I'm sorry to hear that you're having environment issues! I would recommend inspecting the core file that is being dumped; it should give you a hint as to where the error is coming from.
My suspicion is that the issue is actually coming from PyRosetta and not from PyTorch. The import tests do not currently check whether PyRosetta is correctly installed. I will add that check shortly.
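If opening the core file is inconvenient, a lighter first step is Python's standard-library faulthandler, which prints the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is only an illustration and not part of this repository: the helper name `run_with_faulthandler` is hypothetical, and the arguments to pass to predict.py are whatever you normally use.

```python
import subprocess
import sys


def run_with_faulthandler(args, timeout=None):
    """Run a Python command in a child interpreter with faulthandler enabled.

    `args` is whatever would normally follow `python` on the command line,
    for example ["predict.py", <your usual arguments>].
    This helper is hypothetical; it is not part of the repository.
    """
    # -X faulthandler is a standard CPython option: on a fatal signal the
    # child writes the Python traceback of every thread to stderr.
    cmd = [sys.executable, "-X", "faulthandler", *args]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


# Example use (arguments are placeholders for your own):
#   result = run_with_faulthandler(["predict.py"])
#   print(result.returncode)   # negative on POSIX if killed by a signal
#   print(result.stderr)       # contains the traceback, if any
```

The same effect is available without any wrapper by running `python -X faulthandler predict.py` or by setting the environment variable `PYTHONFAULTHANDLER=1`. The last Python frame in the traceback usually names the import or call that triggered the crash.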
I've added PyRosetta import testing to the tests. Please run this new one and see if the issue is with PyRosetta.
I have tried with the new import testing file and it failed. Here is the error message:
Segmentation fault (core dumped)
However, when I ran the import testing in python separately, the two imports both went through.
#!/usr/bin/env python
# PyRosetta install test
print("/"*200)
print("Testing PyRosetta install. If this script errors before you see a PyRosetta success message then you " + \
"have an issue with your PyRosetta install")
print("/"*200)
from pyrosetta import *
from pyrosetta.rosetta import *
init()
print("/"*70)
print("PyRosetta installation was successful!")
print("/"*70)
print("\n")
Maybe the core dump was caused by a Python package incompatibility?
Can you check further for us, please?
Best,
Jianxiang
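One way to narrow down a crash that only appears when certain imports are combined is to run each candidate in its own interpreter and look at how the child exits. This is a rough sketch under assumptions: the snippets in `CANDIDATES` are guesses at what might matter here (JAX and PyRosetta, in both orders), and the helper names are made up for illustration. On POSIX, a negative return code means the child was killed by that signal number.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX convention: -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit code {returncode}"


def try_snippet(snippet, timeout=600):
    """Run `snippet` in a fresh interpreter so a crash cannot take us down."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return describe_exit(proc.returncode)


# Hypothetical candidates; adjust to the imports your script actually uses.
CANDIDATES = [
    "import jax",
    "import pyrosetta",
    "import jax; import pyrosetta",
    "import pyrosetta; import jax",
    "import pyrosetta; pyrosetta.init()",
]

if __name__ == "__main__":
    for snippet in CANDIDATES:
        print(f"{snippet!r}: {try_snippet(snippet)}")
```

A line reporting `killed by SIGSEGV` for a combination, while each import alone reports `ok`, would point to an interaction between the two packages and not to a broken install of either one.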
The version of JAX that conda was installing was incompatible with PyRosetta for some reason. I've added an explicit requirement for JAX to be a slightly older version and this fixes the issue.
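To confirm which versions an environment actually resolved to after recreating it, something like the following can help. It is a minimal sketch, not part of the repository's tests: the package names are taken from the listing earlier in this thread, and the exact JAX version that was pinned is not stated here, so the script only reports what is installed. Packages installed without standard distribution metadata will show up as not found.

```python
from importlib import metadata

# Names as they appear in the package listing in this thread (assumption:
# these are the ones relevant to the crash).
PACKAGES = ("jax", "jaxlib", "biopython", "pyrosetta")


def installed_versions(names):
    """Return {name: version string, or None if no metadata is found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not found'}")
```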
Thank you very much for the quick fix.
I have pinned biopython to 1.81 to make it work.
Best,
Jianxiang