ImportError: undefined symbol: __cudaRegisterFatBinaryEnd

Question


hu-my opened this issue 6 years ago · 11 comments

Hi, I encountered the following error when running this command: sh scripts/train_stanford.sh

Traceback (most recent call last):
File "models/train_rels.py", line 24, in <module>
from lib.rel_model_stanford import RelModelStanford as RelModel
File "/home/huminyang/neural-motifs-pytorch/lib/rel_model_stanford.py", line 13, in <module>
from lib.object_detector import filter_det
File "/home/huminyang/neural-motifs-pytorch/lib/object_detector.py", line 11, in <module>
from lib.fpn.nms.functions.nms import apply_nms
File "/home/huminyang/neural-motifs-pytorch/lib/fpn/nms/functions/nms.py", line 4, in <module>
from .._ext import nms
File "/home/huminyang/neural-motifs-pytorch/lib/fpn/nms/_ext/nms/__init__.py", line 3, in <module>
from ._nms import lib as _lib, ffi as _ffi
ImportError: /home/huminyang/neural-motifs-pytorch/lib/fpn/nms/_ext/nms/_nms.so: undefined symbol: __cudaRegisterFatBinaryEnd

I'm using PyTorch 0.3.0 and CUDA 9.0. Do you have any idea how to solve this? Thanks!

Answer 1 · 2019-04-24T14:44:07.000Z

Are you sure CUDA 9.0 is installed? It looks like it might be a CUDA/PyTorch incompatibility issue: open-mmlab/mmdetection#385
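One way to test the incompatibility theory is to list the CUDA runtime symbols that the compiled extension expects to resolve at import time. This is a minimal sketch, assuming GNU binutils nm is available; the function names are hypothetical. As far as I know (this is not confirmed in the thread), __cudaPopCallConfiguration is only emitted by nvcc 9.2 and later, and __cudaRegisterFatBinaryEnd only by nvcc 10.0 and later. Seeing either symbol in _nms.so while PyTorch uses CUDA 9.0 would therefore suggest the kernels were compiled with a newer nvcc than the runtime PyTorch loads.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list the CUDA runtime symbols a compiled extension
# (e.g. lib/fpn/nms/_ext/nms/_nms.so) expects to resolve at import time.

# Read `nm` output on stdin and print only symbol names starting with __cuda.
cuda_symbols_from_nm() {
  awk '$NF ~ /^__cuda/ { print $NF }'
}

# Print the undefined __cuda* symbols of the shared object given as $1.
undefined_cuda_symbols() {
  nm -D --undefined-only "$1" | cuda_symbols_from_nm
}
```

For example, run undefined_cuda_symbols lib/fpn/nms/_ext/nms/_nms.so from the project root and compare the output against the CUDA version reported by torch.version.cuda.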


Answer 2 · 2019-04-27T09:27:39.000Z

@rowanz Thanks for your reply. I tried another machine, and I'm sure I'm using PyTorch 0.3.0 (compiled with CUDA 9.0.176) and CUDA 9.0, as the following output shows:

torch.version.cuda
'9.0.176'

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

But I still get the same error: ImportError: /input/neural-motifs-pytorch/lib/fpn/nms/_ext/nms/_nms.so: undefined symbol: __cudaRegisterFatBinaryEnd. I don't know why it happens or how to solve it.
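The manual comparison of torch.version.cuda against nvcc -V can be scripted. This is a minimal sketch with hypothetical function names. It only checks the nvcc that is first on PATH, which is the one a Makefile or make.sh calling plain nvcc would pick up; it cannot tell whether a Makefile hard-codes a different nvcc path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the CUDA version PyTorch was built with
# against the release of the nvcc that is first on PATH.

# Read `nvcc --version` output on stdin and print the release, e.g.
# "Cuda compilation tools, release 9.0, V9.0.176" -> "9.0".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Reduce a version such as "9.0.176" to its major.minor part ("9.0").
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed (status 0) if both versions agree on major.minor.
cuda_versions_match() {
  [[ "$(major_minor "$1")" == "$(major_minor "$2")" ]]
}

# Print both versions; return 0 on a match, 1 on a mismatch,
# and 2 if nvcc or python/torch is unavailable.
check_toolchain() {
  local torch_cuda nvcc_rel
  command -v nvcc >/dev/null || return 2
  torch_cuda=$(python -c 'import torch; print(torch.version.cuda)') || return 2
  nvcc_rel=$(nvcc --version | nvcc_release)
  echo "PyTorch CUDA: ${torch_cuda}; nvcc ($(command -v nvcc)): ${nvcc_rel}"
  cuda_versions_match "$torch_cuda" "$nvcc_rel"
}
```

Calling check_toolchain prints the path of the nvcc in use, which matters here: two machines can report the same torch.version.cuda while resolving nvcc to different toolkits.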

Answer 3 · 2019-04-29T20:03:33.000Z

Oh no! I'm not sure how to solve that either, since I haven't run into that problem before. Let me know if you find a fix :)

Answer 4 · 2019-05-27T18:43:50.000Z

@hmyhehe Hi, I encountered a very similar problem and solved it this way:
create a new make.sh under ".../neural-motifs-pytorch/lib/fpn/nms" with the following contents:

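```
#!/usr/bin/env bash
# CUDA install location (kept for reference; the nvcc call below uses
# whichever nvcc is first on PATH).
cuda_path=/usr/local/cuda/

# Compile the NMS CUDA kernel into a position-independent object file.
# -arch=sm_52 is what was used here; you may need to match your GPU.
cd src/cuda
echo "Compiling stnn kernels by nvcc..."
nvcc -c -o nms.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52

# Return to lib/fpn/nms and build the Python extension.
cd ../../
python build.py
```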
Then run make.sh from the nms directory.
If you have problems with roi_align, roi_pooling, etc., do the same for them.
Using the Makefile in the project's root directory often causes these problems. I have tried this on different machines (1080 Ti + CUDA 9 + PyTorch 0.3 or 0.3.1), and this solution works.
I hope my solution is useful to you both, @rowanz.
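The same script can be generalized for the other extensions ("do the same for them"). This is a sketch with a hypothetical function name and a hypothetical NVCC environment variable. It assumes each extension directory has the same layout as lib/fpn/nms (CUDA sources in src/cuda, build.py at the top), so the kernel and object file names must be taken from each extension's own Makefile.

```shell
#!/usr/bin/env bash
# Hypothetical generalization of the nms make.sh.
# Usage: build_ext EXT_DIR KERNEL_CU OBJECT [ARCH]
# ARCH defaults to sm_52, as in the make.sh above.
# Set NVCC (hypothetical variable) to choose a specific compiler, e.g.
# NVCC=/usr/local/cuda-9.0/bin/nvcc; otherwise the nvcc on PATH is used.
build_ext() {
  local ext_dir=$1 kernel=$2 obj=$3 arch=${4:-sm_52}
  local nvcc=${NVCC:-nvcc}
  # Compile the kernel into a position-independent object file.
  ( cd "$ext_dir/src/cuda" &&
    "$nvcc" -c -o "$obj" "$kernel" -x cu -Xcompiler -fPIC -arch="$arch" ) || return 1
  # Link it into the Python extension with the extension's build script.
  ( cd "$ext_dir" && python build.py )
}
```

For example, build_ext lib/fpn/nms nms_kernel.cu nms.cu.o reproduces the make.sh above. Pointing NVCC at the toolkit matching torch.version.cuda combines this with the Makefile fix in the next answer.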

Answer 5 · 2019-07-16T07:39:02.000Z

If your CUDA version is 9.2, edit lib/fpn/nms/src/cuda/Makefile so that it calls the nvcc of that toolkit, changing
/usr/local/cuda/bin/nvcc -c -o nms.cu.o nms_kernel.cu --compiler-options -fPIC -gencode arch=compute_61,code=sm_61
to
/usr/local/cuda-9.2/bin/nvcc -c -o nms.cu.o nms_kernel.cu --compiler-options -fPIC -gencode arch=compute_61,code=sm_61
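The Makefile edit can also be applied with sed instead of by hand. This is a minimal sketch with a hypothetical function name. It assumes the Makefile contains the literal path /usr/local/cuda/bin/nvcc, as in the nms Makefile quoted here, and keeps the original as Makefile.bak.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the default nvcc path in a Makefile
# with a version-specific one, keeping a .bak copy of the original.
# Usage: pin_nvcc MAKEFILE NVCC_PATH
pin_nvcc() {
  local makefile=$1 nvcc_path=$2
  # '#' is the sed delimiter because both paths contain '/'.
  sed -i.bak "s#/usr/local/cuda/bin/nvcc#${nvcc_path}#g" "$makefile"
}
```

For example: pin_nvcc lib/fpn/nms/src/cuda/Makefile /usr/local/cuda-9.2/bin/nvcc, then repeat for any other extension Makefile that hard-codes the same path.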

Answer 6 · 2019-08-08T09:43:37.000Z

Thanks @wtliao, your solution works like a charm! I encountered a similar issue on Ubuntu 16.04.6 LTS with CUDA 9.0 and pytorch 0.3.0 when running ./scripts/pretrain_detector.sh:

Traceback (most recent call last):
File "models/train_detector.py", line 6, in <module>
from lib.object_detector import ObjectDetector
File "/neural-motifs/lib/object_detector.py", line 11, in <module>
from lib.fpn.nms.functions.nms import apply_nms
File "/neural-motifs/lib/fpn/nms/functions/nms.py", line 4, in <module>
from .._ext import nms
File "/neural-motifs/lib/fpn/nms/_ext/nms/__init__.py", line 3, in <module>
from ._nms import lib as _lib, ffi as _ffi
ImportError: /neural-motifs/lib/fpn/nms/_ext/nms/_nms.so: undefined symbol: __cudaPopCallConfiguration

and

Traceback (most recent call last):
File "models/train_detector.py", line 6, in <module>
from lib.object_detector import ObjectDetector
File "/neural-motifs/lib/object_detector.py", line 15, in <module>
from lib.fpn.roi_align.functions.roi_align import RoIAlignFunction
File "/neural-motifs/lib/fpn/roi_align/functions/roi_align.py", line 7, in <module>
from .._ext import roi_align
File "/neural-motifs/lib/fpn/roi_align/_ext/roi_align/__init__.py", line 3, in <module>
from ._roi_align import lib as _lib, ffi as _ffi
ImportError: /neural-motifs/lib/fpn/roi_align/_ext/roi_align/_roi_align.so: undefined symbol: __cudaPopCallConfiguration

Creating the respective make.sh files (/neural-motifs/lib/fpn/nms/make.sh and /neural-motifs/lib/fpn/roi_align/make.sh) as you explained solved the problem.

Answer 7 · 2019-09-16T07:56:31.000Z

@maximilianmozes
Did you write both make.sh files the same way as described above?

Answer 8 · 2019-09-16T10:47:35.000Z

@jungjun9150 Yes, exactly. I adjusted each make.sh file according to neural-motifs/lib/fpn/nms/Makefile and neural-motifs/lib/fpn/roi_align/Makefile, respectively.

Answer 9 · 2019-09-16T15:47:13.000Z

@maximilianmozes
Did you succeed in getting neural-motifs to work? I have a question about that :). If so, please help!

Answer 10 · 2019-09-16T16:17:58.000Z

@jungjun9150 Yes, feel free to reach out by e-mail if you have any issues, and I'll try to help.

Answer 11 · 2019-09-18T00:11:39.000Z

@maximilianmozes
I sent the mail!