rowanz/neural-motifs

ImportError: undefined symbol: __cudaRegisterFatBinaryEnd

hu-my opened this issue · 11 comments

hu-my commented

Hi, I encountered the following error when running this command: sh scripts/train_stanford.sh

Traceback (most recent call last):
File "models/train_rels.py", line 24, in <module>
from lib.rel_model_stanford import RelModelStanford as RelModel
File "/home/huminyang/neural-motifs-pytorch/lib/rel_model_stanford.py", line 13, in <module>
from lib.object_detector import filter_det
File "/home/huminyang/neural-motifs-pytorch/lib/object_detector.py", line 11, in <module>
from lib.fpn.nms.functions.nms import apply_nms
File "/home/huminyang/neural-motifs-pytorch/lib/fpn/nms/functions/nms.py", line 4, in <module>
from .._ext import nms
File "/home/huminyang/neural-motifs-pytorch/lib/fpn/nms/_ext/nms/__init__.py", line 3, in <module>
from ._nms import lib as _lib, ffi as _ffi
ImportError: /home/huminyang/neural-motifs-pytorch/lib/fpn/nms/_ext/nms/_nms.so: undefined symbol: __cudaRegisterFatBinaryEnd

I am using PyTorch 0.3.0 and CUDA 9.0. Do you have any ideas on how to solve this? Thanks!

hu-my commented

@rowanz Thanks for your reply. I tried another machine, and I'm sure that I'm using PyTorch 0.3.0 (compiled with CUDA 9.0.176) and CUDA 9.0, as the following output shows:

torch.version.cuda
'9.0.176'

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

But I still get the same error: ImportError: /input/neural-motifs-pytorch/lib/fpn/nms/_ext/nms/_nms.so: undefined symbol: __cudaRegisterFatBinaryEnd. I don't know why it happens or how to solve it.
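The two version readouts above can also be compared programmatically. The following is a minimal sketch with hypothetical helper names (none of them come from this repository). It only parses the strings shown in this comment, and it can only vouch for the nvcc whose output you pass in, which is not necessarily the nvcc that the project's Makefile invokes.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release found in `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


def major_minor(version):
    """Reduce a version string such as '9.0.176' to '9.0'."""
    return ".".join(version.split(".")[:2])


def versions_match(torch_cuda, nvcc_output):
    """True when torch's CUDA version and nvcc agree on major.minor."""
    release = parse_nvcc_release(nvcc_output)
    return release is not None and major_minor(torch_cuda) == release
```

In practice, `torch_cuda` would be the value of `torch.version.cuda` and `nvcc_output` the text printed by `nvcc -V`.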

Oh no! I'm not sure how to solve that either, since I haven't run into that problem before. Let me know if you find a fix :)

@hmyhehe Hi, I have run into a very similar problem.
I solved it this way:
create a new make.sh under ".../neural-motifs-pytorch/lib/fpn/nms":

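'''
#!/usr/bin/env bash
# Compile the NMS CUDA kernel with nvcc, then build the Python extension.
# Note: cuda_path is not used below; the script calls whichever nvcc is first on PATH.
cuda_path=/usr/local/cuda/

cd src/cuda
echo "Compiling nms kernels by nvcc..."
# -arch=sm_52 is the value used here; adjust it for your GPU if needed.
nvcc -c -o nms.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52

# Return to lib/fpn/nms and build the extension from the compiled object.
cd ../../
python build.py
'''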
Run make.sh from the nms directory.
If you have problems with roi_align, roi_pooling, and so on, do it in a similar way.
Using the Makefile under the root directory of this project often causes these problems. I have tried this on different machines (1080 Ti + CUDA 9 + PyTorch 0.3 (or 0.3.1)) and this solution works.
I hope my solution is useful for you two, @rowanz.
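For anyone repeating these two steps across several extension directories, they could be sketched in Python as below. This is an illustration only: `nvcc_compile_command` and `build_extension` are hypothetical names, the flags are copied from the make.sh in this comment, and the kernel file names for roi_align or roi_pooling are not given in this thread, so they would need to be looked up in each directory's own Makefile.

```python
import os
import subprocess


def nvcc_compile_command(source, output, arch="sm_52", nvcc="nvcc"):
    """Return the nvcc argument list that the make.sh in this comment runs."""
    return [nvcc, "-c", "-o", output, source,
            "-x", "cu", "-Xcompiler", "-fPIC", "-arch=" + arch]


def build_extension(ext_dir, source, output, arch="sm_52"):
    """Compile one kernel in ext_dir/src/cuda, then run build.py in ext_dir."""
    cuda_dir = os.path.join(ext_dir, "src", "cuda")
    # Step 1: compile the CUDA kernel into an object file.
    subprocess.check_call(nvcc_compile_command(source, output, arch), cwd=cuda_dir)
    # Step 2: build the Python extension from the extension's root directory.
    subprocess.check_call(["python", "build.py"], cwd=ext_dir)
```

For the NMS extension this would be called as `build_extension("lib/fpn/nms", "nms_kernel.cu", "nms.cu.o")` from the project root.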

Edit lib/fpn/nms/src/cuda/Makefile and change
/usr/local/cuda/bin/nvcc -c -o nms.cu.o nms_kernel.cu --compiler-options -fPIC -gencode arch=compute_61,code=sm_61
to
/usr/local/cuda-9.2/bin/nvcc -c -o nms.cu.o nms_kernel.cu --compiler-options -fPIC -gencode arch=compute_61,code=sm_61

if your CUDA version is 9.2.
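The same substitution can be derived from a version string rather than typed by hand. The sketch below uses hypothetical helper names and assumes that toolkits are installed side by side under /usr/local/cuda-X.Y, as the path in this comment suggests; adjust the prefix if your layout differs.

```python
def nvcc_path_for(cuda_version, prefix="/usr/local"):
    """Map a CUDA version such as '9.0.176' to a versioned nvcc path.

    Assumes the toolkit lives in <prefix>/cuda-<major>.<minor>.
    """
    major_minor = ".".join(cuda_version.split(".")[:2])
    return "{}/cuda-{}/bin/nvcc".format(prefix, major_minor)


def gencode_flags(major, minor):
    """Build the -gencode arguments for one GPU compute capability."""
    cc = "{}{}".format(major, minor)
    return ["-gencode", "arch=compute_{0},code=sm_{0}".format(cc)]
```

For example, `nvcc_path_for("9.2")` gives the path used in the edited Makefile line, and `gencode_flags(6, 1)` reproduces its `-gencode` argument.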

Thanks @wtliao, your solution works like a charm! I encountered a similar issue on Ubuntu 16.04.6 LTS with CUDA 9.0 and pytorch 0.3.0 when running ./scripts/pretrain_detector.sh:

Traceback (most recent call last):
File "models/train_detector.py", line 6, in <module>
from lib.object_detector import ObjectDetector
File "/neural-motifs/lib/object_detector.py", line 11, in <module>
from lib.fpn.nms.functions.nms import apply_nms
File "/neural-motifs/lib/fpn/nms/functions/nms.py", line 4, in <module>
from .._ext import nms
File "/neural-motifs/lib/fpn/nms/_ext/nms/__init__.py", line 3, in <module>
from ._nms import lib as _lib, ffi as _ffi
ImportError: /neural-motifs/lib/fpn/nms/_ext/nms/_nms.so: undefined symbol: __cudaPopCallConfiguration

and

Traceback (most recent call last):
File "models/train_detector.py", line 6, in <module>
from lib.object_detector import ObjectDetector
File "/neural-motifs/lib/object_detector.py", line 15, in <module>
from lib.fpn.roi_align.functions.roi_align import RoIAlignFunction
File "/neural-motifs/lib/fpn/roi_align/functions/roi_align.py", line 7, in <module>
from .._ext import roi_align
File "/neural-motifs/lib/fpn/roi_align/_ext/roi_align/__init__.py", line 3, in <module>
from ._roi_align import lib as _lib, ffi as _ffi
ImportError: /neural-motifs/lib/fpn/roi_align/_ext/roi_align/_roi_align.so: undefined symbol: __cudaPopCallConfiguration

Creating the respective make.sh files (/neural-motifs/lib/fpn/nms/make.sh and /neural-motifs/lib/fpn/roi_align/make.sh) as you explained solved the problem.
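To see which CUDA runtime entry points a compiled extension expects, one option is to list its undefined symbols with GNU `nm`. The sketch below is a hypothetical helper, not part of this repository. It only reports what the .so references; whether each symbol resolves depends on the CUDA runtime that PyTorch loads. As far as I know, `__cudaPopCallConfiguration` and `__cudaRegisterFatBinaryEnd` are emitted by nvcc releases newer than 9.0, which would point to the .so having been compiled by a different nvcc than the one matching PyTorch's runtime, but treat that as an assumption.

```python
import subprocess


def undefined_cuda_symbols(nm_output):
    """Pick the undefined '__cuda*' symbols out of `nm -D` output text."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address: the line is just "U <name>".
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith("__cuda"):
            symbols.append(parts[1])
    return symbols


def check_library(path):
    """Run nm on a shared object and return its undefined '__cuda*' symbols."""
    output = subprocess.check_output(["nm", "-D", "--undefined-only", path])
    return undefined_cuda_symbols(output.decode())
```

For the errors in this thread, `check_library` would be pointed at the `_nms.so` or `_roi_align.so` file named in the ImportError.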

@maximilianmozes
Did you write the make.sh files the same way for both of these?

@jungjun9150 Yes, exactly. I adjusted each make.sh file according to neural-motifs/lib/fpn/nms/Makefile and neural-motifs/lib/fpn/roi_align/Makefile, respectively.

@maximilianmozes
Did you succeed in getting neural-motifs to run? I have a question about that :)
If so, please help!

@jungjun9150 Yes, feel free to reach out via e-mail if you have any issues and I'll try to help.

@maximilianmozes
I sent the mail!