Issue with tf_sampling_compile.sh

Question


YamingZ opened this issue 6 years ago · 11 comments

When I compile tf_sampling_so.so, this warning appears:
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
but tf_sampling_so.so still compiles successfully. Then I run the command:
./train_val_shapenet.sh -g 0 -x shapenet_x8_2048_fps
and pointcnn_seg_shapenet_x8_2048_fps.txt contains the following error:
Traceback (most recent call last):
  File "../train_val_seg.py", line 295, in <module>
    main()
  File "../train_val_seg.py", line 127, in main
    net = model.Net(points_augmented, features_augmented, is_training, setting)
  File "/home/whf/ZYM/PointCNN/pointcnn_seg.py", line 11, in __init__
    PointCNN.__init__(self, points, features, is_training, setting)
  File "/home/whf/ZYM/PointCNN/pointcnn.py", line 64, in __init__
    from sampling import tf_sampling
  File "/home/whf/ZYM/PointCNN/sampling/tf_sampling.py", line 15, in <module>
    sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so'))
  File "/home/whf/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/whf/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/whf/ZYM/PointCNN/sampling/tf_sampling_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

My environment is as follows:
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.4)
tensorflow 1.6.0
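One way to read the error is to demangle the missing symbol. _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv is tensorflow::internal::CheckOpMessageBuilder::NewString() without an [abi:cxx11] tag. When this function is compiled under GCC's newer C++11 ABI, its mangled name carries a B5cxx11 tag (..._9NewStringB5cxx11Ev) instead. gcc 4.8.4 predates the dual ABI (added in GCC 5.1), so it can only request the untagged, old-ABI name. A plausible but unconfirmed explanation is that the installed TensorFlow library exports only the tagged name. The helpers below (undefined_symbol and demangle_nested are hypothetical names) are a minimal sketch that handles only simple nested names like this one; c++filt is the general-purpose tool.

```python
import re

_LENGTH = re.compile(r"\d+")


def undefined_symbol(message):
    """Return the mangled name from an 'undefined symbol: ...' error, or None."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


def demangle_nested(symbol):
    """Demangle a simple Itanium-ABI name of the form _ZN <names> E v.

    Sketch only: supports length-prefixed identifiers, B<len><tag> ABI tags
    and an empty ('v') parameter list. Returns (readable_name, has_cxx11_tag).
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a simple nested name: " + symbol)
    pos, parts, tags = 3, [], []
    while pos < len(symbol) and symbol[pos] != "E":
        is_tag = symbol[pos] == "B"  # 'B' introduces an ABI tag such as cxx11
        if is_tag:
            pos += 1
        digits = _LENGTH.match(symbol, pos)
        if digits is None:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(digits.group(0))
        pos = digits.end()
        ident = symbol[pos:pos + length]
        pos += length
        if is_tag:
            if not parts:
                raise ValueError("ABI tag before any name")
            tags.append(ident)
            parts[-1] += "[abi:%s]" % ident  # c++filt prints tags this way
        else:
            parts.append(ident)
    # After the closing 'E', only a 'v' (no parameters) is handled here.
    if symbol[pos + 1:] != "v":
        raise ValueError("only an empty parameter list ('v') is supported")
    return "::".join(parts) + "()", "cxx11" in tags


msg = ("/home/whf/ZYM/PointCNN/sampling/tf_sampling_so.so: undefined symbol: "
       "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv")
print(demangle_nested(undefined_symbol(msg)))
# ('tensorflow::internal::CheckOpMessageBuilder::NewString()', False)
```

A False in the second position means the op asked for the old-ABI name. That fits building with gcc 4.8.4, where no other ABI was available.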

Answer 1 · 2018-10-14T14:26:42.000Z

I ran into the same problem and solved it by following this blog post: https://blog.csdn.net/DuinoDu/article/details/71788484?locationNum=9&fps=1
Maybe you can try changing '-D_GLIBCXX_USE_CXX11_ABI=0' to '-D_GLIBCXX_USE_CXX11_ABI=1' in the g++ compile command.
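Before flipping the flag by hand, it can help to ask TensorFlow which value it was built with. The sketch below is minimal and assumes your TensorFlow version provides tf.sysconfig.get_compile_flags(), which exists in recent 1.x releases. abi_flag is a hypothetical helper name.

```python
PREFIX = "-D_GLIBCXX_USE_CXX11_ABI="


def abi_flag(compile_flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value (0 or 1) in a flag list, or None."""
    for flag in compile_flags:
        if flag.startswith(PREFIX):
            return int(flag[len(PREFIX):])
    return None


# Usage against a real install (not run here because it needs TensorFlow):
#   import tensorflow as tf
#   print(abi_flag(tf.sysconfig.get_compile_flags()))
print(abi_flag(["-I/some/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]))  # prints 0
```

Whatever value this reports for your installed TensorFlow is the value the g++ command for tf_sampling_so.so should use.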

Answer 2 · 2018-10-15T02:05:17.000Z

@YamingZ Hi

I suggest you update gcc to version 5.0+ and recompile. You can also try the method @latstars suggested.
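Upgrading gcc matters because only GCC 5.1 and later can emit the C++11 ABI at all. With gcc 4.8, the -D_GLIBCXX_USE_CXX11_ABI=1 define has no effect. Below is a minimal sketch of that check with hypothetical helper names. It takes the output of gcc -dumpversion as a string.

```python
def gcc_major(dumpversion):
    """Major version from `gcc -dumpversion` output such as '4.8' or '5.5.0'."""
    return int(dumpversion.strip().split(".")[0])


def can_build_abi(dumpversion, required_abi):
    """True if this gcc can produce code for the requested libstdc++ ABI.

    required_abi is the 0/1 value TensorFlow was built with. The new (=1)
    ABI only exists from GCC 5.1 onward; every gcc can produce the old one.
    """
    if required_abi == 0:
        return True
    return gcc_major(dumpversion) >= 5


print(can_build_abi("4.8", 1))    # False: gcc 4.8 cannot emit the new ABI
print(can_build_abi("5.5.0", 1))  # True
```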

Thanks!

Answer 3 · 2018-10-19T01:01:40.000Z

@latstars I have tried changing '-D_GLIBCXX_USE_CXX11_ABI=0' to '-D_GLIBCXX_USE_CXX11_ABI=1', but it still doesn't work.

Answer 4 · 2018-10-19T02:34:44.000Z

@YamingZ Did you update gcc to 5.0+ and use our original .sh file to compile it?

Answer 5 · 2018-10-19T04:33:24.000Z

Yes, I did. I used gcc 5.5.0 to compile tf_sampling_so.so.

Answer 6 · 2018-10-19T04:40:20.000Z

This is my tf_sampling_compile.sh file:
#!/bin/bash
PYTHON=python3
CUDA_PATH=/usr/local/cuda
TF_LIB=$($PYTHON -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
PYTHON_VERSION=$($PYTHON -c 'import sys; print("%d.%d"%(sys.version_info[0], sys.version_info[1]))')
TF_PATH=$($PYTHON -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
$CUDA_PATH/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC \
    -L$TF_LIB -ltensorflow_framework \
    -I $TF_PATH/external/nsync/public/ -I $TF_PATH -I $CUDA_PATH/include \
    -lcudart -L $CUDA_PATH/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=1
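Rather than hard-coding -D_GLIBCXX_USE_CXX11_ABI=1 in the script, the ABI define and TensorFlow paths can be taken from tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags(). As far as I know, in TF 1.x get_compile_flags() returns the header and nsync include paths plus the ABI define, and get_link_flags() returns the -L path plus -ltensorflow_framework. The sketch below assembles the g++ step from those lists. gxx_command is an illustrative, hypothetical name; the file names follow the script above.

```python
import shlex


def gxx_command(compile_flags, link_flags, cuda_path="/usr/local/cuda"):
    """Assemble the g++ step that builds tf_sampling_so.so, as an argv list.

    compile_flags / link_flags are meant to come from
    tf.sysconfig.get_compile_flags() / get_link_flags(), so the ABI define
    matches the installed TensorFlow instead of being hard-coded.
    """
    return (
        ["g++", "-std=c++11", "tf_sampling.cpp", "tf_sampling_g.cu.o",
         "-o", "tf_sampling_so.so", "-shared", "-fPIC", "-O2"]
        + list(compile_flags)
        + ["-I", cuda_path + "/include"]
        # Libraries go after the sources/objects that reference them.
        + list(link_flags)
        + ["-L", cuda_path + "/lib64/", "-lcudart"]
    )


# Placeholder flags; with TensorFlow installed you would pass the
# tf.sysconfig lists instead.
print(" ".join(shlex.quote(arg) for arg in gxx_command(
    ["-I/path/to/tf/include", "-D_GLIBCXX_USE_CXX11_ABI=0"],
    ["-L/path/to/tf", "-ltensorflow_framework"])))
```

The same idea is often written directly in shell by capturing the output of those two tf.sysconfig calls into variables.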

Answer 7 · 2018-10-19T04:54:15.000Z

Hi, I ran into this when I built TensorFlow from source with GCC 5+. You have to compile with the same ABI setting as your TensorFlow install. From https://www.tensorflow.org/install/source: “The official TensorFlow packages (https://www.tensorflow.org/install/pip) are built with GCC 4 and use the older ABI. For GCC 5 and later, make your build compatible with the older ABI using: --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0". ABI compatibility ensures that custom ops built against the official TensorFlow package continue to work with the GCC 5 built package.” In my case, I just removed the -D_GLIBCXX_USE_CXX11_ABI option when compiling the sampling op. I hope that helps.
Dustin Dorroh


Answer 8 · 2018-10-19T06:54:56.000Z

@YamingZ
After updating gcc to 5.0+, please use "-D_GLIBCXX_USE_CXX11_ABI=0" instead of "-D_GLIBCXX_USE_CXX11_ABI=1".

Answer 9 · 2018-10-19T09:40:05.000Z

Thanks for your help, but I still can't solve this issue. I have tried all of the suggestions above.

Answer 10 · 2018-10-31T02:54:18.000Z

@YamingZ
Sorry, I can't reproduce your error, so I don't have any other ideas for solving this problem. You can refer to https://github.com/charlesq34/pointnet2 and try compiling it again if possible.
Thanks

Answer 11 · 2018-10-31T06:57:18.000Z

Thank you. I created a new environment and reinstalled TF, CUDA and cuDNN. Now the problem is gone and your model trains successfully.