PatWie/tensorflow-cmake

undefined reference at cc/inference sample

Babwenbiber opened this issue · 5 comments

Environment

info output
os Ubuntu 16.04
cmake 3.5.1
TF version 1.9.0
TF is working: python -c "import tensorflow as tf; sess=tf.InteractiveSession()" succeeds
bazel version 0.21.0
TF from source version(*.so) r1.13
tensorflow-gpu 1.9.0
tensorflow-estimator 1.13.0
Linux FOO 4.15.0-70-generic #79~16.04.1-Ubuntu SMP Tue Nov 12 14:01:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

numpy                         1.16.2                
protobuf                      3.7.1                 
tensorflow                    1.9.0                 
tensorflow-estimator          1.13.0                
tensorflow-gpu                1.9.0                 
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
/home/FOO/.local/lib/python2.7/site-packages/requests/__init__.py:83: RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
  warnings.warn(warning, RequestsDependencyWarning)
path = /home/FOO/.local/lib/python2.7/site-packages/tensorflow/__init__.pyc
tf.GIT_VERSION = v1.9.0-0-g25c197e023
tf.VERSION = 1.9.0
tf.COMPILER_VERSION = v1.9.0-0-g25c197e023
2020-01-02 13:45:53.258578: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-02 13:45:53.368100: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-02 13:45:53.369103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 9.90GiB
2020-01-02 13:45:53.442601: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-02 13:45:53.443560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.77GiB
2020-01-02 13:45:53.444523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0, 1
2020-01-02 13:45:54.033465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-02 13:45:54.033490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 1 
2020-01-02 13:45:54.033495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N Y 
2020-01-02 13:45:54.033499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y N 
2020-01-02 13:45:54.033684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9572 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-01-02 13:45:54.134601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10422 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Sanity check: array([1], dtype=int32)
Thu Jan  2 13:45:54 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
| 23%   23C    P0    60W / 250W |    894MiB / 11176MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   22C    P0    62W / 250W |      2MiB / 11178MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       856      G   ...ps/CLion/ch-0/193.5233.144/jbr/bin/java     3MiB |
|    0      1861      G   /usr/lib/xorg/Xorg                           472MiB |
|    0      3486      G   compiz                                       385MiB |
|    0     14914      G   ...yCharm-P/ch-0/193.5233.109/jbr/bin/java    23MiB |
|    0     18373      G   /usr/lib/firefox/firefox                       2MiB |
|    0     24926      G   ...wnloads/Nextcloud-2.5.2-x86_64.AppImage     3MiB |
+-----------------------------------------------------------------------------+

Issue

I can't build the cc inference example. The make command fails with linker errors.

Context:
I cloned this repo and followed the instructions.

Reproduce:

  1. git clone https://github.com/tensorflow/tensorflow/ && cd tensorflow
  2. git checkout r1.13
  3. ./configure (python2.7, cuda version 9.0)
    with following options:
    XLA JIT support: Y
    OPENCL SYCL support: N
    ROCm support: N
    CUDA support: Y (Version 9)
    cuDNN version 7
    TensorRT support: N
    NCCL version: https://github.com/nvidia/nccl
    cuda compute capabilities: 6.1,6.1
    clang as CUDA compiler: N
    MPI support: N
    bazel optimization flags: -march=native -Wno-sign-compare
    WS for Android: N
  4. export TENSORFLOW_SOURCE_DIR and TENSORFLOW_BUILD_DIR
  5. mkdir ${TENSORFLOW_BUILD_DIR}
  6. cp ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/*.so ${TENSORFLOW_BUILD_DIR}/
  7. The repo's next instruction fails, since the subdirectories do not exist yet:
    cp ${TENSORFLOW_SOURCE_DIR}/bazel-genfiles/tensorflow/cc/ops/*.h ${TENSORFLOW_BUILD_DIR}/includes/tensorflow/cc/ops/
    Therefore I ran mkdir -p ${TENSORFLOW_BUILD_DIR}/includes/tensorflow/cc/ops/ first.
  8. cd inference/cc
  9. mkdir build && cd build
  10. cmake .. (tried cmake .. -DPYTHON_EXECUTABLE=python as well)
  11. make
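The export/copy steps above can be sketched as one shell snippet, with the missing mkdir -p added before the header copy. This is a demo on a throwaway tree (substitute your real TENSORFLOW_SOURCE_DIR and TENSORFLOW_BUILD_DIR; the file names are placeholders) — the point is only the directory-creation order:

```shell
# Throwaway stand-in for the bazel-built checkout; replace with your real paths.
TENSORFLOW_SOURCE_DIR=$(mktemp -d)
TENSORFLOW_BUILD_DIR=$(mktemp -d)/build
mkdir -p "${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow" \
         "${TENSORFLOW_SOURCE_DIR}/bazel-genfiles/tensorflow/cc/ops"
touch "${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/libtensorflow_cc.so" \
      "${TENSORFLOW_SOURCE_DIR}/bazel-genfiles/tensorflow/cc/ops/array_ops.h"

# Steps 5-7 from above:
mkdir -p "${TENSORFLOW_BUILD_DIR}"
cp "${TENSORFLOW_SOURCE_DIR}"/bazel-bin/tensorflow/*.so "${TENSORFLOW_BUILD_DIR}/"
# The include tree must exist before copying the generated op headers:
mkdir -p "${TENSORFLOW_BUILD_DIR}/includes/tensorflow/cc/ops"
cp "${TENSORFLOW_SOURCE_DIR}"/bazel-genfiles/tensorflow/cc/ops/*.h \
   "${TENSORFLOW_BUILD_DIR}/includes/tensorflow/cc/ops/"
```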

Output

[ 50%] Building CXX object CMakeFiles/inference_cc.dir/inference_cc.cc.o
[100%] Linking CXX executable inference_cc
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `LoadModel(tensorflow::Session*, std::string, std::string)':
inference_cc.cc:(.text+0x79): undefined reference to `tensorflow::ReadBinaryProto(tensorflow::Env*, std::string const&, google::protobuf::MessageLite*)'
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `main':
inference_cc.cc:(.text+0x18ca): undefined reference to `tensorflow::Tensor::DebugString() const'
inference_cc.cc:(.text+0x1957): undefined reference to `tensorflow::Tensor::DebugString() const'
inference_cc.cc:(.text+0x19e7): undefined reference to `tensorflow::Tensor::DebugString() const'
inference_cc.cc:(.text+0x1a77): undefined reference to `tensorflow::Tensor::DebugString() const'
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `tensorflow::TfCheckOpHelper(tensorflow::Status, char const*)':
inference_cc.cc:(.text._ZN10tensorflow15TfCheckOpHelperENS_6StatusEPKc[_ZN10tensorflow15TfCheckOpHelperENS_6StatusEPKc]+0x38): undefined reference to `tensorflow::TfCheckOpHelperOutOfLine(tensorflow::Status const&, char const*)'
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `tensorflow::Status::operator==(tensorflow::Status const&) const':
inference_cc.cc:(.text._ZNK10tensorflow6StatuseqERKS0_[_ZNK10tensorflow6StatuseqERKS0_]+0x4d): undefined reference to `tensorflow::Status::ToString() const'
inference_cc.cc:(.text._ZNK10tensorflow6StatuseqERKS0_[_ZNK10tensorflow6StatuseqERKS0_]+0x5e): undefined reference to `tensorflow::Status::ToString() const'
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `std::string* tensorflow::internal::MakeCheckOpString<long, int>(long const&, int const&, char const*)':
inference_cc.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x75): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::NewString()'
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `std::string* tensorflow::internal::MakeCheckOpString<unsigned long, unsigned long>(unsigned long const&, unsigned long const&, char const*)':
inference_cc.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringImmEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringImmEEPSsRKT_RKT0_PKc]+0x75): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::NewString()'
CMakeFiles/inference_cc.dir/inference_cc.cc.o: In function `std::string* tensorflow::internal::MakeCheckOpString<long long, long long>(long long const&, long long const&, char const*)':
inference_cc.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIxxEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIxxEEPSsRKT_RKT0_PKc]+0x75): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::NewString()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
CMakeFiles/inference_cc.dir/build.make:96: recipe for target 'inference_cc' failed
make[2]: *** [inference_cc] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/inference_cc.dir/all' failed
make[1]: *** [CMakeFiles/inference_cc.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Expectation
Successful make build.

Investigation
I tried several linker flags in the cmake file (-ltensorflow, -ltensorflow_cc, -ltensorflow_framework) suggested at https://github.com/tensorflow/tensorflow/issues/14632.
I tried the steps described above on two different machines (both Ubuntu 16.04, one with CUDA as shown above, one without), but both failed with the error shown above.
I tried the Custom Operation guide in the tensorflow-cmake repo as well, but that did not succeed either.
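For reference, a minimal CMakeLists.txt sketch that links the prebuilt libraries directly. The paths, the includes/ layout, and linking both libtensorflow_cc.so and libtensorflow_framework.so are assumptions on my part — the repo's FindTensorFlow.cmake normally resolves these — but the undefined symbols above (ReadBinaryProto, Tensor::DebugString, Status::ToString, ...) are the kind exported by those two libraries:

```cmake
# Hypothetical minimal CMakeLists.txt; assumes TENSORFLOW_SOURCE_DIR and
# TENSORFLOW_BUILD_DIR are exported as in the steps above.
cmake_minimum_required(VERSION 3.5)
project(inference_cc CXX)

add_executable(inference_cc inference_cc.cc)
target_include_directories(inference_cc PRIVATE
  $ENV{TENSORFLOW_SOURCE_DIR}
  $ENV{TENSORFLOW_BUILD_DIR}/includes)
# Link the copied shared objects by full path instead of -l flags, so the
# linker cannot silently pick up a different TensorFlow installation:
target_link_libraries(inference_cc
  $ENV{TENSORFLOW_BUILD_DIR}/libtensorflow_cc.so
  $ENV{TENSORFLOW_BUILD_DIR}/libtensorflow_framework.so)
```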

Are the environment variables LD_LIBRARY_PATH and LIBRARY_PATH correct? They have to contain the TensorFlow libraries (*.so files).

I tried this, but I get the exact same error message when doing a make.
I did a
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORFLOW_BUILD_DIR
and
export LIBRARY_PATH=$LIBRARY_PATH:$TENSORFLOW_BUILD_DIR

Minor: It should be export LD_LIBRARY_PATH=$TENSORFLOW_BUILD_DIR:$LD_LIBRARY_PATH, and likewise for LIBRARY_PATH, so that the TensorFlow libraries are considered first.
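The suggested prepend order can be sketched as follows (the build dir path is a placeholder; the ${VAR:+...} expansion just avoids a trailing colon when the variable starts out empty):

```shell
# Placeholder; substitute your real TENSORFLOW_BUILD_DIR.
TENSORFLOW_BUILD_DIR=/opt/tensorflow-build
# Prepend rather than append, so the freshly built libraries are found first:
export LD_LIBRARY_PATH="${TENSORFLOW_BUILD_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LIBRARY_PATH="${TENSORFLOW_BUILD_DIR}${LIBRARY_PATH:+:${LIBRARY_PATH}}"
echo "${LD_LIBRARY_PATH%%:*}"   # first search entry is now the build dir
```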

I cannot reproduce this issue here.

I updated my post and put detailed information about my bazel configuration. Maybe this helps.

I am a bit confused. You built TensorFlow from source (v1.13) but installed the tensorflow pip package (v1.9). If that is correct, the issue is likely the following:

  • FindTensorFlow.cmake (in this repo) calls Python to query the TensorFlow library path and version

Since you have installed tensorflow-gpu v1.9, Python will only see that version, but you need to compile against v1.13 for inference. When you build TensorFlow from source, you should also build and install the matching pip wheel, so that the TensorFlow version in Python is consistent with the *.so files.

This would at least explain the linker errors you observed.
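The mismatch can be made concrete with a small sketch: the pip-visible version and the source tag must agree at the major.minor level for the headers and *.so files to line up. The helper name and the major.minor comparison are my own illustration, not part of FindTensorFlow.cmake:

```python
# Hypothetical helper: compare the pip package version (e.g. from
# `python -c "import tensorflow as tf; print(tf.VERSION)"`) against the
# source branch the *.so files were built from (e.g. "r1.13").
def versions_match(pip_version: str, source_version: str) -> bool:
    pip_mm = tuple(pip_version.split(".")[:2])
    src_mm = tuple(source_version.lstrip("rv").split(".")[:2])
    return pip_mm == src_mm

print(versions_match("1.9.0", "r1.13"))   # the reporter's setup -> False
print(versions_match("1.13.1", "r1.13"))  # consistent install  -> True
```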