alsora/ros2-tensorflow

Failing with vision_msgs id

ak-nv opened this issue · 3 comments

ak-nv commented

I could install and get all the dependencies:

rosdep install --from-paths ros2-tensorflow --ignore-src --rosdistro eloquent -y
**#All required rosdeps installed successfully**

My tensorflow version is 2.2.0

However, when I run the server together with the two commands below, I get an error.

ros2 run tf_detection_py client_test
ros2 run image_tools cam2image --ros-args -p frequency:=2.0 

The error is in the server, when it fills in the vision message id:

ros2 run tf_detection_py server
2020-08-06 12:08:55.028256: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-06 12:09:24.590181: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-06 12:09:24.594491: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.594652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 15.44GiB deviceMemoryBandwidth: 82.08GiB/s
2020-08-06 12:09:24.594720: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-06 12:09:24.597684: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-06 12:09:24.625458: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-06 12:09:24.626463: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-06 12:09:24.671231: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-06 12:09:24.696422: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-06 12:09:24.697417: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-06 12:09:24.697833: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.698115: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.698194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-06 12:09:24.712367: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2020-08-06 12:09:24.713154: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xb65f4f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-06 12:09:24.713222: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-06 12:09:24.798459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.798879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xa3f4db0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-06 12:09:24.798937: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-08-06 12:09:24.799515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.799640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 15.44GiB deviceMemoryBandwidth: 82.08GiB/s
2020-08-06 12:09:24.799698: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-06 12:09:24.799831: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-06 12:09:24.799910: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-06 12:09:24.799981: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-06 12:09:24.800051: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-06 12:09:24.800118: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-06 12:09:24.800186: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-06 12:09:24.800378: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.800676: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:24.800747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-06 12:09:24.800848: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-06 12:09:27.867831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-06 12:09:27.867933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-08-06 12:09:27.867997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-08-06 12:09:27.868734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:27.869085: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-06 12:09:27.869247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6495 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
[INFO] [detection_server]: Load model completed!
2020-08-06 12:09:54.023721: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-06 12:09:59.436178: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
[INFO] [detection_server]: Warmup completed! Ready to receive real images!
Traceback (most recent call last):
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2-tensorflow/install/tf_detection_py/lib/tf_detection_py/server", line 33, in <module>
    sys.exit(load_entry_point('tf-detection-py==0.0.2', 'console_scripts', 'server')())
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2-tensorflow/install/tf_detection_py/lib/python3.6/site-packages/tf_detection_py/examples/server.py", line 27, in main
    rclpy.spin(node)
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2_eloquent/install/rclpy/lib/python3.6/site-packages/rclpy/__init__.py", line 190, in spin
    executor.spin_once()
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2_eloquent/install/rclpy/lib/python3.6/site-packages/rclpy/executors.py", line 684, in spin_once
    raise handler.exception()
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2_eloquent/install/rclpy/lib/python3.6/site-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2_eloquent/install/rclpy/lib/python3.6/site-packages/rclpy/executors.py", line 404, in handler
    await call_coroutine(entity, arg)
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2_eloquent/install/rclpy/lib/python3.6/site-packages/rclpy/executors.py", line 358, in _execute_service
    response = await await_or_execute(srv.callback, request, srv.srv_type.Response())
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2_eloquent/install/rclpy/lib/python3.6/site-packages/rclpy/executors.py", line 118, in await_or_execute
    return callback(*args)
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2-tensorflow/install/tf_detection_py/lib/python3.6/site-packages/tf_detection_py/detection_node.py", line 200, in handle_image_detection_srv
    response.detections = self.create_detections_msg(image_np, output_dict)
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2-tensorflow/install/tf_detection_py/lib/python3.6/site-packages/tf_detection_py/detection_node.py", line 172, in create_detections_msg
    detected_object.id = classes[i].item()
  File "/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/ros2-tensorflow/install/vision_msgs/lib/python3.6/site-packages/vision_msgs/msg/_object_hypothesis_with_pose.py", line 138, in id
    "The 'id' field must be of type 'str'"
AssertionError: The 'id' field must be of type 'str'

alsora commented

The cause of the issue is an incompatibility between the release and development branches of vision_msgs.

In the development branch, the id field is a string: https://github.com/Kukanani/vision_msgs/blob/ros2/msg/ObjectHypothesisWithPose.msg#L6
In the release branch, on the other hand, the id field is an int: https://github.com/Kukanani/vision_msgs-release/blob/debian/eloquent/vision_msgs/msg/ObjectHypothesisWithPose.msg#L6

From the error and from your log, I can see that you are using the development version, while this package targets the released one.
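
One way to check which variant is present in a workspace is to read the declared type of the id field from the .msg definition. The following is a minimal sketch, assuming the plain "<type> <name>" line format of ROS message files. The helper name field_type and both sample definitions are illustrative assumptions, not copies of either branch; in particular, int64 is only an assumed integer type.

```python
def field_type(msg_definition, field_name):
    """Return the declared type of field_name in a .msg definition, or None."""
    for line in msg_definition.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split('#', 1)[0].strip()
        parts = line.split()
        # A field declaration looks like "<type> <name>", optionally
        # followed by a default value.
        if len(parts) >= 2 and parts[1] == field_name:
            return parts[0]
    return None


# Illustrative, abbreviated definitions (assumed, not taken from the repos).
release_like = """
# numeric class id
int64 id
float64 score
"""

devel_like = """
# class id as text
string id
float64 score
"""

print(field_type(release_like, 'id'))
print(field_type(devel_like, 'id'))
```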

You have the following options (ordered from what I think is the best to the worst):

  1. Remove the vision_msgs package sources from your workspace. This way the rosdep install command will correctly get the release version.
  2. In your workspace, clone the release version of the vision_msgs package instead of the development one.
  3. Apply a patch in this repo that casts the int to a string: https://github.com/alsora/ros2-tensorflow/blob/master/ros2-tensorflow/tf_detection_py/tf_detection_py/detection_node.py#L172 Unfortunately, I will not be able to integrate that patch in this project, as it would break compatibility with the released version of vision_msgs.

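
Option 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: ReleaseHypothesis and DevelHypothesis are hypothetical stand-ins for the two variants of ObjectHypothesisWithPose (so the snippet runs without ROS), and set_class_id is a made-up helper that is not part of this repo. It also assumes that the message default-initializes id to a value of the right type (0 or an empty string), and converts the class id to that type instead of hard-coding a cast to str.

```python
class ReleaseHypothesis:
    """Hypothetical stand-in for the release variant, where id is an int."""

    def __init__(self):
        self.id = 0
        self.score = 0.0


class DevelHypothesis:
    """Hypothetical stand-in for the development variant, where id is a str."""

    def __init__(self):
        self.id = ''
        self.score = 0.0


def set_class_id(hypothesis, class_id):
    """Assign class_id, converted to the type the id field already holds."""
    expected_type = type(hypothesis.id)
    hypothesis.id = expected_type(class_id)
    return hypothesis


# A plain int stands in for the value returned by classes[i].item().
class_id = 17
release_msg = set_class_id(ReleaseHypothesis(), class_id)
devel_msg = set_class_id(DevelHypothesis(), class_id)
print(repr(release_msg.id), repr(devel_msg.id))
```

Converting to the existing field type keeps the same code working against either variant, which a fixed str() cast would not.
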
ak-nv commented

Hi @alsora, thank you for the quick reply. I tried options 1 and 2; option 2 worked for me.

  1. Did not work; I still got the same error.
  2. I ran git clone https://github.com/Kukanani/vision_msgs-release.git -b debian/eloquent/vision_msgs in src and then ran colcon build.

Thanks for your inputs.

@ak-nv FYI, the issue has been fixed in the vision_msgs repository, so you can now use any version.