stereolabs/zed-ros-wrapper

Decreasing grab frame rate to 30 fps makes starting multiple GMSL cameras unstable

bmegli opened this issue · 3 comments

bmegli commented

Preliminary Checks

  • This issue is not a duplicate. Before opening a new issue, please search existing issues.
  • This issue is not a question, feature request, or anything other than a bug report directly related to this project.

Description

Related to GMSL cameras (ZED-X, ZED-XM)

grab_frame_rate: 60 # 120, 60, 30

Decreasing grab_frame_rate to 30 fps makes starting multiple GMSL cameras at the same time unstable.

Steps to Reproduce

See the "Anything else?" section below for now.

Expected Result

Changing grab_frame_rate does not affect camera startup stability.

Actual Result

Decreasing grab_frame_rate to 30 fps makes starting multiple GMSL cameras at the same time unstable.

Starting only 1 camera works as expected.

Waiting for the first camera to finish initializing before starting the second one helps somewhat, but startup is still hit or miss.

Once both cameras have started, they work reliably; only startup is affected.
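Since waiting for the first camera to finish initializing helps somewhat, one mitigation to try is a staggered startup with per-camera retries. The following is a minimal Python sketch, assuming a hypothetical `open_camera(cam_id)` callable that returns `True` once a camera is ready (in practice it might wrap the SDK's camera-open call). The attempt count and delays are arbitrary illustrative values, not Stereolabs recommendations, and this only works around the symptom rather than fixing the underlying startup issue.

```python
import time
from typing import Callable, Dict, Iterable


def start_cameras_sequentially(
    camera_ids: Iterable[int],
    open_camera: Callable[[int], bool],
    max_attempts: int = 3,
    settle_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, bool]:
    """Open cameras one at a time, retrying each one a few times.

    `open_camera` is a hypothetical callable (not an SDK function) that
    returns True once the camera with the given id has finished initializing.
    """
    results: Dict[int, bool] = {}
    for cam_id in camera_ids:
        ok = False
        for attempt in range(1, max_attempts + 1):
            if open_camera(cam_id):
                ok = True
                break
            # Linear back-off before retrying this camera (arbitrary, not tuned).
            sleep(settle_s * attempt)
        results[cam_id] = ok
        # Let the capture pipeline settle before starting the next camera.
        sleep(settle_s)
    return results


if __name__ == "__main__":
    # Fake opener for illustration: camera 1 fails once, then succeeds.
    attempts = {0: 0, 1: 0}

    def fake_open(cam_id: int) -> bool:
        attempts[cam_id] += 1
        return not (cam_id == 1 and attempts[cam_id] == 1)

    print(start_cameras_sequentially([0, 1], fake_open, sleep=lambda s: None))
    # → {0: True, 1: True}
```

Injecting `sleep` keeps the function testable without real delays; in a ROS launch setup the same idea would correspond to delaying the second nodelet until the first camera reports it is ready.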

Warnings

Camera 1 (eventually succeeds)

[ZED-Argus][Timeout] CAM 0 is frozen
[ZED-Argus][Timeout] CAM 0 is frozen
(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 1 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 1 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)

Camera 2 (eventually fails)

(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)

ZED Camera model

ZED-X, ZED-XM

Note: ZED-X/ZED-XM could not be selected in the issue form, so this field was edited afterwards.

Environment

Both cameras are running with NEURAL depth mode.

Jetson AGX Orin

  • JP 5.1
  • L4T 35.2.1
  • ZED_SDK_Tegra_L4T35.2_v4.0.2
  • GMSL driver
apt-cache policy stereolabs-nvidia-l4t-kernel-35.2-dtbs
stereolabs-nvidia-l4t-kernel-35.2-dtbs:
  Installed: 5.10.104-tegra-35.2.1-20230124153320
  Candidate: 5.10.104-tegra-35.2.1-20230124153320

The driver package was installed with:

sudo apt install /usr/local/zed/drivers/L4T_35.2/stereolabs-zedx-L4T35.2-v0.4.7_max96712.deb 

Anything else?

Workaround

Keep grab_frame_rate at 60 fps

Other notes

  • I am using the nodelet workflow.
  • I start pulling data from the cameras immediately within the launch files (so there is traffic on the GMSL link from the start).
  • My guess is that some timer or timeout is tuned for a 60 fps grab frame rate.
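To make the timeout guess concrete: if a watchdog declares a camera "frozen" after a fixed wall-clock interval, halving the frame rate halves the number of frames that can arrive inside that window, leaving less margin during a slow concurrent start. The sketch below only does that arithmetic; the 100 ms window is a hypothetical value chosen for illustration, not the actual timeout used by the ZED SDK or Argus.

```python
def frame_period_ms(fps: int) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000 / fps


def frames_within(timeout_ms: int, fps: int) -> int:
    """Whole frames that can arrive inside a fixed wall-clock timeout.

    Integer arithmetic avoids floating-point rounding at exact multiples.
    """
    return timeout_ms * fps // 1000


if __name__ == "__main__":
    HYPOTHETICAL_TIMEOUT_MS = 100  # illustrative only, not an SDK value
    for fps in (120, 60, 30):
        print(
            f"{fps:>3} fps: period {frame_period_ms(fps):.1f} ms, "
            f"{frames_within(HYPOTHETICAL_TIMEOUT_MS, fps)} frames in "
            f"{HYPOTHETICAL_TIMEOUT_MS} ms"
        )
```

With this assumed window, 60 fps allows 6 frames while 30 fps allows only 3, so a start-up stall that is tolerated at 60 fps could trip the same watchdog at 30 fps.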

ZED_Depth_Viewer

A similar condition can be triggered with two ZED_Depth_Viewer instances pointed at different GMSL cameras, with NEURAL depth enabled, while changing the frame rate (which restarts the cameras).

So the actual problem lies below the ROS layer.

First instance:

ZED_Depth_Viewer 
[ZED-Argus][Timeout] CAM 1 is frozen
[ZED-Argus][Timeout] CAM 1 is frozen
(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 1 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 1 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
[ZED-Argus][Timeout] CAM 1 is frozen
[ZED-Argus][Timeout] CAM 1 is frozen

Second instance:

ZED_Depth_Viewer 
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
Stack trace (most recent call last):
#26   Object "ZED_Depth_Viewer", at 0x41ed6f, in 
#25   Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xffffa021ae0f, in __libc_start_main
#24   Object "ZED_Depth_Viewer", at 0x41e2fb, in 
#23   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa0893a5b, in QCoreApplication::exec()
#22   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa088b3b7, in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>)
#21   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa08e81cb, in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
#20   Object "/usr/lib/aarch64-linux-gnu/libglib-2.0.so.0", at 0xffff9ed36c53, in g_main_context_iteration
#19   Object "/usr/lib/aarch64-linux-gnu/libglib-2.0.so.0", at 0xffff9ed36bb3, in 
#18   Object "/usr/lib/aarch64-linux-gnu/libglib-2.0.so.0", at 0xffff9ed36943, in g_main_context_dispatch
#17   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa08e7e37, in 
#16   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa08e7507, in QTimerInfoList::activateTimers()
#15   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa088cc0b, in QCoreApplication::notifyInternal2(QObject*, QEvent*)
#14   Object "/usr/lib/aarch64-linux-gnu/libQt5Widgets.so.5", at 0xffffa1245ad7, in QApplication::notify(QObject*, QEvent*)
#13   Object "/usr/lib/aarch64-linux-gnu/libQt5Widgets.so.5", at 0xffffa123c4ab, in QApplicationPrivate::notify_helper(QObject*, QEvent*)
#12   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa08ba5b7, in QObject::event(QEvent*)
#11   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa08c7557, in QTimer::timeout(QTimer::QPrivateSignal)
#10   Object "/usr/lib/aarch64-linux-gnu/libQt5Core.so.5", at 0xffffa08b9bff, in QMetaObject::activate(QObject*, int, int, void**)
#9    Object "ZED_Depth_Viewer", at 0x41f35b, in 
#8    Object "ZED_Depth_Viewer", at 0x43fe83, in 
#7    Object "ZED_Depth_Viewer", at 0x43fafb, in 
#6    Object "ZED_Depth_Viewer", at 0x437a03, in 
#5    Object "/usr/local/zed/lib/libsl_zed.so", at 0xffffa355f857, in sl::Camera::open(sl::InitParameters)
#4    Object "/usr/local/zed/lib/libsl_zed.so", at 0xffffa35cc56b, in 
#3    Object "/usr/local/zed/lib/libsl_zed.so", at 0xffffa2a47a53, in sl::GMSLInput::close(bool)
#2    Object "/usr/local/zed/lib/libsl_zed.so", at 0xffffa2a3d573, in ArgusCamera::close()
#1    Object "/usr/local/zed/lib/libsl_zed.so", at 0xffffa2a454bf, in 
#0    Object "/usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so", at 0xffff9fad0810, in 
Segmentation fault (Address not mapped to object [(nil)])
Segmentation fault (core dumped)

bmegli commented

After installing ZED SDK 4.0.3, I can no longer reproduce this problem from the ROS side.

I am not sure whether the SDK, the GMSL grabber driver, or something else fixed the problem.

If I don't see it again soon I will close the issue.

Myzhar commented

When one of the cameras is not reachable, the Argus service may have frozen for some reason.
You can recover the cameras by restarting the service:
sudo service nvargus-daemon restart
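For unattended setups, the restart can be folded into a recovery path: try to open the camera, and only if that fails, restart `nvargus-daemon` once and retry. This is a minimal Python sketch, assuming a hypothetical `open_camera()` callable that returns `True` on success; the settle delay is an arbitrary guess. Restarting the daemon is likely to disconnect every Argus client on the system (the "Connection reset by peer" lines above are what a client sees when the daemon goes away), so other open cameras should be expected to drop too, and `sudo` must be able to run without a password prompt.

```python
import subprocess
import time
from typing import Callable, Sequence

RESTART_CMD = ["sudo", "service", "nvargus-daemon", "restart"]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def open_with_argus_recovery(
    open_camera: Callable[[], bool],
    run: Callable[[Sequence[str]], int] = _run,
    settle_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Open a camera; on failure restart the Argus daemon once and retry.

    `open_camera` is a hypothetical callable, not an SDK function.
    """
    if open_camera():
        return True
    # The camera may be unreachable because nvargus-daemon froze: restart it.
    if run(RESTART_CMD) != 0:
        return False
    # Give the daemon a moment to come back up (arbitrary delay).
    sleep(settle_s)
    return open_camera()
```

Injecting `run` and `sleep` keeps the logic testable without touching the real service.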

bmegli commented

Thanks.

I can no longer reproduce the problem with ZED_Depth_Viewer either.

So installing ZED SDK 4.0.3 or the GMSL driver somehow fixed it.