ros-visualization/rviz

What does "unknown reason for failure" mean when rviz is trying to display Lidar (/scan) data?

pitosalas opened this issue · 19 comments

ROS1, Ubuntu 22.04, Noetic.

When displaying LaserScan data in Rviz, I get this error:

Screenshot 2023-05-21 at 12 47 38 PM

The tf and tf tree all look reasonable to me. I am not sure what the problem is with the tf that is causing the problem. I suspect it's the transform from /odom to /base_footprint but that looks ok:

Screenshot 2023-05-21 at 12 54 28 PM

Can someone help me figure out what is causing it? I am happy to look at rviz source code if I knew where to look. Or any hint at all? Thanks!

Please provide the full error message. Particularly, I am missing the two involved TF frames.

The error originates from here:

return "Unknown reason for transform failure";

This means that this call doesn't report an error anymore, which is strange:

if (transformHasProblems(frame_id, stamp, error))

Here is some more info: The full message (which is not helpful):

Screenshot 2023-05-21 at 3 06 03 PM

The tf tree in rviz:

Screenshot 2023-05-21 at 3 11 00 PM

Here is the tf tree in rat:

Screenshot 2023-05-21 at 3 20 02 PM

Here is output from tf_echo and tf_monitor:

[real:platform3]~$ cat github_info 
Waiting for transform chain to become available between odom and base_footprint 



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.00966198: max = 0.0814555

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0391491 Max Delay: 0.0716842
Node: unknown_publisher 45.1549 Hz, Average Delay: 0.0391491 Max Delay: 0.0716842
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0170883: max = 0.0821697

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0380917 Max Delay: 0.0716842
Node: unknown_publisher 44.922 Hz, Average Delay: 0.0380917 Max Delay: 0.0716842
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0283903: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0439559 Max Delay: 0.0825805
Node: unknown_publisher 43.6603 Hz, Average Delay: 0.0439559 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0350159: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0450682 Max Delay: 0.0825805
Node: unknown_publisher 41.5673 Hz, Average Delay: 0.0450682 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0387006: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0446184 Max Delay: 0.0825805
Node: unknown_publisher 42.5153 Hz, Average Delay: 0.0446184 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0405556: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0430439 Max Delay: 0.0825805
Node: unknown_publisher 42.7204 Hz, Average Delay: 0.0430439 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0419136: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0421325 Max Delay: 0.0825805
Node: unknown_publisher 42.0878 Hz, Average Delay: 0.0421325 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0463595: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0437379 Max Delay: 0.0825805
Node: unknown_publisher 41.6842 Hz, Average Delay: 0.0437379 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0500028: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441776 Max Delay: 0.0825805
Node: unknown_publisher 42.2377 Hz, Average Delay: 0.0441776 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0517838: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher 42.216 Hz, Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.077401: max = 0.284176

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher 42.216 Hz, Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0



RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.138728: max = 0.497438

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher 42.216 Hz, Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0
At time 1684696628.883
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.081, 0.997]
            in RPY (radian) [0.000, -0.000, 0.162]
            in RPY (degree) [0.000, -0.000, 9.295]
At time 1684696629.285
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.082, 0.997]
            in RPY (radian) [0.000, -0.000, 0.164]
            in RPY (degree) [0.000, -0.000, 9.382]
At time 1684696630.291
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.080, 0.997]
            in RPY (radian) [0.000, -0.000, 0.161]
            in RPY (degree) [0.000, -0.000, 9.218]
At time 1684696631.299
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.082, 0.997]
            in RPY (radian) [0.000, -0.000, 0.163]
            in RPY (degree) [0.000, -0.000, 9.355]
At time 1684696631.923
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.082, 0.997]
            in RPY (radian) [0.000, -0.000, 0.164]
            in RPY (degree) [0.000, -0.000, 9.401]
[real:platform3]~$ 

If the global frame is changed to anything other than odom, then the Lidar display does what it is supposed to do. Which is why I somehow suspect the tf chain between odom and base_scan (which goes through base_footprint).

Screenshot 2023-05-21 at 3 31 09 PM

Could you please record and post a rosbag file including all the topics, i.e. /clock, /tf*, and /scan? 10s should be enough.

Please upload the file at github. Your email didn't comprise an attachment...

In the meanwhile I totally rebuilt the software on my robot. I thought maybe I had corrupted something inadvertently. I also discovered that Noetic is not "officially" supported on Ubuntu 22.04 so I rebuilt the robot on Ubuntu 20.04.

Unfortunately, the problem still appears.

I appreciate your help so far! I have been stumped by this for a week so I hope that you can find a moment to lookl into this.

@rhaschke I am using linorobot.org. My Launch file uses a kalman filter to fuse the raw odom with the imu:

    <node pkg="robot_localization" type="ekf_localization_node" name="ekf_localization"> 
        <remap from="odometry/filtered" to="odom" />
        <rosparam command="load" file="$(find linorobot)/param/ekf/robot_localization.yaml" />
    </node>

If the IMU is not physically oriented correctly, could that somehow lead to the odom -> laser transfor to "failing"? The thinking is that somehow a badly callibrated IMU would lead to a somehow invalid odom tf which would lead to the problem. However I have since determined that the IMU is fine as it is.

Could you please record and post a rosbag file including all the topics, i.e. /clock, /tf*, and /scan? 10s should be enough.

I did, did you see it?

https://github.com/ros-visualization/rviz/files/11530872/2023-05-22-11-30-36.bag.txt

Thanks!

Thanks for providing the bag file. I can reproduce your issue and will look into it right now.

After searching for quite a while into the wrong direction, I found the culprit: The problem is in your LaserScan messages: The time_increment and scan_time values are probably way too large (measured in seconds):

time_increment:  153333.328125
scan_time:  110400000.000000

rviz expects TF lookups to be possible within the whole time range from stamp to stamp + scan_time, which is 110400000 secs = 3,5 years in your case! This makes all your TF lookups fail in the message filter. However, error lookup with transformHasProblems() succeeds because it just considers the message time stamp, resulting in the unknown error warning.

Note, scan_time = time_increment * len(ranges).

Please consider donating as I spent a bunch of time hunting a rviz bug that turned out to be a message issue.

@rhaschke Great catch, thank you thank you! Your diagnosis makes perfect sense.

I've been tracing through the ydlidar ros code that I am using and I can see where it goes wrong. But I dont understand it well enough to fix it. I have posted on their issues list. If you know anything about ydlidar X4 on ROS Noetic I would be interested.

In the meanwhile I will certainly donate. I can either donate 5/month for a year or donate $75 right now. Do you prefer one or the other? And is that an appropriate amount?

I don't know the ydlidar X4, but great that my analysis helps you. I guess the time units are chosen wrongly (e.g. nanoseconds instead of seconds). Feel free to link the source repository here.
You can freely choose the amount and type of donation. Thanks for considering!

The ydldar repo is: https://github.com/EAIBOT/ydlidar.git
I traced through it and there are some details I havent mastered yet but I can see how it goes wrong. Thanks if you take a look!

Thanks for supporting me by a donation. See here for a fixup of ydlidar.