dawonn/vectornav

Two different methods implemented to fix screwy time stamps

Closed this issue · 5 comments

I think there are two methods implemented now to solve the same issue:

  1. #82 fixes the latency from the serial driver through some low-level serial driver calls, which ensures that messages reach the callback without delay, so the header time is closer to the actual measurement time (a generic example of such a call is sketched after this list)
  2. #91 uses an exponential moving average of the difference between the ROS system time and the sensor-provided time to adjust the ROS header time
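
As a generic illustration of what such a low-level serial call can look like on Linux, here is the common ASYNC_LOW_LATENCY tweak. This is shown only as an example of the technique, not necessarily the exact code in #82:

#include <sys/ioctl.h>
#include <linux/serial.h>

// Generic example only (not taken from #82): ask the USB-serial driver to hand
// data to userspace immediately instead of batching it, which otherwise can add
// several milliseconds of latency with common FTDI-style adapters.
static void set_low_latency(int fd) {
    struct serial_struct ser;
    if (ioctl(fd, TIOCGSERIAL, &ser) == 0) {
        ser.flags |= ASYNC_LOW_LATENCY;
        ioctl(fd, TIOCSSERIAL, &ser);
    }
}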

@berndpfrommer: 1) was only implemented in the ROS1 driver, which is probably why you only saw the time stamp issue in the ROS2 driver. I just added it before the holidays, so it is a relatively new change.

I think 1) should be implemented in the ROS2 driver as well. @berndpfrommer if you would like to do that, it should probably be straightforward to derive from #82. I don't use ROS2 and thus have no way of testing it.

I am not sure if 2) is actually necessary or just adds complexity. I personally would be very interested to see whether it improves the actual time stamps (there is certainly some wiggle room in the ~1 ms range). I am not sure how to test this, but if the system has GPS we could try to compare the message GPS time with the host clock. If it does improve the actual time stamps, we might just want to have it on by default?
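
A rough sketch of that comparison, in case it is useful. This assumes the TimeGps field is enabled in the binary output and that vn::sensors::CompositeData exposes hasTimeGps()/timeGps() in nanoseconds since the GPS epoch, analogous to timeStartup(); I have not verified either:

// Sketch only: log the offset between the sensor's GPS time and the host clock
// inside the packet callback. hasTimeGps()/timeGps() are assumed to exist,
// analogous to hasTimeStartup()/timeStartup().
static void compare_gps_to_host(vn::sensors::CompositeData &cd, const ros::Time &ros_time) {
    if (!cd.hasTimeGps()) {
        return;
    }
    // GPS epoch (1980-01-06) is 315964800 s after the Unix epoch, and GPS time
    // currently runs 18 leap seconds ahead of UTC.
    const double gps_as_unix = cd.timeGps() * 1e-9 + 315964800.0 - 18.0;
    ROS_INFO_THROTTLE(1.0, "host - gps time: %.6f s", ros_time.toSec() - gps_as_unix);
}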

Either way, I think it would be good to document this issue and the two solutions, maybe in the README. The adjust_ros_timestamp parameter introduced in 2) could also use a bit of documentation. Happy to do that if you guys think there is value here.

Great, this clarifies why I didn't see the ROS header timestamp issues in ROS1.
OK, I will try to make some time to port your serial port fix to ROS2.

Fixing the serial issues is obviously the first step, since it improves both time stamps and real-time performance.
That said, there is a case to be made for having both fixes in place.

If you have a careful look at the graph with the time stamp deltas for ROS1, you can see that there are fewer and smaller outliers with the adjusted time stamps. This was on a beefy laptop with not much else running. With increasing CPU load on the host, the quality of the unadjusted time stamps will deteriorate, but the adjusted ones should remain constant (up to a certain load). Note that VIO is very sensitive to errors in the time deltas, since s = 1/2 a t^2, so getting delta_t right is crucial for correct integration of the acceleration. The adjusted ROS timestamps get their delta_t almost completely from the VectorNav's hardware time stamps.
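
To put a rough number on that sensitivity (illustrative values of my own, not taken from the plots): perturbing one integration step of length delta_t by an error dt_err changes the position increment 1/2 a delta_t^2 by roughly a * delta_t * dt_err, and the velocity increment by a * dt_err, which then keeps feeding into position.

#include <cstdio>

int main() {
    const double a = 9.81;        // m/s^2, gravity-sized acceleration
    const double delta_t = 0.01;  // s, nominal integration step (100 Hz, illustrative)
    const double dt_err = 0.001;  // s, hypothetical 1 ms timing error
    std::printf("position increment error ~ %.1e m\n", a * delta_t * dt_err); // ~1e-4 m per step
    std::printf("velocity increment error ~ %.1e m/s\n", a * dt_err);         // ~1e-2 m/s, accumulates
    return 0;
}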

I defaulted adjust_ros_timestamp to false because I did not want to alter the default behavior of the driver, potentially upsetting a fairly large user base. I feel more comfortable about it for ROS2, because the user base is smaller and, without the serial port fix, the time stamps are frankly quite bad.

Agree on the documentation. Would that go into the README file or are you thinking of separate (markdown) doc files? Right now the documentation is pretty sparse. For ROS2 there is virtually no description of the many parameters. Their meaning can be deduced from the VectorNav docs but it took me quite a while to get them right.

@berndpfrommer Glad this clarifies a few things. I agree both fixes have their merits, and I actually didn't appreciate until you mentioned it that adding the serial fix should improve the timestamp algorithm quite a bit, because the deltas are much smaller to start with.

I was a bit confused about the two bands in your plots; shouldn't there just be one? I agree the scatter is smaller with the adjustment. I was working on this as well to make the driver more suitable for SLAM, and I appreciate any possible gain in timing precision.

Keeping the default for adjust_ros_timestamp as false makes sense as well.

I would just add a short section about the adjust_ros_timestamp parameter to the README. While some of the other parameters could probably also use some documentation, I would not want to put that on you 😃 .

I actually just tried to understand the adjustment algorithm myself, and I think you introduced a few bugs when moving it to ROS1:

static ros::Time get_time_stamp(
    vn::sensors::CompositeData &cd, UserData *user_data, const ros::Time &ros_time) {
    if (!cd.hasTimeStartup() || !user_data->adjust_ros_timestamp) {
        return (ros_time); // don't adjust timestamp
    }
    const uint64_t sensor_time = cd.timeStartup() * 1e-9; // time in seconds   <--- THIS LINE CROPS sensor_time TO THE CLOSEST SECOND, SHOULD PROBABLY BE A double
    if (user_data->average_time_difference == 0) { // first call
        user_data->ros_start_time = ros_time;
        user_data->average_time_difference = static_cast<double>(sensor_time);
    }
    // difference between node startup and current ROS time
    const double ros_dt = ros_time.toSec() - user_data->ros_start_time.toSec();  <-- NOT WRONG BUT BETTER TO USE (ros_time - user_data->ros_start_time).toSec()
    // difference between elapsed ROS time and time since sensor startup
    const double dt = ros_dt - sensor_time;
    // compute exponential moving average
    const double alpha = 0.001; // average over roughly 1000 samples
    user_data->average_time_difference =
        user_data->average_time_difference * (1.0 - alpha) + alpha * dt;

    // adjust sensor time by average difference to ROS time
    const ros::Time adj_time = user_data->ros_start_time + ros::Duration(dt + sensor_time);  <-- dt SHOULD PROBABLY BE user_data->average_time_difference INSTEAD
    return (adj_time);
}
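
For reference, this is roughly what the function would look like with those two changes applied (sensor_time kept as a double, and the averaged difference used for the adjusted stamp). Seeding average_time_difference with -sensor_time on the first call is my own guess, so that the average starts near its steady-state value; I have not checked that against your intended fix:

static ros::Time get_time_stamp(
    vn::sensors::CompositeData &cd, UserData *user_data, const ros::Time &ros_time) {
    if (!cd.hasTimeStartup() || !user_data->adjust_ros_timestamp) {
        return (ros_time); // don't adjust timestamp
    }
    const double sensor_time = cd.timeStartup() * 1e-9; // time in seconds, kept as double
    if (user_data->average_time_difference == 0) { // first call
        user_data->ros_start_time = ros_time;
        user_data->average_time_difference = -sensor_time; // seed near the steady-state value (my guess)
    }
    // difference between node startup and current ROS time
    const double ros_dt = (ros_time - user_data->ros_start_time).toSec();
    // difference between elapsed ROS time and time since sensor startup
    const double dt = ros_dt - sensor_time;
    // compute exponential moving average
    const double alpha = 0.001; // average over roughly 1000 samples
    user_data->average_time_difference =
        user_data->average_time_difference * (1.0 - alpha) + alpha * dt;

    // adjust sensor time by the averaged difference to ROS time
    const ros::Time adj_time =
        user_data->ros_start_time + ros::Duration(user_data->average_time_difference + sensor_time);
    return (adj_time);
}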

I actually wonder if you would get better precision if you used

ros_start_time = (time_at_first_callback + time_before_starting_callback_loop) * 0.5

This is obviously a poor approximation, but otherwise there is always an unknown offset between sensor_time and ros_time, or am I missing something here?
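
Something like this is what I have in mind, only to illustrate the idea (the names, and the extra time_before_starting_callback_loop field captured with ros::Time::now() right before the packet handler is registered, are made up):

if (user_data->average_time_difference == 0) { // first call
    // guess that the first measurement was taken roughly halfway between
    // starting the callback loop and receiving the first callback
    const double midpoint =
        0.5 * (ros_time.toSec() + user_data->time_before_starting_callback_loop.toSec());
    user_data->ros_start_time = ros::Time(midpoint);
    user_data->average_time_difference = -sensor_time;
}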

Also, do you understand at what point timeStartup starts counting?

Sorry for the many questions, but I am quite interested in understanding the time synchronization algorithm better; this is not the first time I have encountered an issue like this.

What a blatant bug! And I didn't catch it in testing because a second bug:

const ros::Time adj_time = user_data->ros_start_time + ros::Duration(dt + sensor_time);  <-- dt SHOULD PROBABLY BE user_data->average_time_difference INSTEAD

masked the first bug (somewhat), such that the data looked sensible. In fact, the end result is just the original ROS time stamp as it was before. There goes my statement about the experiment showing that the adjustment produces better timestamps. The "improvement" was all noise.
I am fixing it now and collecting more data, hopefully correct this time.

@cosama sorry, I forgot to answer your questions. Maybe some of them have resolved themselves now that the bug is fixed.
There are several ways one can adjust the time stamps. Here are the two simplest:

  1. The simplest is at startup: look at the sensor time, look at the host clock, compute the difference, and from then on assume that offset stays the same for all time. No complicated averaging needed. The problem is that for many sensors the sensor time on the first callback is actually quite off, and the difference (sensor time - host time) can be vastly different from that of subsequent callbacks. So the time stamp adjustment can be off by a fixed offset, let's say 500 ms, and the ROS time stamps will then be off by 500 ms forever. Not a big deal in many cases, since algorithms often just use delta_t, but it is confusing, to say the least, when ROS sensor time and ROS recording time are shifted. Another problem is clock drift: if the sensor clock runs faster or slower than the host clock, the ROS header stamp will drift away from ROS wall clock time without bound. (A rough sketch of this variant follows after the list.)
  2. Both of these problems are greatly mitigated by maintaining a moving average of (ROS wall time - sensor time) and then using sensor time + avg to arrive at the header stamp time. The downside is that if the average gets messed up (e.g. due to high CPU load), then delta_t will temporarily suffer. In my experience, though, this is much less of an issue than the problems seen with approach 1). And yes, another downside is slightly higher CPU load due to maintaining the time difference. One could cut that down a bit by only sampling every Nth time stamp to update the average...
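
For concreteness, here is a rough sketch of approach 1 (not code from the driver, just illustrating where the fixed offset comes from and why it inherits the first-sample error and any clock drift); approach 2 is essentially the get_time_stamp() function shown earlier:

static ros::Time fixed_offset_stamp(vn::sensors::CompositeData &cd, const ros::Time &ros_time) {
    static bool have_offset = false;
    static double offset = 0.0; // host time - sensor time, in seconds
    if (!cd.hasTimeStartup()) {
        return (ros_time);
    }
    const double sensor_time = cd.timeStartup() * 1e-9; // seconds since sensor startup
    if (!have_offset) {
        // frozen after the first packet: any latency in that first sample
        // becomes a permanent shift of all ROS header stamps
        offset = ros_time.toSec() - sensor_time;
        have_offset = true;
    }
    // clock drift between sensor and host also goes straight into the stamp
    return ros::Time(sensor_time + offset);
}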

Some more documentation wouldn't hurt, I feel, but this has been addressed now, and there is always this issue if people need to dig deeper. I am closing this. @berndpfrommer Thanks for working on this.