eclipse-cyclonedds/cyclonedds-cxx

C++ DataReader throws exceptions when receiving samples with negative source_timestamp

matthew-ivester opened this issue · 2 comments

We've been investigating an issue where cyclonedds-cxx code would suddenly throw this exception:

============================Inflight Exception Info=============================
terminate called after throwing an instance of 'dds::core::Error'
  what():  dds::core::Time::sec out of bounds
===============================================================================
Context     : void dds::core::Time::sec
Node        : UnknownNode

This is being thrown from https://github.com/eclipse-cyclonedds/cyclonedds-cxx/blob/master/src/ddscxx/src/dds/core/Time.cpp#L54 :

void dds::core::Time::sec(int64_t s)
{
    if(s < 0 && s != -1) {
        ISOCPP_THROW_EXCEPTION(ISOCPP_ERROR, "dds::core::Time::sec out of bounds");
    } else {
        /// @internal @bug OSPL-2308 RTF Time-ish coercion issue
        /// @see http://jira.prismtech.com:8080/browse/OSPL-2308
        sec_ =  s;
    }
}

where s is ending up with a negative value during the construction of a dds::core::Time object related to an incoming sample.

We traced the issue back to another device on our network that was sending samples from a non-Cyclone DDS implementation and having issues with its NTP time synchronization. (Cyclone DDS itself has checks that prevent sending such samples.)
Apparently our device is sometimes getting an invalid time correction that temporarily jumps its clock way into the future, beyond the https://en.wikipedia.org/wiki/Year_2038_problem date threshold. This causes the DDS timestamp field in the samples it is sending to wrap around, and Cyclone then interprets that as a large negative source timestamp.

The RTPS specification for the timestamp looks like:

DDS SampleInfo Timestamp Layout from spec: 

    struct Time_t {
        long seconds; // time in seconds
        unsigned long fraction; // time in sec/2^32
    };

    time = seconds + (fraction / 2^(32))

    Timestamp:
    0...2...........8...............16.............24...............31
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | long seconds |
    +---------------+---------------+---------------+---------------+
    | unsigned long fraction |
    +---------------+---------------+---------------+---------------+

The output and analysis of some tests using timestamp values sourced from wireshark:

def test_5():
    pkt_num =31259 # Aurix to HPC 
    epoch_arrival = 1723241576.690234000 
    dds_source_ts_hex = "6894b666a5a9a5b0"
    private_data_hex_time = "240dcb78152fea17"
    print_packet_info(pkt_num, epoch_arrival, dds_source_ts_hex, private_data_hex_time)


def test_6():
    pkt_num =31267 # Aurix to HPC 
    epoch_arrival = 1723241576.700394000
    dds_source_ts_hex = "b84541b28eb01141"
    private_data_hex_time = "ad9bd9e8fcd28029"
    print_packet_info(pkt_num, epoch_arrival, dds_source_ts_hex, private_data_hex_time)

output:

pkt: 31259 arrived: 2024-08-09 22:12:56.690234+00:00

App/IDL TimePoint::privateData timestamp: 
datetime utc: 2024-08-09 22:12:56.690028+00:00
frac_sec_since_epoch: 1723241576.6900277
sec since epoch: 1723241576  0x66b69468

calculate time according to RTPS spec: 
sec part: 1723241576 from 6894b666
fraction part: 2963646885 from a5a9a5b0
time_assume_spec_frac: 1723241576.6900277
datetime utc: 2024-08-09 22:12:56.690028+00:00
sec since epoch: 1723241576  0x66b69468
ns_since_epoch: 1723241576690027776  0x17ea2f1578cb0d00
time parts:  sec_part: 1723241576  0x66b69468  :  frac_part: 6900277  0x694a35

pkt: 31267 arrived: 2024-08-09 22:12:56.700394+00:00

App/IDL TimePoint::privateData timestamp: 
datetime utc: 2064-10-07 16:22:16.254176+00:00
frac_sec_since_epoch: 2990622136.254176
sec since epoch: 2990622136  0xb24145b8

calculate time according to RTPS spec: 
sec part: -1304345160 from b84541b2
fraction part: 1091678350 from 8eb01141
time_assume_spec_frac: -1304345159.7458239
datetime utc: 1928-09-01 09:54:00.254176+00:00
sec since epoch: -1304345159  -0x4dbeba47
ns_since_epoch: -1304345159745823744  -0x1219f70317266400
time parts:  sec_part: -1304345160  -0x4dbeba48  :  frac_part: 7458239  0x71cdbf

from our engineer's analysis:

The important part here is that for packet 31267, we get sec part: -1304345160 from b84541b2
Which is not legal and, upon being received at cyclone side, cause the exception we see as dds::core::Time::sec out of bounds

IF you were to treat b84541b2 as an unsigned long, you would get this result:

calculate time according to RTPS spec: 
treating sec part as unsigned long - this is not to RTPS spec! 
sec part: 2990622136 from b84541b2
fraction part: 1091678350 from 8eb01141
time_assume_spec_frac: 2990622136.254176
datetime utc: 2064-10-07 16:22:16.254176+00:00
sec since epoch: 2990622136  0xb24145b8
ns_since_epoch: 2990622136254176256  0x2980d2fce8d99c00
time parts:  sec_part: 2990622136  0xb24145b8  :  frac_part: 254176  0x3e0e0

Which is what wire-shark shows and has a final result matching that of the timestamp contained in the IDL TimePoint::privateData This however is not a correct decode, as the sec should be treated as signed according to spec, and cyclone confirms that sec part's range independently, hence the sec out of bounds exception

While the generation of the bad samples is happening elsewhere, Cyclone should really be able to tolerate corrupted/invalid timestamps on incoming messages without throwing exceptions. Throwing an exception seems fine on attempting to send samples with an invalid/wrapped time value, but blindly trusting anything coming off the network seems problematic.

(edit: fixed code block formatting)

It looks like it is entirely trivial to fix this because the core already handles receiving data with negative timestamps. Can you give #503 a try?