Incorrect handling of last timeout (causes retransmit; should not)
Closed this issue · 7 comments
I've noticed that I get timeouts sometimes:
Response with unknown request ID from x.x.x.x. Consider increasing timeouts (30112 ms old?).
Looking at the traffic using tcpdump I can see that the original packet did timeout, and there is a re-transmit carried out after 30s. The answer, however, is discarded with the above error message.
Looking at the code it seems that the request id containing the timestamp of the original packet is not updated for the re-transmit, which means that the packet is in effect timed out before it is even re-transmitted.
Not changing the request ID is correct - it's the same requests we're sending again, and a delayed response to the first packet should satisfy the request in the retry as well. I'm not sure how you end up with a packet being both retransmitted and removed due to timeout at the same time, what are your timeout intervals set to?
It seems the problem was that I had set the timeout to [ 30000 ], meaning no retries, however, what I see is:
0s: First get-packet is sent
--No response received--
30s: A second get packet is sent, although there should be no retries
30.1s: A response arrives this time, but the response is disregarded with an error
When I changed the timeout to an array things worked better, however I still wonder whether the last timeout will still send an extra packet.
It's buggy for the last timeout interval in the list. The packet is transmitted, then the next item in list of timeouts is consulted to schedule a retransmit in the future - if the list is now empty, the request is dropped. Instead, the last interval should be used for dropping the request and not for doing a retransmission.
So [30000]
should mean no retries and timeout after 30 seconds, but it doesn't work that way currently. Well, you get the timeout after 30 seconds, but it sends a last unnecessary packet too.
UPDATE: 11.11.2015:
With wireshark I have actually seen some devices sending the same response more than once. I don't think am affected by this bug at all.
I see this too, but only (it seems) when doing massively getAll requests over a larger (50~100) set of devices in parallel.
My timeouts are [1000, 2000, 4000], but I just got errors like:
Response with unknown request ID from x.y.z.149. Consider increasing timeouts (12 ms old?).
Response with unknown request ID from x.y.z.149. Consider increasing timeouts (9 ms old?).
Response with unknown request ID from x.y.z.8. Consider increasing timeouts (5 ms old?).
Response with unknown request ID from x.y.z.25. Consider increasing timeouts (5 ms old?).
Response with unknown request ID from x.y.z.13. Consider increasing timeouts (172 ms old?).
Response with unknown request ID from x.y.z.6. Consider increasing timeouts (6 ms old?).
Hi, any update about the possible fix?
Best regards!
I don't actually use this currently, but bug fix PRs are always welcome. :)
We can close issue? =)
Fixed at 1.0.19