LabVIEW-Open-Source/MQTT-Client

Potential Bug with MQTT Client.lvlib:Client.lvclass:Ping.vi

agluck28 opened this issue · 9 comments

When implementing ping against other MQTT brokers (tested on Mosquitto, VerneMQ and HiveMQ), once the LabVIEW client sent the Ping, the broker would disconnect the client because it received an invalid message.

Looking into the code, I think there is an issue with Ping.vi expecting a response. The response implementation uses the message ID of the packet to tie request and response together, but the PINGREQ type has no message ID and always defaults to a value of 0. This appears to cause the Client, when it receives a PINGRESP packet (or really any packet), to respond back to the broker with a payload of 0x0000, which is an unknown packet type and causes the disconnect.
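To illustrate the suspected mechanism, here is a minimal sketch in Python (the names and structure are purely illustrative, not the library's actual VIs), assuming the client keeps a map of pending acknowledgements keyed by packet ID:

```python
# Illustrative model only -- not the LabVIEW library's actual code.
pending_acks = {}

def send_pingreq(wait_for_ack=True):
    if wait_for_ack:
        # PINGREQ carries no packet ID, so it registers under the default value 0.
        pending_acks[0] = "PINGREQ"

def on_packet_received(packet_type, packet_id=0):
    # PINGRESP (or a QoS 0 PUBLISH) also carries no packet ID, so it arrives
    # here with the default ID of 0 and "matches" the pending PINGREQ.
    if packet_id in pending_acks:
        pending_acks.pop(packet_id)
        # Suspected bug: the reply built from default values ends up as the two
        # bytes 0x00 0x00, which is not a valid MQTT control packet, so the
        # broker closes the connection.
        return bytes([0x00, 0x00])
    return None
```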

Passing False to the Wait for ack input allows the Pings to be sent and the broker to respond as expected.

[image]

Thanks @agluck28, I'll investigate and write a unit test for this. It seems we need to ensure we wait for the response, although the exact criteria might need some tweaking.

By default, we need to receive a response, as per specification:
http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/csprd02/mqtt-v3.1.1-csprd02.html#_Toc385349817
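For reference, PINGREQ and PINGRESP are two-byte packets with no variable header and no packet identifier (MQTT 3.1.1, sections 3.12 and 3.13). A minimal raw-socket sketch of the exchange, assuming a connection that has already completed CONNECT/CONNACK (the helper below is illustrative, not part of the library):

```python
import socket

PINGREQ  = bytes([0xC0, 0x00])   # control packet type 12, remaining length 0
PINGRESP = bytes([0xD0, 0x00])   # control packet type 13, remaining length 0

def ping(sock: socket.socket) -> bool:
    """Send a PINGREQ on an already-connected MQTT socket and wait for PINGRESP."""
    sock.sendall(PINGREQ)
    reply = sock.recv(2)
    # On a busy session, other packets (e.g. QoS 0 PUBLISH) may arrive first,
    # which is exactly the situation discussed in this issue.
    return reply == PINGRESP
```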

I think this also happens if the client is receiving messages with a QoS level of 0 at the same time pings are being sent. Since those messages have no message ID, it defaults to 0, and the client assumes the message is the response to the PINGREQ, since that also registered an expected response under message ID 0.

I think something new will be needed to look for the PINGREQ's response, since the message ID can't be used to tie things together and messages are most likely being received at the same time.

I think an override will be needed to handle the PINGRESP from the broker here. The PINGRESP from the server could be exposed by broadcasting it on a new user event within MQTT Client.lvlib:Client.lvclass:Handle Incoming Packets.vi, like below (the code isn't that clean, it's more of an example).

[image]

This event can be registered for when sending PINGREQ within MQTT Client.lvlib:Session.Client.lvclass:Send Packet.vi to wait for the expected response from the server.

[image]

This also allows users of the library to register for the PINGRESP event to take actions if needed.
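In textual form, the proposal amounts to something like the following sketch (Python used only to describe the idea; the class and method names are hypothetical): the incoming-packet handler signals an event when a PINGRESP arrives, and the ping sender waits on that event instead of relying on a packet ID.

```python
import threading

class PingRespNotifier:
    """Hypothetical analogue of the proposed PINGRESP user event."""

    def __init__(self):
        self._event = threading.Event()

    def on_pingresp(self):
        # Called from the incoming-packet handler when a PINGRESP is decoded.
        self._event.set()

    def send_ping_and_wait(self, send_pingreq, timeout_s: float) -> bool:
        self._event.clear()
        send_pingreq()                       # emit the PINGREQ packet
        return self._event.wait(timeout_s)   # True if a PINGRESP was seen in time
```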

I am worried about the implications of this proposed modification.
TCP ensures the delivery of messages in an ordered and lossless fashion, so I'm not certain I'm on board with the statement that we can receive messages without packet IDs concurrently. I suppose multiple pings from the same session, sent from parallel loops, could do it.

I have not tried yet, but could it be that not reserving the ID when it is 0x0000 would fix this?

By the way, PINGRESP is a type of "ACK" packet... so not handling it as such might cause more problems than it solves.

The MQTT broker can be receiving packets from other publishers on topics the client sending the ping is subscribed to. If these have a QoS of 0, there is no message ID. I don't think this is about TCP delivering packets out of order; it's that packets other than the PINGRESP are being sent from the broker to the client.

I believe the only reason a 0x0000 packet is being generated is a bug in how the response to a PINGREQ is awaited. The Session is receiving the standard QoS 0 messages that another client is publishing to the broker. The Session receives such a message, sees that it has a message ID of 0, and because the PINGREQ registered message ID 0 for its response, it responds back to the broker, I believe due to default values, with a control packet of 0x0000, which is invalid.

From my understanding, the broker is only expected to respond with a PINGRESP when it receives a PINGREQ; no other response is expected. Looking at the Python MQTT client implementation, it appears the client loop handles PINGREQ and PINGRESP based on the keep-alive time set at client creation. This could potentially be an approach to take here: handle it at a lower level and just bubble up events when the PINGRESP isn't received in a timely manner after the PINGREQ.
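A rough sketch of that keep-alive-driven approach (an assumption about the general pattern, not paho's actual implementation):

```python
import time

class KeepAliveState:
    """Track the ping cycle from the client loop, based on the keep-alive interval."""

    def __init__(self, keepalive_s: int):
        self.keepalive_s = keepalive_s
        self.last_ping = time.monotonic()
        self.ping_outstanding = False

    def loop_step(self, send_pingreq) -> bool:
        """Return False when no PINGRESP arrived within the grace period."""
        now = time.monotonic()
        if not self.ping_outstanding and now - self.last_ping >= self.keepalive_s:
            send_pingreq()
            self.ping_outstanding = True
            self.last_ping = now
        elif self.ping_outstanding and now - self.last_ping >= 1.5 * self.keepalive_s:
            return False   # bubble up an event / treat the connection as dead
        return True

    def on_pingresp(self):
        self.ping_outstanding = False
```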

Hi @agluck28 ,

I'm trying to design a valid test case to reproduce the problem. Do you think the following sequence describes the issue you are seeing?

  • Create two clients and connect both to the mosquitto test server.
  • Client A publishes data regularly on topic A/state.
  • Client B subscribes to data from topic A/state.
  • Client B occasionally sends a PingReq packet.
  • While waiting for the PingResp packet, client B receives a message on topic A/state before the PingResp ack arrives, which causes the client to disconnect.

Since a typical PingResp arrives with >100 ms latency against the mosquitto test server, it feels like this should be fairly easy to reproduce without relying on pure luck, correct?

EDIT: Yes, that works... I reproduced the problem on the first Ping packet sent.
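For anyone who wants to reproduce the traffic pattern, client A can be stood up with a few lines of paho-mqtt (assuming paho-mqtt 1.x is installed and the public mosquitto test server is reachable), while the LabVIEW client under test plays the role of client B:

```python
import time
import paho.mqtt.client as mqtt

# Client A: publishes regularly on A/state so that messages land between the
# LabVIEW client's PINGREQ and the broker's PINGRESP.
client_a = mqtt.Client(client_id="client-a-publisher")   # paho-mqtt 1.x constructor
client_a.connect("test.mosquitto.org", 1883, keepalive=60)
client_a.loop_start()

try:
    while True:
        client_a.publish("A/state", payload="42", qos=0)
        time.sleep(0.2)
finally:
    client_a.loop_stop()
    client_a.disconnect()
```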

Hi @francois-normandin,

That is very close to what is happening, though I think the disconnect is happening due to the client responding back with an invalid packet when it receives data after a PINGREQ.

This was very reproducible when I was testing; I had an immediate disconnect after every ping sent. I do like working on issues that always reproduce versus ones that are subject to race conditions.

Just for terminology:
A client never responds to a PINGREQ; it is an invalid packet to process from the Client's perspective. The server is the one that sends the PINGRESP packet. Symmetrically, the server never sends PINGREQ packets.

I think the problem lies in the Session, which stores the packet ID and "consumes" the incoming acknowledgements. Of all the packets, only CONNECT, CONNACK, DISCONNECT, PINGREQ and PINGRESP do not have packet IDs... along with PUBLISH at QoS = 0. Because PINGREQ and PINGRESP are not the only packets without IDs while a session is ongoing, you have the correct instinct that they will need to be treated differently to decouple them from the PUBLISH packets that do not have IDs. My earlier tests had all been sending Pings while no other communication was active, so I did not see the problem of incoming packets arriving while waiting for the ping responses.
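Summarized from the 3.1.1 specification, the packet-ID rules look like this (sketch for reference):

```python
# Which MQTT 3.1.1 control packets carry a packet identifier.
HAS_PACKET_ID = {
    "CONNECT": False, "CONNACK": False,
    "PUBLISH": None,   # only at QoS 1 or 2; a QoS 0 PUBLISH has no packet ID
    "PUBACK": True, "PUBREC": True, "PUBREL": True, "PUBCOMP": True,
    "SUBSCRIBE": True, "SUBACK": True,
    "UNSUBSCRIBE": True, "UNSUBACK": True,
    "PINGREQ": False, "PINGRESP": False,
    "DISCONNECT": False,
}
```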

I'll look into your suggestion to use the keep-alive time as the virtual packet ID. If I keep PUBLISH = 0 and swap in a virtual ID for PINGREQ/PINGRESP (even an arbitrary one such as 0xFFFF), it might solve the problem. If that's the case, then I assume I can circumscribe the solution space to the Session.Client class.

Oh, I think I have an idea to solve this elegantly... instead of using the packetID (U16) as the key for storing and mapping acknowledgements in the Session class, I'll use a U32 that concatenates the packetID with the immutable packetHeader of the expected response. I'll add a filter to ignore the packetID of PUBLISH packets at QoS = 0 so we do not grow the ID map, and that should correctly map any packet that expects an ACK.
This has the advantage of only adding inline code that requires no new event or storage mechanism. I just have to be careful that the logic works bidirectionally, since the Session is used by both the Client and the Server.
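One possible encoding of that composite key, sketched in Python (the constants and helper are hypothetical, only meant to show the idea):

```python
PINGRESP_TYPE = 0x0D   # control packet type 13
PUBACK_TYPE = 0x04     # control packet type 4

def ack_key(expected_response_type: int, packet_id: int = 0) -> int:
    """Build a U32 key: high 16 bits = expected response type, low 16 bits = packet ID."""
    return ((expected_response_type & 0xFFFF) << 16) | (packet_id & 0xFFFF)

# A PINGREQ registers under (PINGRESP, 0); a QoS 1 PUBLISH with ID 7 registers
# under (PUBACK, 7); QoS 0 PUBLISH packets are filtered out and never stored,
# so the two can no longer collide on the default ID of 0.
pending = {
    ack_key(PINGRESP_TYPE): "PINGREQ",
    ack_key(PUBACK_TYPE, 7): "PUBLISH qos=1 id=7",
}
```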

Has this issue been addressed? I am getting client disconnects, even though I am sending periodic pings to the server. My publisher is sending QoS 0 packets only. I have a test VI that disconnects randomly at different times. Sometimes it will run all night without a disconnect, other times it will disconnect in six minutes.