eclipse/paho.mqtt.c

Cannot shutdown paho gracefully if connection lost to the broker and there are still messages to publish

smallSwed opened this issue · 0 comments

Describe the bug
I have a publisher publishing messages via MQTTAsync_sendMessage. If I initiate a shutdown but the connection was already lost to the broker, the MqttAsync_disconnect returns Success but never processed: no callback called (neither failure nor success)
My disconnect wait will timeout. So I call the MqttAsync_destroy which will leak.

To Reproduce
Not always reproducible, needs certain conditions met: it needs some PUBLISH commands in the MQTTAsync_commands when the connection is lost:

  1. start broker (e.g. mosquitto)
  2. start client to publish messages via the Async API
  3. kill the broker
  4. try to shut down the client via MqttAsync_disconnect -> wait for callbacks -> MqttAsync_destroy sequence
  5. see logs for leak report

Expected behavior
A proper cleanup for unsent but queued publish messages
maybe: MqttAsync_disconnect would disconnect no matter if there are queued publish commands
2. MqttAsync_destroy would clean up the unset messages

Log files
memory_leak.log

Points of interest in the log file:

  • first ~200 line - connection to broker
  • until 10795 - successful publishing
  • line 10386 - disconnect from the broker
  • line 11221 - memory leak reported

I had to clean the log files from some of the logs of my application, but all of the PAHO logs are included in the file.

** Environment (please complete the following information):**

  • OS: Windows 11
  • Version: paho.mqtt.c 1.3.12

Additional context
One condition for this bug is the lost connection to the broker and no reconnect in a reasonable time.

The leaked data is the payload and the destination name string passed to MQTTAsync_sendMessage for all unsent messages.

I did some investigation and it's comes down to this:

  1. the reason for the disconnect command was not processed is the following:
    our MqttAsync_disconnect call will put a DISCONNECT command at the end of  MQTTAsync_commands
    MQTTAsync_processCommand will find a PUBLISH command for our client which is disconnected so it will be not processed and always stays at the start of the list. Our client will be put inside the ignore_clients so the DISCONNECT will be ignored at the end of the list.
  2. Reason for the leak:
    In MQTTAsync_destroy there is an assumption that all the payloads and destinationNames are stored in MQTTProtocol_storePublication so the MQTTAsync_NULLPublishCommands called , but for some reason those are not stored there or not freed from there.

The leaked mallocs are from MQTTAsync_send:

if ((pub->command.details.pub.destinationName = MQTTStrdup(destinationName)) == NULL)
...
if ((pub->command.details.pub.payload = malloc(payloadlen)) == NULL)

The call stack:

T.dll!mymalloc() - Line 201
T.dll!MQTTAsync_send() - Line 1277
T.dll!MQTTAsync_sendMessage() - Line 1317
T.dll!OTMqttCommunication::send() - Line 1301