eclipse/paho.mqtt.c

Bad QoS when persistence enabled and not removed

lugrinder opened this issue · 14 comments

Describe the bug
When the persistence is enabled with MQTTCLIENT_PERSISTENCE_DEFAULT or MQTTCLIENT_PERSISTENCE_USER, QoS is incorrectly set. The message was sent and arrives to any subscribed client, and was interpreted as QoS:1, but the publication is not removed from persistence due wrong QoS.

Expected behavior
Send correct qos.

Log files
Both logs are the same source code, with QoS:1 and with trace set to minimal.
Logs are too small, I think there is no need to attach them.

With MQTTCLIENT_PERSISTENCE_NONE:

Trace: 4, 19700101 010000.000 16 test-deadbeaf -> PUBLISH msgid: 2 qos: 1 retained: 0 rc 0 payload len(83): {"time":1687264049,"...
Trace: 3, 19700101 010000.000 m->c->connect_state = 0
Trace: 4, 19700101 010000.000 16 test-deadbeaf <- PUBACK msgid: 2
Trace: 3, 19700101 010000.000 PUBACK received from client test-deadbeaf for message id 2 - removing publication
Trace: 3, 20230620 122731.188 Calling deliveryComplete for client test-deadbeaf, msgid 2
Trace: 3, 20230620 122731.188 Calling publish success for client test-deadbeaf

With MQTTCLIENT_PERSISTENCE_DEFAULT or MQTTCLIENT_PERSISTENCE_USER (custom-made sqlite3 persistence):

Trace: 4, 20230620 120916.858 16 test-deadbeaf -> PUBLISH msgid: 2 qos: 2099405057 retained: 0 rc 0 payload len(83): {"time":1687262955,"...
Trace: 3, 20230620 120916.858 m->c->connect_state = 0
Trace: 4, 20230620 120916.858 16 test-deadbeaf <- PUBACK msgid: 2
Trace: 3, 20230620 120916.858 Packet PUBACK received from client test-deadbeaf for message identifier 2, but message is wrong QoS, 2099405057
Trace: 3, 20230620 120916.858 Calling deliveryComplete for client test-deadbeaf, msgid 2
Trace: 3, 20230620 120917.156 Calling publish success for client test-deadbeaf

Environment:

  • OS: Linux
  • Version: 2.6.25.17 (gcc version 4.2.4)

Additional context
Tested with EMQX v5.0.24 server.

I just got around to trying this out, and I don't see what you do. I changed the paho_c_pub to use default persistence and I got this:

Trace : 4, 20231010 123630.808 3 paho-c-pub -> PUBLISH msgid: 9 qos: 1 retained: 0 rc 0 payload len(10): adsfadfas
Trace : 3, 20231010 123630.808 m->c->connect_state = 0
Trace : 4, 20231010 123630.808 3 paho-c-pub <- PUBACK msgid: 9
Trace : 3, 20231010 123630.808 PUBACK received from client paho-c-pub for message id 9 - removing publication
Trace : 3, 20231010 123630.808 Calling publish success for client paho-c-pub

This was Linux too (64 bit Intel). Is there more to your environment?

I'm compiling with GCC 4.4.7 and uClibc 0.9.30.1 for and old ARM920T. Lib seams to work fine, only this issue.

So that's a 32 bit architecture?

Yes, 32 bits.

Is this with the develop branch? If so, there was a PR which affected some read/writes which might affect this - it's now been reverted in the develop branch so you should update if that's what you're using. If not, which release are you using?

No, I'm using the master branch v1.3.12. If it helps, these are my compilation configuration options:

PAHO_ENABLE_TESTING=FALSE
PAHO_ENABLE_CPACK=FALSE
PAHO_HIGH_PERFORMANCE=TRUE
PAHO_WITH_SSL=FALSE
PAHO_BUILD_SHARED=FALSE
PAHO_BUILD_STATIC=TRUE

An endian or align issue?

I've dug out a Raspberry Pi first edition, which appears to be 32-bit so I'll try it there.

Ok, I tried it on the Pi to test.mosquitto.org, and it seemed to work:

Trace : 3, 20231013 174301.030 (3056157808)  (2)> MQTTProtocol_handlePubacks:442
Trace : 4, 20231013 174301.030 3 paho-c-pub <- PUBACK msgid: 1
Trace : 3, 20231013 174301.030 PUBACK received from client paho-c-pub for message id 1 - removing publication
Trace : 3, 20231013 174301.030 (3056157808)    (3)> MQTTPersistence_remove:550
Trace : 3, 20231013 174301.030 (3056157808)     (4)> pstremove:329
Trace : 3, 20231013 174301.030 (3056157808)     (4)< pstremove:363 (0)
Trace : 3, 20231013 174301.030 (3056157808)     (4)> pstremove:329
Trace : 3, 20231013 174301.030 (3056157808)     (4)< pstremove:363 (0)
Trace : 3, 20231013 174301.030 (3056157808)     (4)> pstremove:329
Trace : 3, 20231013 174301.034 (3056157808)     (4)< pstremove:363 (0)
Trace : 3, 20231013 174301.034 (3056157808)     (4)> pstremove:329
Trace : 3, 20231013 174301.034 (3056157808)     (4)< pstremove:363 (0)
Trace : 3, 20231013 174301.034 (3056157808)    (3)< MQTTPersistence_remove:608 (0)
Trace : 3, 20231013 174301.034 (3056157808)   (2)< MQTTProtocol_handlePubacks:474 (0)
Trace : 3, 20231013 174301.034 Calling publish success for client paho-c-pub'

You could try with Mosquitto as well, just to make sure it does the same thing.

Same issue with test.mosquitto.org and local mosquitto v1.6.15 compiled with the same environment as paho.mqtt.c library.

...PUBLISH msgid: 1 qos: 2097152001 retained: 0 rc 0 payload len(93)...
...PUBLISH msgid: 2 qos: 2097152001 retained: 0 rc 0 payload len(117)...

I tried it with several different build options on the Raspberry Pi, and got the same (working) result. That was with the develop branch, which is now the 1.3.13 release. I don't know if there are any changes between 1.3.12 and 1.3.13 which would affect you. I'll try 1.3.12 when I get the chance.

I've identified the issue: it's a misalignment problem.
I believe it's due to my environment configuration (compiler, CPU, platform...), although I'm not exactly sure. This issue could potentially affect other environments as well.
The problem lies within the MQTTAsync_restoreCommand function in the MQTTAsyncUtils.c file. When commands are being restored, the qos and other values are being cast from char* to int*, which enforces alignment to sizeof(int). This leads to incorrect data restoration if the alignment is not maintained, especially due to the variable length of the payload.
My proposed solution is to replace these assignments with a memcpy operation to ensure correct alignment on any platform or environment. I am attaching a patch to fix it, applicable to version 1.3.13.
paho-mqtt-c-1.3.13-0003-fix-MQTTAsyncUtils.patch

Thanks. I take it that this patch worked for you?

Yes, I've created and used this patch on my project.

Thanks. That's interesting, it's an aspect I wasn't aware of until recently, mainly because it works just about all the time, so I didn't have any reason to look into it.