php-mqtt/client

Issue with long-running systemd daemon?

UtechtDustin opened this issue · 14 comments

Hello,

We have the following script, which runs as a systemd daemon.

$clientId = 'master-server' . uniqid('', true); // uniqid() expects a string prefix as its first argument
$settings = require __DIR__ . '/../src/settings.php';
$port = $settings['mqttPort'];
$server = $settings['masterServerUrl'];
$registerToken = $settings['registerToken'];
$mac = $settings['mac'];
$commercialCustomerId = $settings['commercialCustomerId'];
$connectionSettings = (new \PhpMqtt\Client\ConnectionSettings())
    ->setUseTls($settings['mqttSSL'])
    ->setUsername($commercialCustomerId)
    ->setPassword($registerToken)
    ->setLastWillMessage('false')
    ->setLastWillTopic( "nuc/{$registerToken}/{$mac}/online")
    ->setRetainLastWill(true)
    ->setMaxReconnectAttempts(100)
    ->setDelayBetweenReconnectAttempts(6000)
    ->setReconnectAutomatically(true);

$last = 0;
$mqtt = new \PhpMqtt\Client\MqttClient($server, $port, $clientId);
$mqtt->registerLoopEventHandler(function  ($mqttClient, $elapsedTime) use (&$last, $registerToken, $mac) {
    if ($elapsedTime % 5 === 0 && $last !== (int)$elapsedTime) {
        $mqttClient->publish("nuc/{$registerToken}/{$mac}/online", 'true', 0, true);
        $last = (int)$elapsedTime;
    }
});

$mqtt->connect($connectionSettings);
$mqtt->publish("nuc/{$registerToken}/{$mac}/online", 'true', 0, true);
$mqtt->subscribe("nuc/{$registerToken}/{$mac}/reboot", function ($topic, $message) {
    system('sudo /sbin/reboot');
}, 0);
$mqtt->subscribe("nuc/{$registerToken}/{$mac}/shutdown", function ($topic, $message) {
    system('sudo /sbin/shutdown now');
}, 0);
$mqtt->loop(true);
$mqtt->close();

On our main system (web based) we have two buttons, reboot and shutdown, for our device.
When the device starts, everything works fine and we can reboot/stop the device via the topics.
But if the device runs for a long time (a few days), the reboot and shutdown topics stop working, although the online topic is still sent every 5 seconds.

Could this be an issue with the lib (e.g. the loop part), or maybe an issue with a resource which no longer exists after a few days?
We can't reproduce this issue on demand, so debugging is quite hard.

This could be due to an auto-reconnect, in which case the client does not re-subscribe itself. Depending on the broker, its configuration, the used protocol version and the clean session flag of the connect() method, different behavior is possible on the broker side. With the correct configuration, the broker should remember subscriptions even if the client disconnects (at least for some time).

You might be able to verify whether that's the case by deliberately forcing a timeout of the client, followed by a reconnect:

use PhpMqtt\Client\ConnectionSettings;
use PhpMqtt\Client\MqttClient;

$connectionSettings = (new ConnectionSettings)
    ->setTimeout(60) // Adjust as needed
    ->setReconnectAutomatically(true)
    ->setMaxReconnectAttempts(5)
    ->setDelayBetweenReconnectAttempts(1000);

$mqtt = new MqttClient($server, $port, $clientId);
$mqtt->connect($connectionSettings);
// Print received messages; PHP's built-in log() is the natural logarithm, not a logger.
$mqtt->subscribe('foo', fn ($topic, $message) => printf("%s: %s\n", $topic, $message), MqttClient::QOS_AT_MOST_ONCE);

sleep(61); // Force a timeout

$mqtt->publish('bar', 'baz'); // Force a reconnect after timeout

// TODO: publish a message to 'foo' from another client and verify it is received

If you cannot reproduce the issue with the approach above, do you have the ability to view the subscriptions on your broker?
Please also share some information about your broker and its configuration; it might help me to reproduce the issue.

Thanks (again) for the really quick answer.

I can try the test with the timeout, but I'm not sure why it could depend on the auto-reconnect feature.
If it depended on the auto-reconnect feature, the publish method should also not work (and it works fine), correct?

As broker we use Mosquitto (latest version, v2.0.15 I guess).
I don't know how to track/check the active subscriptions on MQTT. Is there a nice tool or something like that to do it?

> I can try the test with the timeout, but I'm not sure why it could depend on the auto-reconnect feature.
> If it depended on the auto-reconnect feature, the publish method should also not work (and it works fine), correct?

Not quite, no. When you publish a message, especially with QoS 0, it is fire-and-forget. There is no state; the message simply gets forwarded by the broker in real-time and that's it.
But when you subscribe to a topic, you create some state on the broker. It has to remember which client has subscribed to which topic. A naive broker implementation would simply drop the subscriptions of a client when the client disconnects. But with most brokers, you can connect without the clean session flag to indicate that the broker should keep the subscriptions of a client when it disconnects. The broker may even collect and store messages for the subscriptions while the client is offline, to send them to the client when it reconnects. But, and that's maybe less obvious, this doesn't work forever. At some point, the broker needs to drop subscriptions and messages because it doesn't have unlimited storage.

I'm not really sure how Mosquitto handles this aspect in detail, but I'm pretty sure it supports it. I'll see whether I can confirm the expected behavior or not.

Ok, so this is actually quite easy to test. I've used the following code with Mosquitto 1.6.15 (don't really have a v2 set up right now) and it does indeed store the subscriptions, so reconnects are not an issue.

Subscriber (start first):

$client = new MqttClient($host, $port, 'test-subscriber');
$client->connect();

$client->subscribe('foo/bar/baz', function (string $topic, string $message, bool $retained) use ($client) {
    // Print the message; $this is not available in a standalone script.
    printf("%s: %s\n", $topic, $message);

    $client->interrupt();
}, MqttClient::QOS_AT_MOST_ONCE);

// The sleep is required to give the broker some time to respond to the `SUBSCRIBE` request.
sleep(1);

// The loopOnce is required to receive the `SUBACK`, which marks the subscription as accepted.
$client->loopOnce(microtime(true));

// Then we can deliberately reconnect. This essentially does the same thing as an auto-reconnect.
$client->disconnect();
$client->connect();

// Now we simply wait for some message to be published to the topic 'foo/bar/baz'.
// This command will exit if a message is received on the topic.
$client->loop(true);

$client->disconnect();

Publisher (start second):

$client = new MqttClient($host, $port, 'test-publisher');
$client->connect();

$client->publish('foo/bar/baz', 'test', MqttClient::QOS_AT_MOST_ONCE);

$client->disconnect();

So this should rule out issues with the broker, I guess. Do you by any chance restart your broker sometimes? If your mosquitto.conf contains persistence true, this should be fine as well - but if not, this could also cause this behavior.
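To make the difference between a plain reconnect and a broker restart more concrete, here is a minimal sketch of broker-side session state. `ToyBroker` is a hypothetical in-memory model written for this illustration; it is not part of Mosquitto or php-mqtt/client, and it ignores wildcards, QoS and queued messages. It only assumes that a broker keeps subscriptions per client id, discards them for clean sessions, and loses everything on restart unless persistence is enabled.

```php
<?php

// Hypothetical toy model of broker-side session state (illustration only).
final class ToyBroker
{
    /** @var array<string, list<string>> subscribed topics per client id */
    private array $topics = [];

    /** @var array<string, bool> whether the client id last connected with a clean session */
    private array $clean = [];

    /** @var array<string, bool> client ids that are currently connected */
    private array $online = [];

    public function __construct(private bool $persistence)
    {
    }

    public function connect(string $clientId, bool $cleanSession): void
    {
        // A clean session discards anything stored for this client id.
        if ($cleanSession || !isset($this->topics[$clientId])) {
            $this->topics[$clientId] = [];
        }
        $this->clean[$clientId] = $cleanSession;
        $this->online[$clientId] = true;
    }

    public function subscribe(string $clientId, string $topic): void
    {
        $this->topics[$clientId][] = $topic;
    }

    public function restart(): void
    {
        foreach (array_keys($this->topics) as $clientId) {
            // State survives only for non-clean sessions on a persistent broker.
            if (!$this->persistence || $this->clean[$clientId]) {
                unset($this->topics[$clientId], $this->clean[$clientId]);
            }
        }
        $this->online = []; // every connection drops on restart
    }

    /** @return list<string> connected client ids that would receive the message */
    public function publish(string $topic): array
    {
        $receivers = [];
        foreach ($this->topics as $clientId => $topics) {
            if (isset($this->online[$clientId]) && in_array($topic, $topics, true)) {
                $receivers[] = $clientId;
            }
        }
        return $receivers;
    }
}

// The client subscribes once, then auto-reconnects after a broker restart
// without subscribing again.
foreach ([true, false] as $persistence) {
    $broker = new ToyBroker($persistence);
    $broker->connect('device', cleanSession: false);
    $broker->subscribe('device', 'nuc/reboot');
    $broker->restart();
    $broker->connect('device', cleanSession: false); // auto-reconnect
    printf(
        "persistence=%s: reboot message reaches %d client(s)\n",
        var_export($persistence, true),
        count($broker->publish('nuc/reboot'))
    );
}
```

In this model the reconnect itself always succeeds, so publishing from the device keeps working in both cases. Only the subscription is lost when the broker restarts without persistence, which would match the symptom described at the top of the thread.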

Thank you for all the information and the tests. I will test a few of the things you wrote at work and will give you feedback soon.

Sorry for the late feedback @Namoshek.
I guess I found the issue: as soon as I restart the broker and the client reconnects, the client doesn't resubscribe to the topics.
And I don't think cleanSession could solve it, because we don't want the broker to persist the subscriptions forever (I also don't know whether the broker would persist the subscriptions across a restart).

What do you think?
Could you maybe add an option which enables resubscription after a reconnect, or a callback for the reconnect so I could resubscribe there?

Mosquitto will actually store subscriptions if it has persistence enabled and you do not send a clean session flag. I'm not too familiar with Mosquitto 2, but I guess you can even add some timeout for disconnected clients, after which their data will be deleted.

Anyhow, I'll see if I can come up with something. Maybe an additional event listener for a connected event would do the trick. Unlike a built-in re-subscribe feature, this would also hand the responsibility over to the user of the library, which I like. 😄

Oh okay, now I understand.
I'm quite new to MQTT, or rather, I have only used the basics before.
Could the persistence option cause issues for our application, because we would then have to unsubscribe manually?

But the "connected" event would be great anyway :)

I had a quick look into the documentation for the mosquitto.conf and my expectations were confirmed, the persistence and persistent_client_expiration settings are what you are looking for.

To recap what the clean session flag does: the flag requests a clean session from the broker. Most brokers actually use this flag in two ways, which is why it's not that self-explanatory in the first place:

  1. When a client connects for the first time with the clean session flag, the broker knows that it does not need to persist the client's data when the client disconnects.
  2. When a client which previously connected without the clean session flag reconnects with the clean session flag set, the broker knows it may throw away the persisted data for this client.

Your concern regarding the need to unsubscribe is valid, although there are broker limits in place which should prevent actual issues. The queue_qos0_messages setting, for example, allows you to control whether QoS 0 messages are persisted for disconnected clients or not. With max_queued_messages you can control how many messages are queued per client (persisted or not), and max_queued_bytes limits the total size of the queued messages per client.
You can also simply use a low persistent_client_expiration like 1h (Mosquitto only accepts the units h, d, w, m and y for this setting, with m meaning months, so one hour is the smallest value), which should be enough for clients to reconnect in case of connection loss or if the broker is restarted. By letting the MqttClient generate a random client id upon creation, you would also connect with a new identity for each run of the application, while actual reconnects of the same MqttClient instance would reuse the generated client id to grant access to the persisted data. Example:

$a = new MqttClient($host, $port);
$a->connect(useCleanSession: false);
$a->subscribe('mytopic/#', $callback, 0);
$a->disconnect();
$a->connect(useCleanSession: false);
// $a is still subscribed to 'mytopic/#'

$b = new MqttClient($host, $port);
$b->connect(useCleanSession: false);
// $b is not subscribed to 'mytopic/#' because it has another random client id
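On the broker side, the settings named above could be combined roughly as follows. This is only a sketch of a mosquitto.conf fragment; all values are illustrative assumptions and need to be adapted to the actual deployment.

```
# mosquitto.conf (illustrative values, not a recommendation)

# Keep subscriptions and queued messages across broker restarts.
persistence true

# Drop stored sessions of clients that stay away longer than this.
# Units are h, d, w, m, y; "m" means months, not minutes.
persistent_client_expiration 1h

# Do not queue QoS 0 messages for disconnected persistent clients.
queue_qos0_messages false

# Cap the per-client queue of QoS 1 and 2 messages, by count and by size.
max_queued_messages 1000
max_queued_bytes 1048576
```

With a short expiration like this, a client that auto-reconnects shortly after a broker restart would find its subscriptions intact, while sessions of clients that never come back are cleaned up.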

Thank you very much for that lovely explanation, I will try it tomorrow.
As we already use random client ids, we only have to add useCleanSession: false, I guess :)

That's even the default, so you might be using it already.

I just talked to a few colleagues and we think resubscription is the responsibility of the client.
I checked MQTT clients for other programming languages, e.g. the Paho MQTT JavaScript client, a Python client, etc.
I also checked two MQTT clients for Mac.

Both the programming-language clients and the desktop clients resubscribe after a disconnect/restart of the broker.
I also found an issue for the Paho client describing exactly my problem; the main contributor of the Paho client answered that you have to subscribe within the connected event.

> [...]you will want to add some code in the connectComplete callback that re-subscribes to the topics you want[...]

Similar documentation can be found for the other clients (also on Stack Overflow).

So we would love it if you could add a connected event. I guess it would be useful in other use cases as well :)

Would love to see a roundup of your final solution when you've got there with the new event handler added in #152!

Sure, I quickly tested it.
The following example works fine for resubscribing.

$connectionSettings = (new ConnectionSettings())
    ->setMaxReconnectAttempts(10)
    ->setDelayBetweenReconnectAttempts(6000)
    ->setReconnectAutomatically(true);

$client = new MqttClient('localhost', 1883, 'test-subscriber');
$client->registerConnectedEventHandler(function (MqttClient $client) {
    $client->subscribe('test', function () {
        echo 'do stuff' . PHP_EOL;
    });
});

$client->connect($connectionSettings);
$client->loop();
$client->disconnect();

Simple test script to publish a message:

$client = new MqttClient('localhost', 1883, 'test-publisher');
$client->connect();
$client->publish('test', 'test');
$client->disconnect();

#152 was merged, so this issue is solved.
Thank you! :)