Danielhiversen/pySwitchbot

pySwitchbot hangs if the switchbot is in "disc" state

Closed this issue · 9 comments

See also: home-assistant/core#61535

I believe the bug is here: https://github.com/Danielhiversen/pySwitchbot/blob/master/switchbot/__init__.py#L283

If the switchbot is in disc state, it tries to read a response but it will never get one. If you remove disc from the while loop, it will fall into BTLEDisconnectError exception and retry properly the next time around, without any deadlocks.

So here is the fix I'd propose.


        while rsp.get("state") and rsp["state"][0] in [
            "tryconn",
            "scan",
-           "disc",
        ]:  # Wait for any operations to finish.
            rsp = self._getResp(["stat", "err"], timeout)

Hi @S57HZWW,

The helper erroneously reports disc state. If that's not there then it incorrectly fails 5/10 times. It seems there is another bug in the disconnect() function that leads to deadlocks. I'll need to test first though.

As of 2019-12-06, you have to fix a bug in the btle.py library:
Comment out the _getResp in disconnect() or it can hang on readline
    when a 'stat disc' reply from the helper is overlooked (as in
    during a call to setMTU).

There hasn't been any recent releases to the underlying bluepy helper. (Last release was before 2018)

I'm actually working replacing bluepy with bleak as I get time.

Should just be able to add this function:

    def disconnect(self):
        if self._helper is None:
            return
        # Unregister the delegate first
        self.setDelegate(None)

        self._writeCmd("disc\n")
#        self._getResp('stat')
        self._stopHelper()

to

class SwitchbotDevice(bluepy.btle.Peripheral):

and it should overwrite the original function causing deadlocks.

I believe the disc state is correct due to the failure to initially connect to the Switchbot. I would think that it's waking up so the next retry should work, which seems to be the case. My theory is that we're trying to get a response for disc in that while loop, but no message will ever be sent from the switchbot because we never connected to it in the first place. I'm not sure what you mean by "incorrectly fail 5/10 times", but shouldn't we handle the exception being raised as retries? On my patched version I do get exceptions but the retry succeeds - this is much better than deadlocking and having to restart home assistant.

For example, here is a successful command being sent:

2022-02-06 18:01:06 DEBUG (SyncWorker_8) [switchbot] Prepare to read notification from switchbot
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] Sending command to switchbot 570f450105ff64
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] Connecting to Switchbot
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] Helper is none
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] write cmd
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] after write cmd
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] after rsp
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] in state tryconn
2022-02-06 18:01:07 DEBUG (SyncWorker_24) [switchbot] Subscribe to notifications
2022-02-06 18:01:08 DEBUG (SyncWorker_24) [switchbot] Prepare to send
2022-02-06 18:01:09 DEBUG (SyncWorker_24) [switchbot] Sending command, 570f450105ff64
2022-02-06 18:01:09 INFO (SyncWorker_24) [switchbot] Successfully sent command to Switchbot (MAC: ...)

And when it deadlocks, you see a CONNFAIL error followed by a deadlock.

2022-02-06 18:01:30 DEBUG (SyncWorker_7) [switchbot] Connecting to Switchbot
2022-02-06 18:01:30 DEBUG (SyncWorker_7) [switchbot] write cmd
2022-02-06 18:01:30 DEBUG (SyncWorker_7) [switchbot] after write cmd
2022-02-06 18:01:30 DEBUG (SyncWorker_7) [switchbot] after rsp
2022-02-06 18:01:30 DEBUG (SyncWorker_7) [switchbot] in state tryconn
2022-02-06 18:01:31 DEBUG (SyncWorker_7) [switchbot] in state disc
2022-02-06 18:01:31 DEBUG (SyncWorker_7) [switchbot] Error trying to connect to peripheral ...., error: connfail

@RenierM26 @S57HZWW I have also issues with the HomeAssistant integration where this library is used with my Switchbot curtains.
Can I help testing some fixes/code changes? Or maybe deliver some logs?

I have created PR #39 to fix this issue

I have created PR #39 to fix this issue

Hope this will be approved soon, so we can update the dependency in the integration over here.

@RenierM26 if the solution you are proposing is better than the one in my PR (proposed by S57HZWW), please feel free to create a new PR, but in the meantime let´s get this #39 merged so the switchbot stops failing

@S57HZWW #40 has been now merged and integrated in HA 2022.3, so this issue shouldn´t happen again (@pascalwinters and I have been testing it during the last week), please test it and if it is working you can close this issue.

P.S.: Please be aware after this fix sometimes the switchbot takes longer to respond (up to 20-30 seconds), but it doesn´t hang anymore.

This issue has been resolved with the change over to the new library