home-assistant/core

OpenTherm Gateway - Entities stauts "unavailable" after reconnection

cybergatto opened this issue · 6 comments

The problem

I'm using open therm gateway integration since years and everything worked like a charm.
Since a couple of weeks im facing the following issue, all the entities from otgw are going on unavailable state. reloading the integration will not solve the issue, but reloading the whole homeassistant instance are solving the issue.

Ive took a debug log of the integration at the point in time when the entities goes unavailable and i can see the following

2024-10-15 23:03:17.078 DEBUG (MainThread) [pyotgw.protocol] Received line 13558: B700E0000
2024-10-15 23:03:17.079 DEBUG (MainThread) [pyotgw.protocol] Received line 83399: B700E0000
2024-10-15 23:03:17.079 DEBUG (MainThread) [pyotgw.protocol] Received line 83420: B700E0000
2024-10-15 23:03:17.079 DEBUG (MainThread) [pyotgw.protocol] Received line 13585: B700E0000
2024-10-15 23:03:17.079 DEBUG (MainThread) [pyotgw.connection] Watchdog timer reset!
2024-10-15 23:03:17.079 DEBUG (MainThread) [pyotgw.connection] Watchdog timer reset!
2024-10-15 23:03:17.080 DEBUG (MainThread) [pyotgw.connection] Watchdog timer reset!
2024-10-15 23:03:17.080 DEBUG (MainThread) [pyotgw.connection] Watchdog timer reset!
2024-10-15 23:03:20.082 DEBUG (MainThread) [pyotgw.connection] Watchdog triggered!
2024-10-15 23:03:20.082 DEBUG (MainThread) [pyotgw.connection] Canceling Watchdog task.
2024-10-15 23:03:20.082 DEBUG (MainThread) [pyotgw.connection] Scheduling reconnect...
2024-10-15 23:03:20.133 DEBUG (MainThread) [pyotgw.commandprocessor] Sending command: PS with value 0
2024-10-15 23:03:25.134 ERROR (MainThread) [pyotgw.connection] Could not connect to serial device on socket://172.16.8.16:25238. Will keep trying. Reported error was: 
2024-10-15 23:03:27.689 DEBUG (MainThread) [pyotgw.protocol] Received line 83400: R00780000
2024-10-15 23:03:27.689 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.689 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: R 00 78 00 00
2024-10-15 23:03:27.690 DEBUG (MainThread) [pyotgw.protocol] Received line 1: R00780000
2024-10-15 23:03:27.690 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.690 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: R 00 78 00 00
2024-10-15 23:03:27.691 DEBUG (MainThread) [pyotgw.protocol] Received line 83421: R00780000
2024-10-15 23:03:27.691 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.691 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: R 00 78 00 00
2024-10-15 23:03:27.691 DEBUG (MainThread) [pyotgw.protocol] Received line 13586: R00780000
2024-10-15 23:03:27.691 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.691 DEBUG (MainThread) [pyotgw.protocol] Received line 13651: R00780000
2024-10-15 23:03:27.692 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.692 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: R 00 78 00 00
2024-10-15 23:03:27.692 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: R 00 78 00 00
2024-10-15 23:03:27.711 DEBUG (MainThread) [pyotgw.protocol] Received line 13652: B4078091C
2024-10-15 23:03:27.711 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.711 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: B 04 78 09 1C
2024-10-15 23:03:27.711 DEBUG (MainThread) [pyotgw.protocol] Received line 83401: B4078091C
2024-10-15 23:03:27.711 DEBUG (MainThread) [pyotgw.messageprocessor] Added line to message queue. Queue size: 1
2024-10-15 23:03:27.711 DEBUG (MainThread) [pyotgw.messageprocessor] Processing: B 04 78 09 1C
2024-10-15 23:03:27.711 DEBUG (MainThread) [homeassistant.components.opentherm_gw] Received report: {'boiler': {'burner_hours': 2332}, 'gateway': {}, 'thermostat': {}}
2024-10-15 23:03:27.713 DEBUG (MainThread) [pyotgw.protocol] Received line 2: B4078091C

What it seems to me is that at 2024-10-15 23:03:17.079 the tcp connection drop, might be any network related issue, the integration then try to reconnect, but get the "Could not connect to serial device" error.

besides of the error, the communication correctly restart on 2024-10-15 23:03:27.689 but the status of the devices on homessistant stay unavailable and there's no way to getting back online without rebooting the whole homeassistant instance.

After the reboot anything goes back working again untill the next connection issue.

What version of Home Assistant Core has the issue?

core-2024.10.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

OpenTherm Gateway

Link to integration documentation on our website

https://www.home-assistant.io/integrations/opentherm_gw

Diagnostics information

home-assistant_opentherm_gw_2024-10-16T08-58-52.907Z.zip

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

Hey there @mvn23, mind taking a look at this issue as it has been labeled with an integration (opentherm_gw) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of opentherm_gw can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign opentherm_gw Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


opentherm_gw documentation
opentherm_gw source
(message by IssueLinks)

mvn23 commented

Thank you for the report. This seems to be a bug in pyotgw rather than Home Assistant. More specifically, the connect/reconnect logic of pyotgw seems to leave "old" connections open when triggering a reconnect. At some point in your logs, there were 3 connections active at the same time...
In which version of Home Assistant and/or pyotgw did everything still work as expected? We changed pyotgw versions twice this year: we switched from pyotgw 2.1.3 to 2.2.0 in HA 2024.6.0 and from 2.2.0 to 2.2.1 in HA 2024.10.2.

Ciao @mvn23 unfortunately i can't say exactly, local clima was warm until a couple of weeks ago so I was not using the heating system. unfortunately i've not enough backlog to identify the exact point in time where this stopped working...

mvn23 commented

Backups are made by default before updating. You can find and restore them in Settings > System > Backups to test versions that were installed in the past.

Ok, thanks, I checked the timeline of my updates. the bug is likely being introduced in pyotgw 2.2.0 as i can confirm i've experienced the behavior described in both HA 2024.9.3 and HA 2024.10.2

mvn23 commented

The issue should be fixed in pyotgw 2.2.2. This issue will be closed when the version bump gets merged. Please test the new version when it gets published and re-open this issue or create a new one if the problem re-occurs.