hassio-addons/addon-zwave-js-ui

General Z-Wave Instability with 2.2.0 and Later

jbhorner opened this issue · 34 comments

Problem/Motivation

After upgrading from 2.1.2 to 2.2.0 and subsequently 2.2.1, my Z-Wave network has been unstable. Nodes frequently go dead at random. Prior to these updates, the network was reliable. (From a user perspective, that is. Things behind the scenes may have been different, but I observed no loss of functionality.)

Expected behavior

Stability in the network. Nodes stay "alive."

Actual behavior

After updating, there were nodes that just started to "die." Toggling the node on and off (the nodes that were noted as dead are all powered) brings the node back online for a period of time. I have performed network repairs through the ZW UI to see if that helped. It did not.

Steps to reproduce

Update to 2.2.0 or 2.2.1. Observe node status, or set up an automation to notify when a node's status changes from "alive" to "dead."
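
For anyone who wants to catch these transitions without watching the UI, the core logic is to remember each node's last known status and flag only alive-to-dead changes. Below is a minimal Python sketch of that idea. It assumes status updates arrive as (node ID, status) pairs; the function name `dead_transitions` and the event shape are hypothetical and not part of Home Assistant or Z-Wave JS.

```python
from typing import Dict, Iterable, Iterator, Tuple


def dead_transitions(events: Iterable[Tuple[int, str]]) -> Iterator[int]:
    """Yield the node ID each time a node goes from "alive" to "dead".

    `events` is a hypothetical stream of (node_id, status) pairs.
    """
    last_status: Dict[int, str] = {}
    for node_id, status in events:
        # Only report a real transition, not repeated "dead" reports.
        if last_status.get(node_id) == "alive" and status == "dead":
            yield node_id
        last_status[node_id] = status


if __name__ == "__main__":
    sample = [(26, "alive"), (63, "alive"), (26, "dead"), (26, "dead"), (63, "alive")]
    for node in dead_transitions(sample):
        print(f"Node {node} changed from alive to dead")
```

A node that is first seen as already dead is not reported, since there is no earlier "alive" state to compare against.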

Proposed changes

No thoughts on this. I did read the release notes and saw that there were changes that were made to mark nodes as dead under certain circumstances.
zwave-js-ui-store (1).zip
zwave-js-ui-store.zip

+1

+1

There also seem to be quite a few random restarts of the add-on during the day.

I'd also like to add, though I don't know if it's relevant to the case, that some more complex devices like the Qubino ZMNHXD (3-phase meter) had a lot of entities become unavailable and seem to have had a few others renamed. A re-query doesn't solve the problem.

I started by renaming the entities back to their original names since many of my automations were obviously now failing, till I noticed the unavailable ones and just gave up.

I came here to see if anyone had already reported it and whether there was good news already, but that does not seem to be the case.
I'm reverting back to the standard z-wave add-on, hoping things work there.

I came here to see if anyone had already reported it and whether there was good news already, but that does not seem to be the case.
I'm reverting back to the standard z-wave add-on, hoping things work there.

It will be interesting to hear your success with that. My belief is that it is the driver, which I believe is shared by both add-ons. What Z-Wave controller are you using? I'm using a Zooz 800 Series. I specifically selected this one recently after seeing the problems (another issue) with 500 Series controllers, and some firmware version problems with the 700 Series controllers.

With all of the problems I have had over the past two months with Z-Wave, I'm looking to migrate away from it completely. I don't know what changes started in August, but it seems to be a series of ongoing problems now versus the past.

I'm using the aeotec gen7+ stick and 2.2.x is basically unusable. I rolled back to 2.1.2 and it's working again.

It will be interesting to hear your success with that.

You guessed right.
All the same with the standard/default add-on.
What a disappointment!

I personally had a stable z-wave system for over 1 year, perhaps 2.
I agree with you, someone has done more harm to z-wave users on HASS over these last few weeks than in all the years before combined.

I understand this is free software, I understand this is open source, and we can't complain about something we did not pay a cent for. I understand all of that.
However, I should also point out there are a lot of users using this code, so the first rule should always be test, the second rule should be test again, and the third rule must be test once more, out of respect for those end-users.

We have come to expect enhancements with each release; never in a million years do we expect a new release to render our systems unusable.

Sorry for the rant.
I'm available to assist in solving this mess, but I would much prefer that this mess hadn't occurred in the first place.

I'm using the aeotec gen7+ stick and 2.2.x is basically unusable. I rolled back to 2.1.2 and it's working again.

I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

EDIT: I have! Hurray! Rolling back now...

I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

Would be great if there was an option in HA to roll back the image to a specific version. I keep all my addon backups for a month because of all the zwave issues lately :(

Well, for a few minutes all was well, I had voltage and amps values again, I was about to come here to thank you @davidcoulson , then they went away again. STRANGE!

I'm starting to suspect the problem is in the definition of the device, perhaps someone edited the properties for the device and broke it. Somehow it seems shortly after the rollback the device properties were updated automatically and once again I lost attributes that became unavailable again.

What a mess.
Next I'll try to remove the device from the network and add it back with the same name to see if it works.
Let me tell you, the next time I get this thing working I'll immediately disable updates and never update it again!

I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

Would be great if there was an option in HA to roll back the image to a specific version. I keep all my addon backups for a month because of all the zwave issues lately :(

I'm using "Home Assistant Google Drive Backup" add-on and it gives you just that, try it out, it works like a charm!

I'm starting to suspect the problem is in the definition of the device, perhaps someone edited the properties for the device and broke it. Somehow it seems shortly after the rollback the device properties were updated automatically and once again I lost attributes that became unavailable again.

Did you try reinterviewing the node?

I'm using "Home Assistant Google Drive Backup" add-on and it gives you just that, try it out, it works like a charm!

Yeah, that is what I am using too, but you have to make sure it doesn't remove the backup, otherwise it's impossible to roll back.

Another possible hint on the problem... of the 33 devices I have, after one of the more recent z-wave updates I suddenly got a notification that 20 repairs needed to be performed on those devices... quoting one paragraph:

"Z-Wave JS discovers a lot of device metadata by interviewing the device. However, some of the information has to be loaded from a configuration file. Some of this information is only evaluated once, during the device interview."

Something tells me those configuration files got beaten up pretty well and that's the root cause of all our issues... worth investigating IMHO, it's definitely not normal to have 20 repairs to do on almost all of my z-wave devices!

image

EDIT: A few minutes later it went up to 30 devices needing repair... WTF...

I'm starting to suspect the problem is in the definition of the device, perhaps someone edited the properties for the device and broke it. Somehow it seems shortly after the rollback the device properties were updated automatically and once again I lost attributes that became unavailable again.

Did you try reinterviewing the node?

yeah, the attributes are still there, but unavailable... a small cut from the properties:

image
image

Update: After excluding and including the Qubino devices (I have 2 of them), all entities are back available and reporting values.

image

The only strange thing is that the firmware versions are very different, even though both devices are the same model and were bought at the same time.
Quite hard to imagine they have firmware versions that different, but I guess it can happen... and they both report being up to date... :-/

image

I'd say it's the update/upgrade/repair process that messes up the device altogether. I don't know, I'm just reporting what happened to me, hoping it will help other users or even the developers.

Cheers,

I've opened this issue under HA Core as well, as I'm not sure if I was right to open it here directly. (I'd used the link in the Add-on documentation.)

When I first installed the 2.2.0, I was also presented with several "repairs" by HA Core. Each of the repairs said it was necessary to interview several devices. I performed this for each device. I didn't think much about it at the time, as it was conceivable that the add-on was fixing issues that earlier versions caused or didn't address themselves.

I echo the frustration noted above, but also balance that with the knowledge that the developers volunteer their time here, and do not have a means by which they can test every conceivable configuration. They also have their "day jobs/activities." I think there was a driver architecture change/reconciliation that started in July/August, and since that time some latent problems might have come up. Pure conjecture on my part.

For those who have not created and uploaded logs, I'd encourage you to do so. This is what helps the developers the most.

developers volunteer their time here, and do not have a means by which they can test every conceivable configuration. They also have their "day jobs/activities."

You are absolutely correct, I think we all understand, appreciate, and value that!
However, that has always been the case, and never in the past, at least as far as I recall, has z-wave been as damaged as in these last weeks.

It's not just z-wave; the signal add-on is literally unusable now as well. Perhaps it's just a sign of the times, or a passing phase to quote Pink Floyd, but I think it's worth signaling our disappointment, so that developers realize we're going downhill here.

I'll stop commenting now, I just realized we're on github, not on HASS Community... apologies to all for this.

I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

Would be great if there was an option in HA to roll back the image to a specific version. I keep all my addon backups for a month because of all the zwave issues lately :(

I'm using "Home Assistant Google Drive Backup" add-on and it gives you just that, try it out, it works like a charm!

I've never had luck restoring ZWave addons. Every time, after restoring the backup, I get : Image ghcr.io/hassio-addons/zwave-js-ui/amd64:2.1.2 does not exist for addon_a0d7b954_zwavejs2mqtt. Is there a process outside of restoring the backup to make this work? (I usually just restore a VM snapshot.)

I've never had luck restoring ZWave addons. Every time, after restoring the backup, I get : Image ghcr.io/hassio-addons/zwave-js-ui/amd64:2.1.2 does not exist for addon_a0d7b954_zwavejs2mqtt. Is there a process outside of restoring the backup to make this work? (I usually just restore a VM snapshot.)

Usually you just need to wait longer for it to restore. Or just run the restore again.

JtwoA commented

Piling on here. Current version: 2.2.3 and this morning every single device is "Dead". I use this primarily for my iBlinds and a couple other ancillary items but the blinds are kinda critical.

JtwoA commented

Piling on here. Current version: 2.2.3 and this morning every single device is "Dead". I use this primarily for my iBlinds and a couple other ancillary items but the blinds are kinda critical.

Update: restarting the add-on restored all but two iBlinds. Then a third dropped. Then one of the two dead nodes restored. It has a life of its own atm.

Any update on this? Any fixes in the last days?

+1

2023-10-22 12:48:59.245 INFO Z-WAVE: [Node 063] Is dead
2023-10-22 12:48:59.292 INFO Z-WAVE: [Node 054] Is alive
2023-10-22 12:48:59.371 INFO Z-WAVE: [Node 039] Is alive
2023-10-22 12:48:59.499 INFO Z-WAVE: [Node 038] Is alive
2023-10-22 12:48:59.611 INFO Z-WAVE: [Node 053] Is alive
2023-10-22 12:48:59.660 INFO Z-WAVE: [Node 044] Is alive
2023-10-22 12:48:59.807 INFO Z-WAVE: [Node 055] Is alive
2023-10-22 12:48:59.849 INFO Z-WAVE: [Node 065] Is alive
2023-10-22 12:48:59.895 INFO Z-WAVE: [Node 064] Is alive
2023-10-22 12:48:59.943 INFO Z-WAVE: [Node 048] Is alive
2023-10-22 12:49:07.942 INFO Z-WAVE: [Node 026] Is dead
2023-10-22 12:49:20.319 INFO APP: GET /health/zwave 301 2.076 ms - 191
2023-10-22 12:49:38.483 INFO Z-WAVE: Controller status: Controller is unresponsive
2023-10-22 12:49:50.472 INFO APP: GET /health/zwave 301 5.830 ms - 191
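
For anyone collecting logs like the excerpt above, a short script can summarize which nodes ended up dead. The following is a minimal Python sketch, assuming log lines in exactly the format shown above (`[Node NNN] Is alive` / `Is dead`); the function name `latest_status` is hypothetical and not part of Z-Wave JS UI.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches lines such as:
# "2023-10-22 12:48:59.245 INFO Z-WAVE: [Node 063] Is dead"
NODE_STATUS = re.compile(r"Z-WAVE: \[Node (\d+)\] Is (alive|dead)")


def latest_status(lines: Iterable[str]) -> Dict[int, str]:
    """Return the most recent status seen for each node ID."""
    status: Dict[int, str] = {}
    for line in lines:
        match = NODE_STATUS.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last report wins.
            status[int(match.group(1))] = match.group(2)
    return status


if __name__ == "__main__":
    sample = [
        "2023-10-22 12:48:59.245 INFO Z-WAVE: [Node 063] Is dead",
        "2023-10-22 12:48:59.292 INFO Z-WAVE: [Node 054] Is alive",
        "2023-10-22 12:49:07.942 INFO Z-WAVE: [Node 026] Is dead",
        "2023-10-22 12:49:20.319 INFO APP: GET /health/zwave 301 2.076 ms - 191",
    ]
    statuses = latest_status(sample)
    print(Counter(statuses.values()))
    print(sorted(node for node, state in statuses.items() if state == "dead"))
```

Lines that do not report a node status, such as the `/health/zwave` requests, are simply ignored.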

Has anyone tried the 3.0.0 Add-On update yet? I'll give it a go tomorrow, but wasn't sure if anyone had beat me to it :)

Has anyone tried the 3.0.0 Add-On update yet? I'll give it a go tomorrow, but wasn't sure if anyone had beat me to it :)

I installed it yesterday and haven't had any issues. I was on 2.1.2 before due to issues I was having with later versions, with devices moving to a "dead" status randomly. Though after installation of 3.0.0 I was still presented with several repairs that were necessary (all tied to my motion sensors), I executed those and everything has been stable.

I can't speak, of course, to issues others were having. Mine, for the moment, seems to have been resolved.

JtwoA commented

Has anyone tried the 3.0.0 Add-On update yet? I'll give it a go tomorrow, but wasn't sure if anyone had beat me to it :)

I updated immediately and for a few days everything was fine. Now I'm right back to 12-15/17 devices going dead and requiring everything from a simple "ping" to bring them back to completely removing/repairing.... which is a major PITA because it means retouching every single automation rule they were in.

I wish someone involved in this would bother to acknowledge this issue.

Updating to 3.0.1 does not resolve the problem: after an HA reboot, the status is "Controller is unresponsive".

Updating to 3.0.2 and Home Assistant to 2023.11.2 resolved the problem. 12 hours after the update, all is OK.

I can also confirm that 3.0.2 is working for me after lots of issues with the earlier versions. Running 500-series stick in a Proxmox VM. I am still running Home Assistant 2023.9.1 in case I needed to roll back the z-wave driver.
After disabling the soft reset, I have been running for 72 hours without problems so far.

So, bad news. =(
After a few hours, the Z-Wave instability is back.
Rebooting the entire node resolves the problem, but it appears again after a few hours. =(

                             n 100 ms.

2023-11-15T18:36:47.667Z DRIVER » [REQ] [GetPriorityRoute]
  node ID: 44
2023-11-15T18:36:47.671Z CNTRLR Failed to execute controller command after 2/3 attempts. Scheduling next try in 1100 ms.
2023-11-15T18:36:48.774Z DRIVER » [REQ] [GetPriorityRoute]
  node ID: 44
2023-11-15T18:36:48.779Z CNTRLR Retrieving priority route failed: Failed to send the message after 3 attempts (ZW0202)
2023-11-15T18:36:48.785Z DRIVER » [REQ] [GetPriorityRoute]
  node ID: 54
2023-11-15T18:36:48.787Z CNTRLR Failed to execute controller command after 1/3 attempts. Scheduling next try in 100 ms.
2023-11-15T18:36:48.889Z DRIVER » [REQ] [GetPriorityRoute]
  node ID: 54
2023-11-15T18:36:48.892Z CNTRLR Failed to execute controller command after 2/3 attempts. Scheduling next try in 1100 ms.
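
To see which nodes are affected most by these failed controller commands, the driver log can be tallied per node. Below is a minimal Python sketch, assuming the log layout shown above. It also assumes that each "Failed to execute controller command" line refers to the most recent `node ID:` line before it, which is an inference from the excerpt and not something the log states explicitly. The function name `failed_attempts_by_node` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

NODE_ID = re.compile(r"node ID:\s*(\d+)")
FAILED = re.compile(r"Failed to execute controller command after \d+/\d+ attempts")


def failed_attempts_by_node(lines: Iterable[str]) -> Counter:
    """Count failed controller-command attempts per node.

    Assumption: a failure line belongs to the last node ID seen before it.
    """
    counts: Counter = Counter()
    current_node: Optional[int] = None
    for line in lines:
        node = NODE_ID.search(line)
        if node:
            current_node = int(node.group(1))
            continue
        if current_node is not None and FAILED.search(line):
            counts[current_node] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2023-11-15T18:36:48.785Z DRIVER » [REQ] [GetPriorityRoute]",
        "node ID: 54",
        "2023-11-15T18:36:48.787Z CNTRLR Failed to execute controller command after 1/3 attempts.",
    ]
    for node_id, failures in failed_attempts_by_node(sample).most_common():
        print(f"Node {node_id}: {failures} failed attempt(s)")
```

Because the match only needs the text up to "attempts", the sketch works whether or not long log lines have been wrapped by the terminal.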

revert back =(

Soft reset re-enabled itself after the reboot, and disabling it again solved the problems. Thanks a lot @Wiigian!

Soft reset is disabled in the software interface. The Z-Stick Gen5 does not support soft reset, which is why leaving it enabled can cause issues. If you have ZWaveJS UI:

  1. Open ZWaveJS UI
  2. Open the Menu -> Settings -> Z-Wave
  3. Disable (grey out) the switch next to Soft Reset

There hasn't been any activity on this issue recently, so we clean up some of the older and inactive issues.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thanks!

JtwoA commented

Funny the bot bumped this. Updated to HA 12.3 last night and my ZWaveJS lost 13/15 devices. My other automation system initiated a ping to no avail. Manually restarting ZWaveJS brought all but one back. I got that one back this morning by manually intervening.
