Yoda-x/ha-zha-new

Motion sensor no longer updates, with Updating zha_new zha_new took longer than the scheduled update interval 0:00:15

Closed this issue · 8 comments

kkr16 commented

After some random time, the server stops updating the motion sensor component state and logs the error below:
2018-08-02 15:58:56 WARNING (MainThread) [custom_components.zha_new] Updating zha_new zha_new took longer than the scheduled update interval 0:00:15

The error appears approximately every 15 seconds, and the sensors do not update anymore. The system does not recover from this state.

I am usually forced to reboot.
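
For context, a warning of this shape is what a periodic scheduler typically logs when one update is still running as the next tick fires. The sketch below illustrates that pattern with an asyncio lock. `Poller`, `tick`, and `demo` are hypothetical names, and this is an assumption about the general mechanism, not the actual zha_new or Home Assistant code.

```python
import asyncio


class Poller:
    """Hypothetical periodic poller; not the real zha_new/Home Assistant code."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.warnings = []

    async def tick(self, update, interval="0:00:15"):
        # If the previous update still holds the lock, skip this tick and warn.
        if self._lock.locked():
            self.warnings.append(
                f"Updating took longer than the scheduled update interval {interval}"
            )
            return
        async with self._lock:
            await update()


async def demo():
    poller = Poller()
    stuck = asyncio.Event()  # stands in for a serial reply that never arrives

    first = asyncio.ensure_future(poller.tick(stuck.wait))
    await asyncio.sleep(0)         # let the first update start and take the lock
    await poller.tick(stuck.wait)  # second tick finds the lock held and warns
    await poller.tick(stuck.wait)  # every later tick warns as well

    stuck.set()                    # unblock the first update so the demo can exit
    await first
    return poller.warnings


warnings = asyncio.run(demo())
print(len(warnings))
```

If an update never returns, every subsequent tick would log the same warning, which would match the symptom of the message repeating every 15 seconds without recovery.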

kkr16 commented

I enabled debug mode, and this is what I see. It happens every 15 seconds.

2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Send command neighborCount
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Sending: b'606421d244517e'
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Data frame: b'0764a1d25406637e'
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Sending: b'8160597e'
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Application frame 122 (neighborCount) received
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Send command getValue
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Sending: b'7165210257beca7e'
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Data frame: b'1065a102542becd3417e'
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Sending: b'82503a7e'
2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Application frame 170 (getValue) received
2018-08-03 18:46:37 DEBUG (MainThread) [custom_components.zha_new] buffer: b'\xf9'
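
One way to check how often a given command repeats in a longer debug log is to collect the timestamps of the `Send command` lines. The following is a minimal sketch that assumes the log format shown in the excerpt above; `command_times` and `intervals` are hypothetical helper names, and the sample lines are copied from that excerpt.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Send command neighborCount
SEND_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DEBUG \(MainThread\) "
    r"\[bellows\.ezsp\] Send command (\w+)"
)


def command_times(lines):
    """Map each EZSP command name to the timestamps at which it was sent."""
    times = {}
    for line in lines:
        match = SEND_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            times.setdefault(match.group(2), []).append(stamp)
    return times


def intervals(stamps):
    """Seconds between consecutive sends of one command."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


excerpt = [
    "2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Send command neighborCount",
    "2018-08-03 18:46:37 DEBUG (MainThread) [bellows.uart] Sending: b'606421d244517e'",
    "2018-08-03 18:46:37 DEBUG (MainThread) [bellows.ezsp] Send command getValue",
]
times = command_times(excerpt)
print(sorted(times))
print(intervals(times["neighborCount"]))
```

With only one `neighborCount` line in the excerpt, the interval list is empty; fed a full log file, it would list the gaps between consecutive polls.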

A motion sensor is not polled for updates; only switches and bulbs are polled. Your log shows the neighbor count polling, which happens every 15 seconds.
a) What kind of hardware do you use for your HA server?
b) Which sensor do you use? The newer Aqara sensors have an X-attrib-10 attribute, which is the parent NWK ID. If you turn off the parent (e.g. a bulb via the wall switch), this message is shown for the parent and the sensor can no longer reach the server.

kkr16 commented

a) Raspberry Pi3 running Raspbian with HA in a virtual environment and a HUSBZB-1 usb dongle.

(homeassistant) pi@raspberrypi:/opt/homeassistant $ hass --version
0.74.2

These are the loaded dependencies:

(homeassistant) pi@raspberrypi:/opt/homeassistant $ pip list  --format=legacy | egrep "bellows|zigpy"
bellows (0.6.0-YD)
zigpy (0.1.1-Y)

My configuration looks like this:

zha_new:
  usb_path: /dev/ttyUSB1
  database_path: /opt/homeassistant-config/zigbee.db

b) I have two Aqara RTCGQ11LM motion sensors that I bought very recently from AliExpress. Is there an easy way to find out whether they are the newer model?

My ZigBee mesh currently consists of the HUSBZB-1 and two Aqara motion sensors.

I also have around 25 ZigBee lights (a mix of Hue and Ikea), but they are linked to my Hue Hub and are not directly part of the HA/Aqara mesh.

For the model, please go to Developer Tools -> States. There you will see either "lumi.sensor_motion" or "lumi.sensor_motion.aq2" (new). If you have the aq2, you will also see an x-attr-10 value, which shows the network ID of the parent device; if it is "0", the sensor is connected directly to the controller.
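
The check described above can be sketched in a few lines. This assumes the model string and the state attributes have already been copied out of the States page into a dict; `describe_sensor` and the `parent_key` parameter are hypothetical, and the sketch assumes the parent value is a decimal number.

```python
def describe_sensor(model, attributes, parent_key="x-attr-10"):
    """Summarize model generation and parent for an Aqara motion sensor state."""
    is_aq2 = model.strip() == "lumi.sensor_motion.aq2"
    parent = attributes.get(parent_key)
    if not is_aq2 or parent is None:
        # Older model, or the parent attribute is not exposed.
        return {"newer_model": is_aq2, "parent": None}
    parent_id = int(parent)  # assumption: the value is decimal
    return {
        "newer_model": True,
        # 0 means the sensor talks to the controller directly.
        "parent": "controller" if parent_id == 0 else parent_id,
    }


print(describe_sensor("lumi.sensor_motion.aq2", {"x-attr-10": "0"}))
```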

Also, the Raspberry Pi may not be the best device for use with zha and a USB stick, because the serial communication with the USB device has strict timing requirements.
I will push an update soon that handles the communication better on a Raspberry Pi.

kkr16 commented

Both of them are *.aq2

I think you are right concerning the timing. I've been seeing a correlation between the zha_new timeouts and either of these events:

  1. 2018-08-11 10:01:15 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
  2. Restarting docker-compose which increases CPU usage & system load

I'm planning to eventually migrate my setup to a NUC, but that won't happen in the near future. Based on what you're saying, that should fix it.

Please pull the new release. I made some changes to the bellows lib so that it detects timing errors with the USB stick and resets the communication.
The problem with hass is that it uses just one CPU. If possible, you should assign one specific CPU to the Docker container and keep other processes off that CPU.
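
For a setup that runs outside Docker (for example in a venv), a comparable effect can be approximated by pinning the process to one core. The sketch below uses `os.sched_setaffinity`, which is available on Linux only; `pin_to_cpu` is a hypothetical helper name. Note that this only restricts the pinned process. Keeping other processes off that core needs separate system configuration, such as the `isolcpus` kernel parameter.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 = the calling process) to a single CPU core.

    Returns the resulting affinity set, or None where the platform lacks
    os.sched_setaffinity (it is available on Linux only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_getaffinity"):
    # Pick a core this process is currently allowed to run on.
    allowed = os.sched_getaffinity(0)
    print(pin_to_cpu(max(allowed)))
```

For a container, `docker run --cpuset-cpus` restricts the container to the listed cores in a similar way.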

kkr16 commented

Thanks! I just pulled the latest zha_new, zigpy and bellows.

(homeassistant) pi@raspberrypi:/opt/homeassistant-config $ pip list |   egrep "bellows|zigpy"
bellows                 0.7.0-Y
zigpy                   0.1.2-Y
zigpy-xbee              0.1.1

Things look good so far. I'll avoid touching my system for the day and see if it breaks again.

"The problem with hass is, that it uses just one cpu. if possible you should assign one specific cpu to the docker container and disable usage for other processes for this cpu."

You mean the Hass Docker container? I'm running Home Assistant in a venv because I was not able to pip install your custom bellows and zigpy in the Hass Docker container. If at all possible, could you share the procedure here, or maybe on your wiki?

kkr16 commented

Patch seems to be working well after 24 hours of use. I created some high load situations on the Pi3 that would have broken it in the previous release, and bellows recovers without issues:

2018-08-15 09:02:04 ERROR (MainThread) [bellows.uart] Error (ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT), reset connection
2018-08-15 09:02:04 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-08-15 09:02:04 WARNING (MainThread) [homeassistant.helpers.entity] Update of zha_new.controller is taking over 10 seconds
2018-08-15 09:02:04 WARNING (MainThread) [custom_components.zha_new] Updating zha_new zha_new took longer than the scheduled update interval 0:00:15
2018-08-15 09:02:05 WARNING (MainThread) [bellows.uart] Reset success
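
To confirm over a longer period that every reset recovers, the log can be scanned for `reset connection` errors that are not followed by `Reset success`. This is a minimal sketch that relies only on the message texts visible in the excerpt above; `unrecovered_resets` is a hypothetical helper name.

```python
def unrecovered_resets(lines):
    """Count bellows.uart connection resets not followed by 'Reset success'."""
    pending = 0
    for line in lines:
        if "[bellows.uart]" not in line:
            continue
        if "reset connection" in line:
            pending += 1
        elif "Reset success" in line and pending:
            pending -= 1
    return pending


excerpt = [
    "2018-08-15 09:02:04 ERROR (MainThread) [bellows.uart] Error (ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT), reset connection",
    "2018-08-15 09:02:04 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting",
    "2018-08-15 09:02:05 WARNING (MainThread) [bellows.uart] Reset success",
]
print(unrecovered_resets(excerpt))
```

A non-zero result at the end of a log would indicate a reset that never completed.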

We can close this issue. Thanks!