openhab/org.openhab.binding.zigbee

Zigbee devices going offline with zigbee-addon 3.2.0

nailyk-fr opened this issue ยท 11 comments

Description

Since some time I am using Openhab to control lights in my place.
I migrated (new installation) from Openhab 2.x to Openhab 3.x (zigbee-addon 3.1.0) with quite success. However, my VM was to slow to handle my setup.
Decided to buy a dedicated machine for Openhab and started a fresh setup with OH3.2.0 (zigbee-addon 3.2.0)
Since, it is a nightmare.

Some devices (like IKEA bulbs) do not work at all. After pairing (and when controller leave pairing mode), devices appears offline.
I thought it came from the controller so I bought a BitronVideo BV2010/10 on top of the Elelabs one. Behavior was mostly the same.
Decided to upgrade the firmware of the controller (the BV one) and then faced another issue : #726
I moved back to the Elelabs one and decided to replace my bulbs.

Everything was working quite well for some days and, suddenly, all the devices have the same behavior : going offline.

I tried as much as I can to reproduce this with my old VM (still with addon 3.1.0) but this 'going offline' behavior does not seems to occur.

After a restart of the host, I can usually send one command before the device goes offline:

Before restart :

2022-03-02 17:05:48.473 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'LIGHT2ndBedRTZ3000oborybowTS0502A_Color' received command 39

==> /var/log/openhab/openhab.log <==
2022-03-02 17:05:48.474 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 60A423FFFE92CFCB: Command for channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color --> 39 [PercentType]
2022-03-02 17:05:48.476 [DEBUG] [transaction.ZigBeeTransactionManager] - 60A423FFFE92CFCB: Creating new Transaction Queue
2022-03-02 17:05:48.479 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: Set profile to ZigBeeTransactionProfile [maxOutstandingTransactions=1, interTransactionDelay=50, maxRetries=2]
2022-03-02 17:05:48.480 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager is shutdown. Transaction not sent: ZigBeeTransaction [ieeeAddress=null queueTime=-, state=WAITING, sendCnt=0, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=--, level=99, transitionTime=10]]
2022-03-02 17:05:48.482 [DEBUG] [converter.ZigBeeBaseChannelConverter] - 60A423FFFE92CFCB: Channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color waiting for response to 39
2022-03-02 17:05:48.483 [DEBUG] [converter.ZigBeeBaseChannelConverter] - 60A423FFFE92CFCB: Channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color received SUCCESS in response to 39
2022-03-02 17:05:48.484 [DEBUG] [converter.ZigBeeBaseChannelConverter] - 60A423FFFE92CFCB: Channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color updated to 39
2022-03-02 17:05:48.486 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 60A423FFFE92CFCB: Updating ZigBee channel state zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color to 39

==> /var/log/openhab/events.log <==
2022-03-02 17:05:48.491 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'LIGHT2ndBedRTZ3000oborybowTS0502A_Color' changed from UNDEF to 39

==> /var/log/openhab/openhab.log <==
2022-03-02 17:05:48.493 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker reset for handler with thingUID=zigbee:device:6fbabd96fd:60a423fffe92cfcb
2022-03-02 17:05:48.498 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker cancelled task for thingUID=zigbee:device:6fbabd96fd:60a423fffe92cfcb
2022-03-02 17:05:48.500 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker scheduled task for thingUID=zigbee:device:6fbabd96fd:60a423fffe92cfcb in 14430 seconds

After restart :

image

After sending a command to zigbee:device:6fbabd96fd:60a423fffe92cfcb :

==> /var/log/openhab/events.log <==
2022-03-02 17:31:59.881 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'LIGHT2ndBedRTZ3000oborybowTS0502A_Color' received command 40

==> /var/log/openhab/openhab.log <==
2022-03-02 17:31:59.891 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 60A423FFFE92CFCB: Command for channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color --> 40 [PercentType]
2022-03-02 17:31:59.894 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=0, state=WAITING, sendCnt=0, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=--, level=102, transitionTime=10]]
2022-03-02 17:31:59.895 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=2, sleepy=0/3
2022-03-02 17:31:59.896 [DEBUG] [transaction.ZigBeeTransactionManager] - 5581/1: Sending ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=2, state=WAITING, sendCnt=0, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.897 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerAdded: 2 outstanding
2022-03-02 17:31:59.897 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX CMD: MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]
2022-03-02 17:31:59.898 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX ZCL: ZclHeader [frameType=CLUSTER_SPECIFIC_COMMAND, manufacturerSpecific=false, direction=CLIENT_TO_SERVER, disableDefaultResponse=false, manufacturerCode=0, sequenceNumber=101, commandId=4]
2022-03-02 17:31:59.899 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX APS: ZigBeeApsFrame [sourceAddress=0000/1, destinationAddress=5581/1, profile=0104, cluster=0008, addressMode=DEVICE, radius=8, apsSecurity=false, ackRequest=true, apsCounter=C5, rssi=--, lqi=--, payload=01 65 04 66 0A 00]
2022-03-02 17:31:59.899 [DEBUG] [converter.ZigBeeBaseChannelConverter] - 60A423FFFE92CFCB: Channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color waiting for response to 40
2022-03-02 17:31:59.906 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspSendUnicastResponse [networkId=0, status=EMBER_NO_BUFFERS, sequence=00]
2022-03-02 17:31:59.906 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX STA: msgTag=65 state=TX_NAK
2022-03-02 17:31:59.907 [DEBUG] [transaction.ZigBeeTransactionManager] - notifyTransactionProgress: TID=65, state=TX_NAK, outstanding=2
2022-03-02 17:31:59.908 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=13, state=FAILED, sendCnt=1, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.908 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerRemoved: 1 outstanding
2022-03-02 17:31:59.908 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: transactionComplete, state=FAILED, outstanding=0
2022-03-02 17:31:59.909 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=15, state=WAITING, sendCnt=1, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.909 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=2, sleepy=0/3
2022-03-02 17:31:59.909 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction state changed: nwk=5581, TID=65, event=TX_NAK, state=WAITING
2022-03-02 17:31:59.947 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=2, sleepy=0/3
2022-03-02 17:31:59.948 [DEBUG] [transaction.ZigBeeTransactionManager] - 5581/1: Sending ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=53, state=WAITING, sendCnt=1, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.948 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerAdded: 2 outstanding
2022-03-02 17:31:59.949 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX CMD: MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]
2022-03-02 17:31:59.949 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX ZCL: ZclHeader [frameType=CLUSTER_SPECIFIC_COMMAND, manufacturerSpecific=false, direction=CLIENT_TO_SERVER, disableDefaultResponse=false, manufacturerCode=0, sequenceNumber=101, commandId=4]
2022-03-02 17:31:59.950 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX APS: ZigBeeApsFrame [sourceAddress=0000/1, destinationAddress=5581/1, profile=0104, cluster=0008, addressMode=DEVICE, radius=8, apsSecurity=false, ackRequest=true, apsCounter=C6, rssi=--, lqi=--, payload=01 65 04 66 0A 00]
2022-03-02 17:31:59.958 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspSendUnicastResponse [networkId=0, status=EMBER_NO_BUFFERS, sequence=00]
2022-03-02 17:31:59.958 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX STA: msgTag=65 state=TX_NAK
2022-03-02 17:31:59.959 [DEBUG] [transaction.ZigBeeTransactionManager] - notifyTransactionProgress: TID=65, state=TX_NAK, outstanding=2
2022-03-02 17:31:59.960 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=66, state=FAILED, sendCnt=2, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.961 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerRemoved: 1 outstanding
2022-03-02 17:31:59.961 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: transactionComplete, state=FAILED, outstanding=0
2022-03-02 17:31:59.962 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: transactionComplete exceeded max retries 2
2022-03-02 17:31:59.962 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=68, state=CANCELLED, sendCnt=2, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.963 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=60A423FFFE92CFCB queueTime=69, state=CANCELLED, sendCnt=2, command=MoveToLevelWithOnOffCommand [Level Control: 0000/0 -> 5581/1, cluster=0008, TID=65, level=102, transitionTime=10]]
2022-03-02 17:31:59.963 [DEBUG] [converter.ZigBeeBaseChannelConverter] - 60A423FFFE92CFCB: Channel zigbee:device:6fbabd96fd:60a423fffe92cfcb:60A423FFFE92CFCB_1_color received TIMEOUT in response to 40
2022-03-02 17:31:59.964 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerRemoved: 1 outstanding
2022-03-02 17:31:59.964 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 60A423FFFE92CFCB: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2022-03-02 17:31:59.965 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=1, sleepy=0/3
2022-03-02 17:31:59.965 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=1, sleepy=0/3
 
==> /var/log/openhab/events.log <==
2022-03-02 17:31:59.966 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:6fbabd96fd:60a423fffe92cfcb' changed from ONLINE to OFFLINE
 
==> /var/log/openhab/openhab.log <==
2022-03-02 17:31:59.970 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction state changed: nwk=5581, TID=65, event=TX_NAK, state=CANCELLED
2022-03-02 17:32:00.152 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=1, sleepy=0/3
2022-03-02 17:32:00.153 [DEBUG] [transaction.ZigBeeTransactionManager] - FFFF/0: Sending ZigBeeTransaction [ieeeAddress=null queueTime=10903, state=WAITING, sendCnt=0, command=NetworkAddressRequest [0000/0 -> FFFF/0, cluster=0000, TID=66, ieeeAddr=84182600000F21CA, requestType=0, startIndex=0]]
2022-03-02 17:32:00.153 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerAdded: 2 outstanding
2022-03-02 17:32:00.154 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX CMD: NetworkAddressRequest [0000/0 -> FFFF/0, cluster=0000, TID=66, ieeeAddr=84182600000F21CA, requestType=0, startIndex=0]
2022-03-02 17:32:00.154 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX APS: ZigBeeApsFrame [sourceAddress=0000/0, destinationAddress=FFFF/0, profile=0000, cluster=0000, addressMode=DEVICE, radius=8, apsSecurity=false, ackRequest=true, apsCounter=C7, rssi=--, lqi=--, payload=66 CA 21 0F 00 00 26 18 84 00 00]
2022-03-02 17:32:00.163 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspSendBroadcastResponse [networkId=0, status=EMBER_NO_BUFFERS, sequence=00]
2022-03-02 17:32:00.163 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX STA: msgTag=66 state=TX_NAK
2022-03-02 17:32:00.164 [DEBUG] [transaction.ZigBeeTransactionManager] - notifyTransactionProgress: TID=66, state=TX_NAK, outstanding=2
2022-03-02 17:32:00.165 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=null queueTime=10915, state=FAILED, sendCnt=1, command=NetworkAddressRequest [0000/0 -> FFFF/0, cluster=0000, TID=66, ieeeAddr=84182600000F21CA, requestType=0, startIndex=0]]
2022-03-02 17:32:00.165 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerRemoved: 1 outstanding
2022-03-02 17:32:00.166 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - Broadcast: transactionComplete, state=FAILED, outstanding=1
2022-03-02 17:32:00.166 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - Broadcast: transactionComplete exceeded max retries 1
2022-03-02 17:32:00.167 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=null queueTime=10918, state=CANCELLED, sendCnt=1, command=NetworkAddressRequest [0000/0 -> FFFF/0, cluster=0000, TID=66, ieeeAddr=84182600000F21CA, requestType=0, startIndex=0]]
2022-03-02 17:32:00.168 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 84182600000F21CA: Node SVC Discovery: NetworkAddressRequest returned null
2022-03-02 17:32:00.168 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 84182600000F21CA: Node SVC Discovery: NetworkAddressRequest failed after unicast and broadcast
2022-03-02 17:32:00.169 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=null queueTime=10919, state=CANCELLED, sendCnt=1, command=NetworkAddressRequest [0000/0 -> FFFF/0, cluster=0000, TID=66, ieeeAddr=84182600000F21CA, requestType=0, startIndex=0]]
2022-03-02 17:32:00.169 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerRemoved: 1 outstanding
2022-03-02 17:32:00.169 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 84182600000F21CA: Node SVC Discovery: request NWK_ADDRESS failed. Retry 8, wait 19323ms before retry.
2022-03-02 17:32:00.170 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - Broadcast: transactionComplete but not outstanding, state=CANCELLED, outstanding=1
2022-03-02 17:32:00.171 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=1, sleepy=0/3
2022-03-02 17:32:00.171 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=1, sleepy=0/3
2022-03-02 17:32:00.172 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction state changed: nwk=FFFF, TID=66, event=TX_NAK, state=CANCELLED
2022-03-02 17:32:02.820 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B0022FE61AE: Node SVC Discovery: running NWK_ADDRESS
2022-03-02 17:32:02.821 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B0022FE61AE: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B0022FE61AE queueTime=0, state=WAITING, sendCnt=0, command=NetworkAddressRequest [0000/0 -> 1C50/0, cluster=0000, TID=--, ieeeAddr=00124B0022FE61AE, requestType=0, startIndex=0]]
2022-03-02 17:32:02.822 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction Manager: Send Next transaction. outstandingTransactions=1, outstandingQueues=2, sleepy=0/3
2022-03-02 17:32:02.823 [DEBUG] [transaction.ZigBeeTransactionManager] - 1C50/0: Sending ZigBeeTransaction [ieeeAddress=00124B0022FE61AE queueTime=1, state=WAITING, sendCnt=0, command=NetworkAddressRequest [0000/0 -> 1C50/0, cluster=0000, TID=67, ieeeAddr=00124B0022FE61AE, requestType=0, startIndex=0]]
2022-03-02 17:32:02.823 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerAdded: 2 outstanding

image

Configuration

Configuration Description
Coordinator used Elelabs (Dongle firmware version is currently 6.0.3.0)
openHAB version 3.2.0
Hardware 4x Intel(R) Celeron(R) CPU J3455 @ 1.50GHz
Memory 8 Gb
Java version Intel(R) Celeron(R) CPU J3455 @ 1.50GHz
Devices What to provide here ?

How to help investigating &| solving this issue ? This is being quite annoying as I started to remove some electrical switches from the wall to replace them with zigbee ones....

I know this is a bit like Vanilla Icecream issue... Sorry for this....

P.S.: If I manage to get a working firmware on the BitronVideo controller I will try to move devices one by one until, maybe, finding a faulty one.
P.S.2: Most of the devices paired with this Openhab 3.2.0 setup where paired with the previous 3.1.0 one.

Firtstly, I would suggest that firmware 6.0 will not work with most new devices - so I'm surprised this really works at all. You need FW 6.4 or newer to work with most ZB3 devices.

However, that's not probably the issue here - there seems to be a memory problem in the NCP as I can see it is out of buffers (status=EMBER_NO_BUFFERS). I'm not really sure what will cause this - maybe you have set some of the configuration to use very large network which will consume memory in the NCP and I'd suggest to check your configuration and reduce the network size.

None of this will be related to the binding though, so if you use the 3.1 settings it should be fine (it hasn't changed for a long long time).

Thanks a lot for your fast answer !

Indeed, the controller is setup on 'High Ram'. Here is the controller configuration (I carefully reported all the configuration between the old setup and the new one).

UID: zigbee:coordinator_ember:6fbabd96fd
label: Coordinator - Ember Coordinator - Elelabs
thingTypeUID: zigbee:coordinator_ember
configuration:
  zigbee_port: /dev/serial/by-id/usb-1a86_USB2.0-Serial-if00-port0
  zigbee_channel: 25
  zigbee_initialise: false
  zigbee_concentrator: 1
  zigbee_trustcentremode: TC_JOIN_SECURE
  zigbee_extendedpanid: 8B73F7F5B45C722D
  zigbee_baud: 115200
  zigbee_flowcontrol: 1
  zigbee_panid: 35854
  zigbee_powermode: 1
  zigbee_txpower: 8
  zigbee_networksize: 100
  zigbee_linkkey: 5A6967426565416C6C69616E63653039
  zigbee_childtimeout: 172800
  zigbee_networkkey: B401068DAA25F9A54EE0F81CEE951E20
  zigbee_meshupdateperiod: 86400

For testing purposes, I can try to remove some devices but it worked perfectly for some days. Though, on my 3.1.0 setup, it was much less devices.

The number of devices joined to the network will have no impact on memory use in the coordinator so removing devices will likely not help. Memory is allocated based on the coordinator configuration and not when devices are added to the network.

I would strongly suggest to reduce the network size from 100 to something smaller - this will allocate a lot of memory, and I suspect that your device doesn't have the capability to support this.

I guess based on your comments elsewhere that this is resolved?

As noted, this issue is documented in the README - it states that if you set this number too high, it will impact on the number of packet buffers which is exactly the error in your logs. The error in your logs is status=EMBER_NO_BUFFERS.

Indeed !
So far, everything is working properly.

However, I did basic debugging mistake... I changed the network size & upgraded the key at the same time. So I am now unsure what was the real fix.

My guests went out today so I can break the system again.
Next step is to increase network size again and wait couple of weeks if the problem will come back.

Maybe the README can be adapted this way ?
To add in the zigbee_networksize section this it can cause status=EMBER_NO_BUFFERS error if set too big ?

However, I did basic debugging mistake... I changed the network size & upgraded the key at the same time. So I am now unsure what was the real fix.

For sure - your problem was the network size. When you increase the network size, the binding will make the tables larger, this in term uses more memory, and in turn this causes the error that we saw in your logs. Changing the network key will not change memory allocation.

Maybe the README can be adapted this way ?

Yep - I'm happy with adding something along these lines. I assume you are offering to create a PR :) Thanks.

Do you have any logs from your binding startup during this error? If so can you provide them - I want to see if it's possible to detect this problem and raise a more useful message during the startup. This should be possible, but I want to see if in your situation I see the responses that I expect in this case.

For sure - your problem was the network size. When you increase the network size, the binding will make the tables larger, this in term uses more memory, and in turn this causes the error that we saw in your logs. Changing the network key will not change memory allocation.

Sorry, I meant I upgraded the firmware of the key and change the network size to 25 at the same time.

I do not store the logs. As it is quite verbose, they are on a tmpfs...
Tho, if the network key increase produce the error again, I will surely provide a log !

Sorry, I meant I upgraded the firmware of the key and change the network size to 25 at the same time.

Ah - ok. I still think that the fix will be the network size. As far as I know the firmware created by ELE labs doesn't use all the available memory, although this might have been fixed in more recent versions.

I believe this issue is resolved as per the discussion above.

Hello @cdjackson

Sorry for super late answer, I was thinking about updating today ๐Ÿ˜‰
For some weeks (month?) now, I switched back the Network size setting to 'Enormous' in order to reproduce the issue :
image

And, so far, no issue at all. Once, after some people pushed a lot the buttons on the Philips remote, we experienced a huge delay. But, I believe, it was because the service was still processing all the button push.

So, it seems to me, the main issue was the firmware upgrade of the key. Is there any way to downgrade the firmware to prove this theory ?

I don't know what firmware you are now using, but I'm very confident that your original issue was the network size - this was very clear from the logs you provided.

New firmware may also have allocated more memory, so this may also now allow you to use a larger network, but I really can't say without knowing exactly how the firmware is compiled.