openhab/org.openhab.binding.zigbee

Secondary Level Control channels have stopped working on OH3

weakfl opened this issue · 16 comments

Outline

The secondary dimmer channels of my Busch-Jaeger 6736 control elements have stopped working since I've migrated to OH3. They did work fine with OH2.

OH3 shows all channels, but I'm not receiving updates when pressing buttons on the control element.

  • Device address: d85def11a1003142
  • Channel: D85DEF11A1003142_11_dimmer

I've attached an OH log as well as a sniffer log below. Before I've started logging I've removed the device from OH3 and started by pairing it to capture the setup too.

Configuration

Configuration Description
Coordinator used Ember EM35x
openHAB version 3.2.0 (#2473)
Hardware Raspberry Pi 3 Model B Rev 1.2
Java version openjdk version "11.0.12" 2021-07-20 LTS

Logs

openhab_partial.log.zip
openhab_sniffed.pcapng.zip

@cdjackson Just in case it might help, current/old device XMLs:

D85DEF11A1003142.xml.zip
D85DEF11A1003142_OLD.xml.zip

One thing I've noticed, in the old XML endpoint 11 has entries for ON_OFF and LEVEL_CONTROL cluster attributes, in the new XML only endpoint 18 has On Off and Current Level attributes. But it looks like the format has changed quite a lot so I don't know if it's noteworthy...

it looks like the format has changed quite a lot

The format used by the binding hasn't changed for a long long time. I also don't think there has been a change to the binding, and if there are features missing in the XML, then possibly the discovery had a problem and I'd suggest to reinitialise the device so that it is rediscovered.

My guess is that something went wrong with the discovery - that's all. The discovery log itself shows that the features are there, but they aren't stored in the XML (not sure why and it's impossible to know the history of this file) so the binding will not discover them.

The format used by the binding hasn't changed for a long long time.

I looked at a more recent backup of the XMLs and tey look indeed pretty similar.

I also don't think there has been a change to the binding, and if there are features missing in the XML, then possibly the discovery had a problem and I'd suggest to reinitialise the device so that it is rediscovered.

I have two of these devices, I've removed and re-added and reinitialised them a couple of times but unfortunately that didn't help.

The following errors are shown in the openHAB log on reinitialisation:

D85DEF11A1003142: Error 0x8c setting client binding
D85DEF11A1003142: Error 0x8c setting client binding
D85DEF11A1001973: Error 0x8c setting client binding
D85DEF11A1001973: Error 0x8c setting client binding

My guess is that something went wrong with the discovery - that's all. The discovery log itself shows that the features are there, but they aren't stored in the XML (not sure why and it's impossible to know the history of this file) so the binding will not discover them.

All I can say is that the devices worked fine with OH2.

The binding errors won't be the cause of the channels not showing up. If the channels were there, then the binding error means that the device has not been configured to send report since it already has a full binding table. You can't clear the binding table (zigbee doesn't provide this option) so you can either try and remove a binding entry, or resetting the device.

As I said though, this will not stop it finding the channels or storing the correct XML (which actually isn't even something the binding does - that functionality is handled by the low level libraries).

I'm not sure I'm following, sorry. The channels are there, they're just not working:

Screenshot 2021-09-13 at 09 49 25

Screenshot 2021-09-13 at 09 51 45

Sorry - I misunderstood given that they are not showing in the XML. In that case you might just need to reset the device.

I already did reset the device(s) a couple of times and just did it again. Unfortunately the result is always the same:

10:43:56.655 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from ONLINE to REMOVING
10:43:56.770 [ERROR] [nding.zigbee.internal.ZigBeeDataStore] - D85DEF11A1003142: Error removing network state
10:43:56.781 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from REMOVING to REMOVED
10:43:56.795 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from REMOVED to UNINITIALIZED
10:43:56.833 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)

10:44:42.376 [INFO ] [gbee.discovery.ZigBeeDiscoveryService] - D85DEF11A1003142: Starting ZigBee device discovery
10:44:42.386 [INFO ] [ig.discovery.internal.PersistentInbox] - Added new thing 'zigbee:device:c3cd543626:d85def11a1003142' to inbox.
10:44:42.387 [INFO ] [openhab.event.InboxAddedEvent        ] - Discovery Result with UID 'zigbee:device:c3cd543626:d85def11a1003142' has been added.
10:44:43.534 [INFO ] [gbee.discovery.ZigBeeDiscoveryService] - D85DEF11A1003142: Starting ZigBee device discovery
10:44:51.039 [INFO ] [openhab.event.InboxRemovedEvent      ] - Discovery Result with UID 'zigbee:device:c3cd543626:d85def11a1003142' has been removed.
10:44:51.062 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from UNINITIALIZED to INITIALIZING
10:44:51.085 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from INITIALIZING to UNKNOWN
10:44:51.610 [INFO ] [openhab.event.FirmwareStatusInfoEvent] - Firmware status of thing zigbee:device:c3cd543626:d85def11a1003142 changed to UNKNOWN.
10:45:02.795 [ERROR] [.converter.ZigBeeConverterSwitchLevel] - D85DEF11A1003142: Error 0x8c setting client binding
10:45:02.889 [ERROR] [.converter.ZigBeeConverterSwitchLevel] - D85DEF11A1003142: Error 0x8c setting client binding
10:45:02.957 [ERROR] [.converter.ZigBeeConverterSwitchLevel] - D85DEF11A1003142: Error 0x8c setting client binding
10:45:03.026 [ERROR] [.converter.ZigBeeConverterSwitchLevel] - D85DEF11A1003142: Error 0x8c setting client binding
10:45:03.522 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:c3cd543626:d85def11a1003142' changed from UNKNOWN to ONLINE

Not sure what else I could do except going back to OH2.

Ok, then I'm at a loss as to why the binding table is full - sorry. This has not changed so there ust be something with your setup I think.

Ok, then I'm at a loss as to why the binding table is full

Are you sure the reason is that the table is full? Couldn't it just fail to create the binding?

I took a look at PR #104, which was when you added support for the secondary channels of the BJ elements back in 2018.

And looking at the current code this seems to be the error message I'm seeing in the logs:

CommandResult bindResponse = bind(clusterOnOff).get();
if (!bindResponse.isSuccess()) {
logger.error("{}: Error 0x{} setting client binding", endpoint.getIeeeAddress(),
Integer.toHexString(bindResponse.getStatusCode()));
}

  • sorry.

Don't be sorry, I'm really grateful that you're taking the time to look into the issue!

Are you sure the reason is that the table is full? Couldn't it just fail to create the binding?

I'm reasonably sure going by the response from the device which says the table is full.

And looking at the current code this seems to be the error message I'm seeing in the logs:

Yes, that's the error from the binding - this code is standard for all converters and hasn't changed. The low level error in the packet from the device is just before that -:

2021-09-04 13:12:43.229 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX CMD: BindResponse [6A4D/0 -> 0000/0, cluster=8021, TID=F8, status=TABLE_FULL]
2021-09-04 13:12:43.234 [DEBUG] [transaction.ZigBeeTransactionManager] - notifyTransactionCommand: BindResponse [6A4D/0 -> 0000/0, cluster=8021, TID=F8, status=TABLE_FULL]
2021-09-04 13:12:43.240 [ERROR] [converter.ZigBeeConverterSwitchLevel] - D85DEF11A1003142: Error 0x8c setting client binding

This is because 0x8c is table full ->

    TABLE_FULL(0x8C, "The device does not have table space to support the operation."),

So this ought to be exactly the same as OH2, and I would have expected the same error in OH2.

One possibility is that this device reports this error even if the entry is set (since it will also be set in another converter). I've not come across this before, but who knows...

You could try manually setting the reporting on the console with the subscribe ENDPOINT CLUSTER ATTRIBUTE MIN-INTERVAL MAX-INTERVAL REPORTABLE-CHANGE command. I think you've used the console in the past so hopefully you're familiar with this?

Let me know if that works.

I think you've used the console in the past so hopefully you're familiar with this?

Yes I'm somewhat familiar with the console, but I'm not sure I got the syntax right. Should it look like this?
subscribe 37485/11 0008 0 0 600 1

Here are some device infos from the console:

node D85DEF11A1003142
IEEE Address     : D85DEF11A1003142
Network Address  : 37485
Node Descriptor  : NodeDescriptor [apsFlags=0, bufferSize=89, complexDescriptorAvailable=false, manufacturerCode=4398, logicalType=ROUTER, serverCapabilities=[], incomingTransferSize=61, outgoingTransferSize=61, userDescriptorAvailable=true, frequencyBands=[FREQ_2400_MHZ], macCapabilities=[MAINS_POWER, RECEIVER_ON_WHEN_IDLE, FULL_FUNCTION_DEVICE], extendedEndpointListAvailable=false, extendedSimpleDescriptorListAvailable=false, stackCompliance=0]
Power Descriptor : RECEIVER_ON_IDLE, [DISPOSABLE_BATTERY, MAINS, RECHARGABLE_BATTERY], MAINS, FULL
Associations     : []
Endpoints        : 
            10   : Profile     C05E ZIGBEE_LIGHT_LINK
                 : Device Type 0000 ON_OFF_SWITCH
                   -> BASIC
                   -> TOUCHLINK
                   <- TOUCHLINK
                   <- COLOR_CONTROL
                   <- IDENTIFY
                   <- GROUPS
                   <- SCENES
                   <- ON_OFF
                   <- LEVEL_CONTROL
                   <- OTA_UPGRADE
            11   : Profile     C05E ZIGBEE_LIGHT_LINK
                 : Device Type 0000 ON_OFF_SWITCH
                   -> BASIC
                   -> TOUCHLINK
                   <- TOUCHLINK
                   <- COLOR_CONTROL
                   <- IDENTIFY
                   <- GROUPS
                   <- SCENES
                   <- ON_OFF
                   <- LEVEL_CONTROL
            18   : Profile     C05E ZIGBEE_LIGHT_LINK
                 : Device Type 0000 ON_OFF_SWITCH
                   -> BASIC
                   -> IDENTIFY
                   -> GROUPS
                   -> SCENES
                   -> ON_OFF
                   -> LEVEL_CONTROL
> bindtable D85DEF11A1003142
Binding table for node 37485 [D85DEF11A1003142]
Src Address          | Dest Address         | Group | Mode    | Cluster
D85DEF11A1003142/18  | 0022A300002921FD/1   |       | Address | 0006:ON_OFF
D85DEF11A1003142/10  | 0022A300002921FD/1   |       | Address | 0008:LEVEL_CONTROL
D85DEF11A1003142/10  | 0022A300002921FD/1   |       | Address | 0006:ON_OFF
D85DEF11A1003142/18  | 0022A300002921FD/1   |       | Address | 0008:LEVEL_CONTROL

Thanks.

So it's interesting. Presumably the table size is only 4 entries, so it's full. However there is duplication - in the past when I've looked at this, devices have consolidated entries - so if you set an entry that already exists, it doesn't get added twice. Here, entries are duplicated, and it doesn't have enough space to accept the new entry on endpoint 11.

This is something I might need to take a look at, but it won't be a 5 minute fix by any means. All converters are reasonably self contained, and there's currently no checking to see if the tables already contain entries, and a number of devices (well, older devices) don't allow the binding table to be read back which makes it problematic in any event.

So, I suggest that you manipulate the tables manually using the console. You know the bindtable command, and also there is bind and unbind. I think this is all you should have to do - remove the duplicate bindings, then set up the binding for endpoint 11 ON_OFF and LEVEL_CONTROL.

@cdjackson First of all, thanks a lot, manually setting up the binding table fixed the issue 🎉

However, I have a followup question if you don't mind:

Is it mandatory to bind both, the ON_OFF and LEVEL_CONTROL cluster of an endpoint, or could I just bind ON_OFF as well?

I'm asking because previously the endpoint _10 wasn't working and I'm now assuming that the underlying cause was also that the binding table was full. I'm guessing that previously endpoint _11 was bound first and I didn't pay much attention to endpoint _10 because it's more or less the same as endpoint _18.

Background:

The device consists of two physical components. A dimmer insert (endpoint _18) and a control element with two rockers: upper rocker is presumably endpoint _10 (which also operates the dimmer insert) and endpoint _11 is the lower rocker.

The insert isn't reporting state changes, so I've resorted to setting the minimum reporting interval to 10 seconds to get timely UI updates. If I could get endpoints _10 and _11 working, I could work around the issue with some OH rules and wouldn't have to change the minimum reporting interval.

Glad this is fixed :)

Is it mandatory to bind both, the ON_OFF and LEVEL_CONTROL cluster of an endpoint, or could I just bind ON_OFF as well?

I guess this depends on what commands the device is sending. The binding will process both on_off and level_control commands coming from the device, but maybe you don't need the on_off commands. You are clearly competent with using the console, so I would suggest to give it a go and see what happens - ie remove the on_off binding, and see if you still get enough commands being received to make the device useful to you. If so, you can then use the spare table entry on the other endpoint.

Please do report back though as it's always interesting to know how devices work and to have it written somewhere so when we all forget in a years time we can refer back :) .

Please do report back though as it's always interesting to know how devices work and to have it written somewhere so when we all forget in a years time we can refer back :)

I did try to bind endpoint 10, but it didn't work. Endpoint 11 was still working with just level_control bound however.

I can only guess, but I think endpoint 10 is inactive if the control element is attached to a relay or dimmer insert. Endpoint 10 is probably just active if the control element is attached to a power insert and purely used as remote control.

This is something I might need to take a look at

I'll leave it up to you to close this issue or leave it open as reference in case you want to look at consolidating entries in the binding table.

I think I will close this as the title really doesn't reflect the issue. I'll open another issue to record the binding table consolidation.