scottyphillips/echonetlite_homeassistant

Random number of Mitsubishi aircons unable to connect with Failed setup will retry error

Closed this issue · 34 comments

I have 8 Mitsubishi split system aircons which used to all work perfectly with Home Assistant. Recently, several at a time (normally 2 or 3) have been failing to complete connection, with Home Assistant reporting the "Failed setup, will retry" error message. These devices are all connected with good signal strength and can be controlled via the Melview app (Australian portal). Rebooting Home Assistant sometimes fixes the issue, or at least allows a different random set to connect, while usually 2 or 3 of the 8 still fail to connect.
I've tried using my Unifi router to reconnect those to wifi without any results. This doesn't seem to be the issue anyway given they all remain fully functional via the official Melview app/webapp.

I've struggled to work out how to get any meaningful log info so pointers appreciated.

Assign static IP addresses in DHCP using the MAC address of each adapter.

Forgot to mention I'm already using static IPs via my DHCP server running in Home Assistant.

Roll back to version 3.7.6 and let me know if it improves things.

Yep, looks like the rollback worked and all ACs loaded first time around. I'll continue to monitor and post back here if the problem persists. Thanks a bunch for your work and help!

@julianjwong thank you, let's keep an eye on it for a week, and let me know how it goes.

@nao-pon I think something is not quite right with the recent update to support IP address changes. I think some devices of the same type are conflicting because the UIDs on the Mitsubishis are identical. Identifying devices by UID may not be the best solution if vendors are not going to provide unique values, which is what I suspect is happening here...

It is possible that this is due to the change from previously identifying devices by IP address to UID. I will re-check the UID specifications and look for ways to correctly identify the device. I'm currently on vacation, so I'll start working on it when I get home.

I just deciphered my own UID.
00 00 06 01 04 a0 c9 a0 ff fe 06 97 19 01 30 01

    Manufacturer   Unknown   EUI-64 address (MAC+padding)   EOJGC+EOJCC+instance?
    00 00 06       01 04     a0:c9:a0:ff:fe:06:97:19        01 30 01

These should be unique, especially if they embed an EUI-64? However, this is only relevant to my own Mitsubishi adapter. 🤔
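For anyone wanting to decode their own UID the same way, here is a minimal Python sketch. The field boundaries are just the guess from the table above, not something taken from the ECHONET Lite specification, and `split_uid` is a made-up helper name. The MAC is recovered on the assumption that the EUI-64 is a plain `ff:fe` insertion with no bit flipping.

```python
def split_uid(uid_hex: str) -> dict:
    """Split a 16-byte UID into the fields guessed in the table above."""
    raw = bytes.fromhex(uid_hex)  # fromhex() skips the spaces between bytes
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    eui64 = raw[5:13]
    # Assumption: a MAC-derived EUI-64 has ff:fe inserted in the middle.
    mac = None
    if eui64[3:5] == b"\xff\xfe":
        mac = ":".join(f"{b:02x}" for b in eui64[:3] + eui64[5:])
    return {
        "manufacturer": raw[0:3].hex(),
        "unknown": raw[3:5].hex(),
        "eui64": ":".join(f"{b:02x}" for b in eui64),
        "mac": mac,
        "object": raw[13:16].hex(),  # EOJGC + EOJCC + instance (a guess)
    }


fields = split_uid("00 00 06 01 04 a0 c9 a0 ff fe 06 97 19 01 30 01")
print(fields)
```

If two adapters report the same `eui64` value, the UIDs really are colliding; if only the trailing bytes match, they are not.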

Yes, if I remember correctly, if the same UID is seen with a different IP address, it will be treated as a change of IP address, so if multiple devices have the same UID, only the last one will be configured.
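To illustrate that behaviour, here is a toy model in Python. It is not the integration's actual code; `UidRegistry` and its return values are invented for the example. It only shows why keying on a non-unique UID leaves a single device configured.

```python
class UidRegistry:
    """Toy registry that identifies devices by UID only (hypothetical)."""

    def __init__(self):
        self._host_by_uid = {}

    def register(self, uid: str, host: str) -> str:
        previous = self._host_by_uid.get(uid)
        self._host_by_uid[uid] = host  # the newest host always wins
        if previous is None:
            return "new"
        if previous != host:
            return "ip_changed"  # looks like the same device moved
        return "unchanged"

    def hosts(self) -> list:
        return sorted(self._host_by_uid.values())


reg = UidRegistry()
results = [
    reg.register("same-uid", ip)
    for ip in ("192.168.0.11", "192.168.0.12", "192.168.0.13")
]
print(results, reg.hosts())
```

Three physical units sharing one UID end up as one registry entry pointing at the last IP seen.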

I am thinking about this problem incorrectly. In this situation @julianjwong is using static IP addresses. The IP addresses will not be changing. Something else must be causing this issue, not UIDs.

I had a look at the changelog for 3.7.7 for a potential root cause of the connectivity issues, and the only other code that may be relevant is this update to __init__.py (commits 8ee0028 and 56bf984).

        echonetlite = ECHONETConnector(instance, hass.data[DOMAIN]["api"], entry)
        try:
            await asyncio.wait_for(
                echonetlite.async_update(), timeout=60
            )  # 20 secs * retry 3 times = 60
            hass.data[DOMAIN][entry.entry_id].append(
                {"instance": instance, "echonetlite": echonetlite}
            )
        except (asyncio.TimeoutError, asyncio.CancelledError) as ex:
            raise ConfigEntryNotReady(
                f"Connection error while connecting to {host}: {ex}"
            ) from ex
        except KeyError as ex:
            raise ConfigEntryNotReady(
                f"IP address change was detected during setup of {host}"
            ) from ex

What makes this relevant is that the ECHONETConnector is throttled by design, which means that all echonetlite devices are obligated to honor the throttle. The more devices there are, the longer they all have to wait. I know it is set fairly low at 1 second between updates, but with 8 devices in play and having to wait for response messages to polling, we may in fact be hitting the 60 second timeout configured above by the time we get to configuring those last 2-3 devices, which appears to be the fault condition we are seeing here.

    @Throttle(MIN_TIME_BETWEEN_UPDATES)
    async def async_update(self, **kwargs):
        return await self.async_update_data(kwargs=kwargs)
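As a back-of-the-envelope check of that theory, here is a small Python sketch. It assumes the device setups are effectively serialized and all started against the same 60 second clock, which is a simplification rather than a confirmed description of how Home Assistant schedules them. The 9 seconds per device is an invented figure for illustration only.

```python
def setups_exceeding_timeout(n_devices, seconds_per_device, timeout):
    """Return the 1-based positions of devices that finish after the timeout.

    Toy model: devices are polled one after another, so device k
    finishes at k * seconds_per_device.
    """
    late = []
    elapsed = 0.0
    for position in range(1, n_devices + 1):
        elapsed += seconds_per_device
        if elapsed > timeout:
            late.append(position)
    return late


# Hypothetical numbers: 8 devices, 9 s each, 60 s timeout.
late = setups_exceeding_timeout(8, 9.0, 60.0)
print(late)
```

With those made-up numbers the last two devices in the queue miss the deadline, which would match the symptom of a random 2 or 3 units failing depending on setup order.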

Let me know if I can help by testing or providing further log data, etc.

I have 5 splits the same model, 2 of another and 1 of another. So 3 different models across my 8 units

I rewrote the request timeout error handling. If timeouts are the cause of this issue, this change may fix the issue.

@julianjwong Since I changed the master branch, could you try re-downloading master using HACS and see if the problem is resolved?

If the problem persists, enable debugging on the ECHONETLite integration page, restart HA, and once HA has finished starting, disable debugging and download the log file. Please attach that log file.

Looks like one of my split systems failed to load after downloading from the master branch.
Attached the log file.
error_log-15.txt

Ok so I rolled back to 3.7.6 and it appeared to load all my split systems again. But the one that failed on the master wasn't looking right and couldn't be controlled. I deleted it thinking it was corrupted and I could add it back in. Unfortunately I can't seem to add it. Auto detect and adding directly via the IP address fails. Also tried rolling back to 3.7.5 without luck.

@julianjwong Thank you for your cooperation in testing.

When I looked at the logs, I found that the device entry was automatically deleted due to a timeout. This problem seems to require modification of the pychonet library. I'll try it, but it will take some time.

@scottyphillips I think pychonet's update() handles the timeout, but is it better to raise TimeoutError or return None?

@scottyphillips I'm thinking of removing @Throttle(MIN_TIME_BETWEEN_UPDATES) from async_update() in __init__.py and instead limiting the request interval to 1 second for each IP in echonetMessage() of echonetapiclient.py. What do you think?

    async def echonetMessage(self, host, deojgc, deojcc, deojci, esv, opc):
        no_res = True if esv is SETI else False
        payload = None
        message_array = {
            "DEOJGC": deojgc,
            "DEOJCC": deojcc,
            "DEOJCI": deojci,
            "ESV": esv,
            "OPC": opc,
        }
        if self._state.get(host) is None:
            self._state[host] = {"instances": {}}

        # Consecutive requests to the device must wait for a response
        if self._waiting.get(host) is None:
            self._waiting[host] = 0
        if self._waiting[host] > 0:
            for x in range(0, self._message_timeout):
                # Wait up to 20(0.1*200) seconds depending on the Echonet specifications.
                await asyncio.sleep(0.1)
                if not self._waiting[host]:
                    # Wait 1 sec for Echonet specifications.
                    await asyncio.sleep(0.9)
                    break
            if self._waiting[host]:
                return False
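The per-IP interval idea could be sketched roughly as follows. This is not pychonet code; `PerHostRateLimiter` and `wait_turn` are hypothetical names, and the clock and sleep functions are injectable only so the behaviour can be checked without real delays.

```python
import asyncio
import time


class PerHostRateLimiter:
    """Keep at least min_interval seconds between requests to one host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=asyncio.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._locks = {}      # one lock per host serializes its requests
        self._last_sent = {}  # host -> time the last request was released

    async def wait_turn(self, host):
        lock = self._locks.setdefault(host, asyncio.Lock())
        async with lock:
            last = self._last_sent.get(host)
            if last is not None:
                remaining = self._min_interval - (self._clock() - last)
                if remaining > 0:
                    await self._sleep(remaining)
            self._last_sent[host] = self._clock()
```

A caller would `await limiter.wait_turn(host)` before sending each request. Requests to different hosts do not delay each other, which is the difference from a single global throttle.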

@nao-pon I think what you propose makes sense and is reasonable. It's worth a try.

I tested removing @Throttle(MIN_TIME_BETWEEN_UPDATES), but that was a bad idea. Since update requests are made in proportion to the number of instances and entities, congestion can quickly occur.

I dropped that approach, but I will create a PR that adds timeout handling and an available status.
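A rough Python sketch of what "timeout handling plus an available status" can mean in general: a timed-out refresh marks the device unavailable instead of failing setup or removing it. `DeviceState` and `refresh` are illustrative names, not the code from the PR.

```python
import asyncio


class DeviceState:
    """Last known data plus an availability flag (hypothetical)."""

    def __init__(self):
        self.available = True
        self.data = None


async def refresh(state, fetch, timeout):
    """Run fetch() with a timeout; on timeout keep old data, flag unavailable."""
    try:
        state.data = await asyncio.wait_for(fetch(), timeout=timeout)
        state.available = True
    except asyncio.TimeoutError:
        state.available = False
    return state.available
```

The device entry survives a slow response this way, and the next successful refresh flips it back to available.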

I created PR...

Thanks!

If you are willing to try it out, you can try out the modified features in my echonetlite edge branch.

@nao-pon happy to test, but I need help on how to download your code and install it over the top of my Home Assistant instance, given it's a different repo.

First, please save a full backup on hand so that you can restore HA in case a problem occurs.

It's easy with HACS. Register my repository as an integration in a custom repository in HACS.

Next, install master of ECHONETLite Platform (nao-pon@edge) and restart HA.

So it's ok if I install over the top of Scotty's one? Or do I need to uninstall his first?

There is no need to uninstall the integration. Just overwrite it. The settings will continue as they are.

@nao-pon sorry, no luck with your version. The whole thing failed to load:

    Logger: homeassistant.setup
    Source: setup.py:251
    First occurred: 9:10:26 PM (1 occurrences)
    Last logged: 9:10:26 PM

    Setup failed for custom integration 'echonetlite': Unable to import component: cannot import name 'EchonetMaxOpcError' from 'pychonet.echonetapiclient' (/usr/local/lib/python3.11/site-packages/pychonet/echonetapiclient.py)
    Traceback (most recent call last):
      File "/usr/src/homeassistant/homeassistant/setup.py", line 251, in _async_setup_component
        component = integration.get_component()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/src/homeassistant/homeassistant/loader.py", line 822, in get_component
        ComponentProtocol, importlib.import_module(self.pkg_path)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
      File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 940, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/config/custom_components/echonetlite/__init__.py", line 5, in <module>
        from pychonet.echonetapiclient import EchonetMaxOpcError
    ImportError: cannot import name 'EchonetMaxOpcError' from 'pychonet.echonetapiclient' (/usr/local/lib/python3.11/site-packages/pychonet/echonetapiclient.py)

It appears that the edge version of pychonet was not installed. I specified it in manifest.json, but that doesn't seem to be working. I will look into it, though it will take some time, so please return to a working version in the meantime.

Done, I fixed it. Please re-download the edge version and try it. Thanks!

@nao-pon seems to be working! But I'm still having issues re-adding a split system that I deleted previously. It fails to add despite putting in the IP address. Any ideas? The error is "Failed to connect".

After enabling debugging, stop and start the unconnected device in order to send Echonet-lite packets. After about 3 minutes, stop debugging and check the logs. If no packets from the IP in question are logged, there may be some kind of network failure.

In that case, please try restarting your router, access point, LAN hub, and air conditioner device.

Looks like there are references to ECHONET Lite and my device 192.168.0.13 being logged, but I still couldn't add the device back in via Auto Detect or manual IP address entry. I've attached the debug log if that helps.
error_log-17.txt

Thank you for submitting your log! The logs show the following:

  • Multicast (public announcement) data is being received from 192.168.0.13
  • Not responding to inquiries from HA echonet-lite

The reason for not responding to inquiries may be that packets from HA have not arrived, or the device on 192.168.0.13 may be jammed.

Can you try restarting each network device such as router, access point, LAN hub, and the device on 192.168.0.13?
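If it helps to check the "not responding to inquiries" part outside of HA, a standalone probe could look something like the sketch below. It builds a standard ECHONET Lite Get request for the node profile's instance list (EPC 0xD6) and waits for a unicast reply. As far as I understand, devices reply to UDP port 3610 rather than to the sender's source port, so the sketch binds 3610. That is an assumption, and it means the probe should be run from a machine that is not already running HA or another ECHONET Lite client. `probe` is a made-up helper, not part of pychonet.

```python
import socket
import time

ECHONET_PORT = 3610  # UDP port used by ECHONET Lite


def build_instance_list_request(tid=1):
    """Build a Get request for EPC 0xD6 on the node profile object."""
    return bytes([
        0x10, 0x81,                     # EHD1, EHD2: ECHONET Lite, format 1
        (tid >> 8) & 0xFF, tid & 0xFF,  # TID: transaction ID
        0x05, 0xFF, 0x01,               # SEOJ: controller object
        0x0E, 0xF0, 0x01,               # DEOJ: node profile object
        0x62,                           # ESV: Get
        0x01,                           # OPC: one property
        0xD6, 0x00,                     # EPC 0xD6, PDC 0 (no data)
    ])


def probe(host, timeout=3.0):
    """Send the request to host; return the reply bytes, or None on timeout."""
    deadline = time.monotonic() + timeout
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", ECHONET_PORT))
        sock.sendto(build_instance_list_request(), (host, ECHONET_PORT))
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                data, (sender, _port) = sock.recvfrom(1024)
            except socket.timeout:
                return None
            if sender == host:  # ignore packets from other devices
                return data
```

Calling `probe("192.168.0.13")` and getting `None` back, while multicast announcements from the same IP are still visible, would point to the same conclusion as the log.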

Thanks, got it added back in again! I tried rebooting my Unifi router and access point as well as restarting Home Assistant, and could then add it on the first attempt using the IP address. I had previously tried triggering a wifi reconnect via the router console, but that didn't fix the problem. Glad it's working again. It was funny because the native Mitsubishi app was still able to properly control the AC while HA wasn't connecting to it.

@julianjwong Congratulations! I am also happy!

By the way, the reason it worked with the Mitsubishi app is probably that the app uses TCP communication (such as MQTT) via the cloud, whereas ECHONET Lite uses UDP communication on the local network. I guess the difference in communication path affects the results.

Fixed via #158