Random number of Mitsubishi aircons unable to connect with "Failed setup, will retry" error

Question



I have 8 Mitsubishi split-system aircons that all used to work perfectly with Home Assistant. Recently, several of them (normally 2 or 3 at a time) fail to finish connecting, and Home Assistant reports the "Failed setup, will retry" error. All of these units have good Wi-Fi signal strength and can still be controlled via the Melview app (Australian portal). Restarting Home Assistant sometimes fixes the issue, or at least lets a different random set connect, but usually 2 or 3 of the 8 still fail to connect.
I've tried using my Unifi router to reconnect those units to Wi-Fi, without any result. That doesn't seem to be the problem anyway, given that they all remain fully functional via the official Melview app/web app.

I've struggled to work out how to get any meaningful log info, so pointers would be appreciated.

Answer 1 · 2024-01-07T02:34:45.000Z

Assign static IP addresses in DHCP using the MAC address of each adapter.

Answer 2 · 2024-01-07T02:48:33.000Z

I forgot to mention that I'm already using static IPs, assigned by the DHCP server running in Home Assistant.

Answer 3 · 2024-01-07T03:10:35.000Z

Roll back to version 3.7.6 and let me know if it improves things.

Answer 4 · 2024-01-07T03:47:53.000Z

Yep, it looks like the rollback worked and all the ACs loaded on the first try. I'll keep monitoring and post back here if the problem persists. Thanks a bunch for your work and help!

Answer 5 · 2024-01-07T08:13:13.000Z

@julianjwong Thank you. Let's keep an eye on it for a week; let me know how it goes.

@nao-pon I think something is not quite right with the recent update that added support for IP address changes. I suspect some Mitsubishi devices of the same type report identical UIDs and are conflicting with each other. If vendors don't provide unique values, which is what I suspect is happening here, identifying devices by UID may not be the best approach.

Answer 6 · 2024-01-07T08:40:21.000Z

This may be due to the change from identifying devices by IP address to identifying them by UID. I will re-check the UID specification and look for a way to identify devices correctly. I'm currently on vacation, so I'll start working on it when I get home.

Answer 7 · 2024-01-07T08:49:16.000Z

I just deciphered my own UID:

    00 00 06 01 04 a0 c9 a0 ff fe 06 97 19 01 30 01

    Manufacturer   Unknown   EUI-64 address (MAC + padding)   EOJGC + EOJCC + instance?
    000006         0104      A0:C9:A0:FF:FE:06:97:19          01 30 01

Shouldn't these be unique, especially if they embed an EUI-64? That said, this only applies to my own Mitsubishi adapter. 🤔
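The field split above can be checked mechanically. The sketch below slices the 16 bytes using exactly the layout guessed in the table (3-byte manufacturer code, 2 unknown bytes, 8-byte EUI-64, 3-byte object code). That layout is the commenter's interpretation rather than something taken from the ECHONET Lite specification, and `decode_uid` is a hypothetical helper.

```python
def decode_uid(uid_hex: str) -> dict:
    """Split a 16-byte UID using the field layout guessed in this thread (an assumption)."""
    raw = bytes.fromhex(uid_hex)  # fromhex ignores the spaces between byte pairs
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    eui64 = raw[5:13]
    # An EUI-64 derived from a MAC has FF:FE inserted in the middle; removing it
    # recovers the MAC. No U/L bit flip is applied, since this UID shows none.
    mac = eui64[:3] + eui64[5:] if eui64[3:5] == b"\xff\xfe" else None
    return {
        "manufacturer": raw[0:3].hex(),
        "unknown": raw[3:5].hex(),
        "eui64": ":".join(f"{b:02x}" for b in eui64),
        "mac": ":".join(f"{b:02x}" for b in mac) if mac else None,
        "eoj": (raw[13], raw[14], raw[15]),  # class group, class, instance
    }


uid = decode_uid("00 00 06 01 04 a0 c9 a0 ff fe 06 97 19 01 30 01")
print(uid["manufacturer"], uid["mac"], uid["eoj"])  # 000006 a0:c9:a0:06:97:19 (1, 48, 1)
```

Comparing the recovered MAC with the adapter's Wi-Fi MAC in the router's client list is one quick way to test whether the UID really embeds it.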

Answer 8 · 2024-01-07T09:02:39.000Z

Yes. If I remember correctly, when the same uidi is seen at a different IP address, it is treated as an IP address change. So if multiple devices share the same uidi, only the last one will be configured.
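A toy model of the behaviour described here: if discovery keys devices by UID and treats "same UID, new IP" as an IP change, two units reporting the same UID overwrite each other. The function and the UID strings are hypothetical and are not the integration's code.

```python
def register_devices(discovered: list[tuple[str, str]]) -> dict[str, str]:
    """Map UID -> host for (host, uid) pairs, in discovery order (hypothetical model)."""
    by_uid: dict[str, str] = {}
    for host, uid in discovered:
        if uid in by_uid and by_uid[uid] != host:
            # Interpreted as "this device moved to a new IP address".
            print(f"UID {uid}: IP change {by_uid[uid]} -> {host}")
        by_uid[uid] = host  # the later host silently replaces the earlier one
    return by_uid


devices = [
    ("192.168.0.11", "uid-A"),
    ("192.168.0.12", "uid-B"),
    ("192.168.0.13", "uid-B"),  # hypothetical second unit reporting the same UID
]
print(register_devices(devices))
```

Three units go in, but only two entries come out, and the unit at .12 is lost without any error.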

Answer 9 · 2024-01-07T09:10:58.000Z

I was thinking about this problem incorrectly. @julianjwong is using static IP addresses, so the IP addresses will not be changing. Something other than UIDs must be causing this issue.

I looked through the 3.7.7 changelog for anything that could cause connectivity issues, and the only other relevant code may be this update to __init__.py (commits 8ee0028 and 56bf984).

        echonetlite = ECHONETConnector(instance, hass.data[DOMAIN]["api"], entry)
        try:
            await asyncio.wait_for(
                echonetlite.async_update(), timeout=60
            )  # 20 secs * retry 3 times = 60
            hass.data[DOMAIN][entry.entry_id].append(
                {"instance": instance, "echonetlite": echonetlite}
            )
        except (asyncio.TimeoutError, asyncio.CancelledError) as ex:
            raise ConfigEntryNotReady(
                f"Connection error while connecting to {host}: {ex}"
            ) from ex
        except KeyError as ex:
            raise ConfigEntryNotReady(
                f"IP address change was detected during setup of {host}"
            ) from ex

What makes this relevant is that ECHONETConnector is throttled by design, which means every ECHONET Lite device has to honor the throttle: the more devices there are, the longer they all have to wait. The throttle is set fairly low, at 1 second between updates, but with 8 devices in play, each waiting for responses to its polling, we may in fact be hitting the 60-second timeout configured above by the time the last 2 or 3 devices are being set up. That would match the fault we are seeing here.

    @Throttle(MIN_TIME_BETWEEN_UPDATES)
    async def async_update(self, **kwargs):
        return await self.async_update_data(kwargs=kwargs)
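To see whether the numbers in the hypothesis above can plausibly add up, here is a rough arithmetic model. It assumes, hypothetically, that all setups start together, that requests from all units pass through one shared gate (so units are effectively served one after another), and made-up figures for requests per setup and reply time. None of these values come from the logs.

```python
def failing_devices(num_devices: int, requests_per_device: int,
                    response_s: float, gap_s: float, timeout_s: float) -> list[int]:
    """1-based queue positions whose setup would finish after timeout_s.

    Hypothetical model: units are served strictly one after another, and every
    setup's timeout clock starts at t = 0.
    """
    per_device_s = requests_per_device * (response_s + gap_s)
    failed = []
    for position in range(1, num_devices + 1):
        finished_at = position * per_device_s  # every unit ahead of it goes first
        if finished_at > timeout_s:
            failed.append(position)
    return failed


# Made-up figures: 8 units, 6 requests each, 0.5 s reply + 1 s gap = 9 s per unit.
print(failing_devices(8, 6, 0.5, 1.0, 60.0))  # [7, 8]
```

With these invented figures, the last two units in line miss the 60-second window, which would fit the "2 or 3 of 8" pattern. If the setup order varies between restarts, which units land at the end would vary too. Real request counts and reply times would have to come from a debug log.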

Answer 10 · 2024-01-07T09:52:02.000Z

Let me know if I can help by testing, providing further log data, etc.

I have five splits of the same model, two of another, and one of a third, so three different models across my eight units.

Answer 11 · 2024-01-09T05:00:43.000Z

I rewrote the request timeout error handling. If timeouts are the cause of this issue, this change may fix the issue.

@julianjwong I've pushed the change to the master branch. Could you re-download master using HACS and see whether the problem is resolved?

If the problem persists, enable debugging on the ECHONETLite integration page, restart HA, and once HA has finished starting, disable debugging and download the log file. Please attach that log file.

Answer 12 · 2024-01-09T07:55:10.000Z

It looks like one of my split systems failed to load after I installed the master branch.
I've attached the log file.
error_log-15.txt

Answer 13 · 2024-01-09T10:18:08.000Z

OK, so I rolled back to 3.7.6 and it appeared to load all my split systems again. However, the one that had failed on master didn't look right and couldn't be controlled. I deleted it, thinking it was corrupted and that I could add it back in. Unfortunately, I can't seem to add it: both auto-detect and adding it directly via its IP address fail. I also tried rolling back to 3.7.5, without luck.

Answer 14 · 2024-01-10T00:38:01.000Z

@julianjwong Thank you for your cooperation in testing.

When I looked at the logs, I found that the device entry had been deleted automatically because of a timeout. Fixing this seems to require changes to the pychonet library. I'll try, but it will take some time.

Answer 15 · 2024-01-10T01:25:55.000Z

@scottyphillips I think pychonet's update() handles the timeout itself, but would it be better to raise TimeoutError or to return None?
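To make the two options concrete, here is a minimal sketch of both styles. Neither function is pychonet's actual code; `query_device` is a hypothetical stand-in for whatever sends the request and awaits the reply.

```python
import asyncio
from typing import Optional


async def query_device(delay_s: float) -> dict:
    await asyncio.sleep(delay_s)  # simulated device reply time
    return {"status": "on"}


async def update_raising(delay_s: float, timeout_s: float) -> dict:
    # Option A: let the timeout propagate; every caller must catch it.
    return await asyncio.wait_for(query_device(delay_s), timeout_s)


async def update_returning_none(delay_s: float, timeout_s: float) -> Optional[dict]:
    # Option B: swallow the timeout; every caller must check for None.
    try:
        return await asyncio.wait_for(query_device(delay_s), timeout_s)
    except asyncio.TimeoutError:
        return None


async def main() -> None:
    print(await update_returning_none(0.05, 0.01))  # None
    try:
        await update_raising(0.05, 0.01)
    except asyncio.TimeoutError:
        print("timed out")  # a setup caller could turn this into "not ready, retry later"


asyncio.run(main())
```

Raising keeps the failure explicit, which suits a setup path that already converts timeouts into ConfigEntryNotReady. Returning None is lighter for polling loops, but a None can easily be mistaken for "no data".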

Answer 16 · 2024-01-10T02:37:43.000Z

@scottyphillips I'm thinking of removing @Throttle(MIN_TIME_BETWEEN_UPDATES) from async_update() in __init__.py and instead limiting the request interval to 1 second per IP in echonetMessage() in echonetapiclient.py. What do you think?

    async def echonetMessage(self, host, deojgc, deojcc, deojci, esv, opc):
        no_res = esv == SETI  # SetI requests expect no response
        payload = None
        message_array = {
            "DEOJGC": deojgc,
            "DEOJCC": deojcc,
            "DEOJCI": deojci,
            "ESV": esv,
            "OPC": opc,
        }
        if self._state.get(host) is None:
            self._state[host] = {"instances": {}}

        # Consecutive requests to the device must wait for a response
        if self._waiting.get(host) is None:
            self._waiting[host] = 0
        if self._waiting[host] > 0:
            for x in range(0, self._message_timeout):
                # Wait up to 20 seconds (200 x 0.1 s) per the ECHONET Lite specification.
                await asyncio.sleep(0.1)
                if not self._waiting[host]:
                    # Together with the 0.1 s above, wait about 1 s as the ECHONET spec requires.
                    await asyncio.sleep(0.9)
                    break
            if self._waiting[host]:
                return False
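The per-IP interval proposed above can be sketched independently of pychonet. The class below is hypothetical: each host gets its own lock and last-request timestamp, so a slow unit only delays requests to itself, not to the other units. A real integration would call `wait_turn(host)` just before sending.

```python
import asyncio
import time
from collections import defaultdict


class PerHostRateLimiter:
    """Hypothetical helper: enforce a minimum gap between requests to the same host."""

    def __init__(self, min_interval_s: float = 1.0) -> None:
        self._min_interval_s = min_interval_s
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._last_sent: dict[str, float] = {}

    async def wait_turn(self, host: str) -> None:
        """Wait until min_interval_s has passed since the previous request to host."""
        async with self._locks[host]:  # serializes callers for this host only
            last = self._last_sent.get(host)
            if last is not None:
                remaining = self._min_interval_s - (time.monotonic() - last)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last_sent[host] = time.monotonic()


async def demo(min_interval_s: float = 0.2) -> dict[str, list[float]]:
    limiter = PerHostRateLimiter(min_interval_s)
    start = time.monotonic()
    sent_at: dict[str, list[float]] = defaultdict(list)

    async def send(host: str) -> None:
        await limiter.wait_turn(host)
        sent_at[host].append(time.monotonic() - start)  # the real send would go here

    # Two requests to .11 and one to .12: only the second .11 request is delayed.
    await asyncio.gather(send("192.168.0.11"), send("192.168.0.11"), send("192.168.0.12"))
    return dict(sent_at)


print({host: [round(t, 1) for t in times] for host, times in asyncio.run(demo()).items()})
```

The demo uses 0.2 s instead of 1 s only so it finishes quickly.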

Answer 17 · 2024-01-10T08:26:57.000Z

@nao-pon I think what you propose makes sense and is reasonable; it's worth a try.

Answer 18 · 2024-01-10T12:43:33.000Z

I tested removing @Throttle(MIN_TIME_BETWEEN_UPDATES), but that was a bad idea. Since the number of update requests grows with the number of instances and entities, congestion can quickly occur.
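A toy count illustrates the congestion point. If every entity of a unit triggers its own poll, requests scale with the number of entities (and again with instances), while a per-connector minimum interval collapses simultaneous refreshes into one poll. The class is hypothetical and only mimics the idea behind @Throttle(MIN_TIME_BETWEEN_UPDATES); it is not Home Assistant's implementation.

```python
from typing import Optional


class Connector:
    """Hypothetical stand-in for a per-unit connector shared by all its entities."""

    def __init__(self, min_interval_s: Optional[float]) -> None:
        self.min_interval_s = min_interval_s
        self.polls = 0
        self._last_poll: Optional[float] = None

    def update(self, now: float) -> None:
        throttled = (
            self.min_interval_s is not None
            and self._last_poll is not None
            and now - self._last_poll < self.min_interval_s
        )
        if throttled:
            return  # within the window: skip the network round trip
        self._last_poll = now
        self.polls += 1  # stands in for one request/response exchange with the unit


def polls_for_refresh(entities: int, min_interval_s: Optional[float]) -> int:
    """Count device polls when `entities` entities refresh at the same instant."""
    connector = Connector(min_interval_s)
    for _ in range(entities):
        connector.update(now=0.0)
    return connector.polls


print(polls_for_refresh(10, None), polls_for_refresh(10, 1.0))  # 10 1
```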

I've backed that change out, and instead I will create a PR that adds timeout handling and an available status.
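As a rough sketch of what "timeout handling and an available status" can mean, the hypothetical entity below marks itself unavailable when a poll times out, instead of failing setup or dropping the device entry. In Home Assistant the corresponding concept is an entity's `available` property; the class and method names here are illustrative only and are not the integration's code.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AirconEntity:
    """Hypothetical entity that goes unavailable on timeout instead of being removed."""

    def __init__(self, poll: Callable[[], Awaitable[dict]], timeout_s: float) -> None:
        self._poll = poll            # coroutine function that queries the unit
        self._timeout_s = timeout_s
        self.available = False
        self.state: Optional[dict] = None

    async def async_update(self) -> None:
        try:
            self.state = await asyncio.wait_for(self._poll(), self._timeout_s)
            self.available = True
        except asyncio.TimeoutError:
            # Keep the entity and its config entry; report it as unavailable
            # and let the next scheduled update try again.
            self.available = False


async def demo() -> tuple:
    async def slow_poll() -> dict:
        await asyncio.sleep(0.05)    # simulated unit that answers too late
        return {"power": "on"}

    async def fast_poll() -> dict:
        return {"power": "on"}

    slow = AirconEntity(slow_poll, timeout_s=0.01)
    fast = AirconEntity(fast_poll, timeout_s=0.01)
    await slow.async_update()
    await fast.async_update()
    return slow.available, fast.available


print(asyncio.run(demo()))  # (False, True)
```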

Answer 19 · 2024-01-10T13:06:53.000Z

I created PR...

Thanks!

Answer 20 · 2024-01-11T11:26:24.000Z

If you're willing, you can try out the modified features in the edge branch of my echonetlite repository:

https://github.com/nao-pon/echonetlite_homeassistant

Answer 21 · 2024-01-12T03:53:12.000Z

@nao-pon Happy to test, but I need help with how to download your code and install it over the top of my Home Assistant instance, given it's a different repo.

Answer 22 · 2024-01-12T07:41:31.000Z

First, please make a full backup so that you can restore HA if a problem occurs.

It's easy with HACS: add my repository to HACS as a custom repository of type Integration.

Next, install master of ECHONETLite Platform (nao-pon@edge) and restart HA.

Answer 23 · 2024-01-12T07:58:39.000Z

So it's OK to install it over the top of Scotty's version? Or do I need to uninstall his first?

Answer 24 · 2024-01-12T09:52:50.000Z

There is no need to uninstall the integration. Just overwrite it. The settings will continue as they are.

Answer 25 · 2024-01-12T10:20:26.000Z

@nao-pon Sorry, no luck with your version; the whole thing failed to load:

    Logger: homeassistant.setup
    Source: setup.py:251
    First occurred: 9:10:26 PM (1 occurrences)
    Last logged: 9:10:26 PM

    Setup failed for custom integration 'echonetlite': Unable to import component: cannot import name 'EchonetMaxOpcError' from 'pychonet.echonetapiclient' (/usr/local/lib/python3.11/site-packages/pychonet/echonetapiclient.py)
    Traceback (most recent call last):
      File "/usr/src/homeassistant/homeassistant/setup.py", line 251, in _async_setup_component
        component = integration.get_component()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/src/homeassistant/homeassistant/loader.py", line 822, in get_component
        ComponentProtocol, importlib.import_module(self.pkg_path)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "", line 1204, in _gcd_import
      File "", line 1176, in _find_and_load
      File "", line 1147, in _find_and_load_unlocked
      File "", line 690, in _load_unlocked
      File "", line 940, in exec_module
      File "", line 241, in _call_with_frames_removed
      File "/config/custom_components/echonetlite/__init__.py", line 5, in
        from pychonet.echonetapiclient import EchonetMaxOpcError
    ImportError: cannot import name 'EchonetMaxOpcError' from 'pychonet.echonetapiclient' (/usr/local/lib/python3.11/site-packages/pychonet/echonetapiclient.py)

Answer 26 · 2024-01-12T10:46:03.000Z

It appears that the edge version of pychonet was not installed. I listed it in manifest.json, but that doesn't seem to be working. I'll look into it, but it will take some time, so please go back to a working version for now.
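The traceback means the integration expected a pychonet build that defines EchonetMaxOpcError, while the installed pychonet did not. A quick way to confirm this kind of mismatch is to run a check with the same Python environment Home Assistant uses (how you reach that environment depends on your install). The helper names are hypothetical; `importlib.import_module` and `importlib.metadata.version` are standard library calls.

```python
import importlib
from importlib import metadata
from typing import Optional


def has_symbol(module_name: str, symbol: str) -> bool:
    """True if the module imports cleanly and defines `symbol` as an attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # includes ModuleNotFoundError
        return False
    return hasattr(module, symbol)


def installed_version(dist_name: str) -> Optional[str]:
    """Installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("pychonet version:", installed_version("pychonet"))
print("EchonetMaxOpcError present:",
      has_symbol("pychonet.echonetapiclient", "EchonetMaxOpcError"))
```

If the symbol is missing, the integration's requirements were not applied, which is what the reply above points to.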

Answer 27 · 2024-01-12T13:17:32.000Z

Done, I've fixed it. Please re-download the edge version and try it. Thanks!

Answer 28 · 2024-01-12T14:03:35.000Z

@nao-pon Seems to be working! But I'm still having issues re-adding the split system I deleted earlier. It fails to add even when I enter the IP address; the error is "Failed to connect". Any ideas?

Answer 29 · 2024-01-12T14:53:55.000Z

After enabling debugging, stop and start the unconnected unit so that it sends ECHONET Lite packets. After about 3 minutes, disable debugging and check the logs. If no packets from the IP in question are logged, there may be some kind of network failure.

In that case, please try restarting your router, access point, LAN hub, and air conditioner device.
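To look for those packets independently of Home Assistant's log, a small passive listener can count ECHONET Lite frames per source IP. It assumes the standard ECHONET Lite UDP port 3610 and multicast group 224.0.23.0, and a machine on the same subnet as the units. If Home Assistant already holds port 3610 on that machine, the bind may fail, so running it on another machine is simpler. The function names are hypothetical.

```python
import socket
import struct
import time
from collections import Counter

ECHONET_PORT = 3610
ECHONET_GROUP = "224.0.23.0"


def is_echonet_frame(data: bytes) -> bool:
    # Format-1 frames start with EHD1=0x10, EHD2=0x81; the fixed header
    # (EHD, TID, SEOJ, DEOJ, ESV, OPC) is 12 bytes long.
    return len(data) >= 12 and data[0] == 0x10 and data[1] == 0x81


def listen(seconds: float = 180.0) -> Counter:
    """Count ECHONET Lite frames per source IP for `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", ECHONET_PORT))
    # ip_mreq: multicast group address + local interface (0.0.0.0 = default)
    mreq = struct.pack("4s4s", socket.inet_aton(ECHONET_GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(1.0)
    counts: Counter = Counter()
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                data, (src_ip, _port) = sock.recvfrom(1500)
            except socket.timeout:
                continue
            if is_echonet_frame(data):
                counts[src_ip] += 1
    finally:
        sock.close()
    return counts


# Usage (needs the LAN, not run here): print(listen(180).get("192.168.0.13", 0))
```

A count of zero for the unit's IP while the other units show up would point at the network path rather than the integration.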

Answer 30 · 2024-01-12T21:31:56.000Z

It looks like ECHONET Lite traffic from my device at 192.168.0.13 is being logged, but I still couldn't add the device back via auto-detect or manual IP address entry. I've attached the debug log in case it helps.
error_log-17.txt

Answer 31 · 2024-01-13T03:29:36.000Z

Thank you for submitting your log! The logs show the following:

Multicast (announcement) data is being received from 192.168.0.13.
The device is not responding to inquiries from HA's ECHONET Lite integration.

It may not be responding because packets from HA are not arriving, or the device at 192.168.0.13 may be stuck.

Can you try restarting each network device, such as the router, access point, and LAN hub, as well as the device at 192.168.0.13?
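Since the log shows the unit's multicast announcements arriving but no replies to HA's queries, a direct unicast query can help separate "the unit is not answering" from "HA is not asking". The sketch sends one ECHONET Lite Get for operation status (EPC 0x80) to a single IP and waits for a reply. Assumptions worth checking: the unit exposes the home air conditioner object 01 30 01 (as in the UID posted earlier in the thread), it replies to UDP port 3610, and nothing else on the machine (such as Home Assistant) is bound to that port. `probe` and its helpers are hypothetical names.

```python
import socket
import struct
import time

ECHONET_PORT = 3610
CONTROLLER_EOJ = bytes([0x05, 0xFF, 0x01])  # SEOJ: controller object
AIRCON_EOJ = bytes([0x01, 0x30, 0x01])      # DEOJ: home air conditioner, instance 1
ESV_GET = 0x62                              # Get request (a Get_Res reply uses 0x72)
EPC_OPERATION_STATUS = 0x80                 # EDT 0x30 = on, 0x31 = off


def build_get(tid: int, epc: int, deoj: bytes = AIRCON_EOJ) -> bytes:
    """Build a format-1 Get frame asking for a single property."""
    # EHD1 EHD2 | TID | SEOJ | DEOJ | ESV | OPC=1 | EPC | PDC=0 (a Get carries no data)
    return (bytes([0x10, 0x81]) + struct.pack(">H", tid) + CONTROLLER_EOJ + deoj
            + bytes([ESV_GET, 0x01, epc, 0x00]))


def parse_first_property(frame: bytes) -> tuple[int, int, int, bytes]:
    """Return (tid, esv, epc, edt) for the first property of a format-1 frame."""
    if len(frame) < 14 or frame[:2] != b"\x10\x81":
        raise ValueError("not an ECHONET Lite format-1 frame")
    if frame[11] < 1:
        raise ValueError("frame carries no properties")
    tid = struct.unpack(">H", frame[2:4])[0]
    esv, epc, pdc = frame[10], frame[12], frame[13]
    return tid, esv, epc, frame[14:14 + pdc]


def probe(host: str, timeout_s: float = 5.0, tid: int = 1):
    """Send one Get to `host`; return the parsed reply, or None if nothing arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Assumption: the unit replies to port 3610 rather than to our source port.
        sock.bind(("", ECHONET_PORT))
        sock.sendto(build_get(tid, EPC_OPERATION_STATUS), (host, ECHONET_PORT))
        deadline = time.monotonic() + timeout_s
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                data, (src_ip, _port) = sock.recvfrom(1500)
            except socket.timeout:
                break
            if src_ip != host:
                continue  # traffic from another device
            try:
                reply = parse_first_property(data)
            except ValueError:
                continue
            if reply[0] == tid:
                return reply
    return None


# Usage (needs a real unit on the LAN, not run here): print(probe("192.168.0.13"))
```

A reply with ESV 0x72 (Get_Res) means the unit answered. Getting None while the unit's multicast announcements are still being seen would match the state described in the log analysis above.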

Answer 32 · 2024-01-13T10:25:25.000Z

Thanks, I got it added back in! I rebooted my Unifi router and access point and restarted Home Assistant, and I could add it on the first attempt using the IP address. I had previously tried triggering a Wi-Fi reconnect via the router console, but that didn't fix the problem. Glad it's working again. It was funny, because the native Mitsubishi app could still control the AC properly while HA couldn't connect to it.

Answer 33 · 2024-01-13T12:15:17.000Z

@julianjwong Congratulations! I'm happy too!

By the way, the reason it kept working with the Mitsubishi app is probably that the app communicates over TCP (for example MQTT) via the cloud, whereas ECHONET Lite uses UDP on the local network. I guess that difference in communication path explains the different results.

Answer 34 · 2024-01-15T12:39:13.000Z

Fixed via #158