Devices unavailable after HA restart
Steps to reproduce:
- Install component
- Restart HA
- Configure all discovered Hue Sync devices (in this case, two)
- Restart HA
All Hue Sync devices are unavailable after restarting and HA logs that this component isn't generating unique IDs:
2020-11-01 18:08:54 ERROR (MainThread) [homeassistant.components.media_player] Platform huesyncbox does not generate unique IDs. ID DEADBEEF0000 already exists - ignoring media_player.hue_sync_1
2020-11-01 18:08:54 ERROR (MainThread) [homeassistant.components.media_player] Platform huesyncbox does not generate unique IDs. ID DEADBEEF0001 already exists - ignoring media_player.hue_sync_2
2020-11-01 18:16:07 WARNING (MainThread) [homeassistant.helpers.service] Unable to find referenced entities media_player.hue_sync_2
2020-11-01 18:16:56 WARNING (MainThread) [homeassistant.helpers.service] Unable to find referenced entities media_player.hue_sync_1
Hmm, that is strange. Not sure what is going on there.
Just to make sure: are the DEADBEEF000x entries replaced syncbox IDs, or is that actually what shows up in the logs?
What version of HA are you running?
You are correct – they are replaced IDs.
I'm seeing the problem on HA 0.117.2. I haven't used your component on any other version.
And, for clarification, if I delete the integration, restart HA (to force discovery), and reconfigure both devices, they will work until I restart HA again.
I had a look into this and I have no idea yet what could be causing it.
I have been unable to reproduce this, so I guess it might be related to having 2 boxes (I have only 1).
I added some additional logging that might give me some more insights in v1.8.
Could you enable debug logging for the integration and reproduce the issue? You can enable debug logging by including the following section in configuration.yaml (and removing it afterwards):
logger:
  default: info
  logs:
    custom_components.huesyncbox: debug
Note that the logging will contain the IDs and tokens for the box, so please redact those or, if you trust me enough, send them directly to
Having previously upgraded HA to 0.117.4, and now to 1.8 of this component, I can no longer reproduce the problem. I've restarted several times and both devices continue to work.
HA did still complain about the component not generating unique IDs at least the first time I restarted with the new component. At least currently, .storage/core.entity_registry has a unique_id for both entities that matches the value replaced by "DEADBEEF000x" above (I hadn't checked this before).
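For reference, a quick way to check this from outside HA is a small script run against the registry file (the /config path here assumes a default installation, and the exact file layout may differ between HA versions):

# List huesyncbox entities and their unique_ids from the entity registry.
# Run this against a copy of the file rather than while HA is writing to it.
import json

with open("/config/.storage/core.entity_registry") as f:
    registry = json.load(f)

for entity in registry["data"]["entities"]:
    if entity.get("platform") == "huesyncbox":
        print(entity["entity_id"], entity.get("unique_id"))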
I'll do some more testing this weekend to see if I can get it to break and follow up afterwards.
Ok, thanks for the update. On one hand it's nice that it seems solved for you, but on the other hand it feels like there are still some weird issues somewhere.
I was able to reproduce the problem on 1.7 and 1.9. (I haven't tried 1.8 again.)
I downgraded to 1.7, removed the integrations, restarted, reconfigured the devices, restarted, and the problem returned. I then upgraded to 1.9, removed the integrations, restarted, reconfigured the devices, restarted, and once again had the problem.
A third attempt to remove/restart/reconfigure/restart didn't reproduce the problem, suggesting it is intermittent - maybe related to timing. The two times it failed I configured the devices immediately upon getting the notification they were discovered. When it didn't fail, I waited for a minute or two until I configured the devices.
(I haven't repeated each case to determine more definitively whether timing is related. I have a huge z-wave network that takes about 10-15 minutes to start up and, if I don't wait until it does, HA trashes the z-wave config, meaning each iteration takes 20-30 minutes.)
Every time I've checked, .storage/core.entity_registry had a unique_id for both entities.
Logs will be on their way by email shortly. No obvious differences, beyond order of events.
Thanks for collecting the logging.
It seems that each entry is set up multiple times (each box twice). This causes multiple media_player entities to be generated with the same unique_id, which causes those warnings.
I don't know why the entries are being set up twice. I don't see this happening on my setups, where setup happens only once per device, even when leaving HA running for a long time. I am also not sure if this is supposed to be happening at all. From the developer docs and the code of other official HA integrations I get the impression that async_setup_entry is only supposed to be called once per entry per HA session (there seems to be no protection against multiple calls).
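For illustration, a minimal sketch of what a media_player platform setup typically looks like (hypothetical names and data layout, not the actual huesyncbox code): if HA calls async_setup_entry twice for the same config entry, a second entity with the same unique_id gets added and the entity registry rejects it with exactly the "already exists" error from the logs above.

# Hypothetical sketch of a media_player platform setup, for illustration only;
# the class and hass.data layout are assumptions, not the actual integration code.
from homeassistant.components.media_player import MediaPlayerEntity

DOMAIN = "huesyncbox"


class HueSyncBoxMediaPlayer(MediaPlayerEntity):
    def __init__(self, api):
        self._api = api

    @property
    def unique_id(self):
        # Stable id derived from the box itself, so it survives restarts.
        return self._api.device_id

    @property
    def name(self):
        return self._api.name


async def async_setup_entry(hass, config_entry, async_add_entities):
    # If this ends up being called twice for the same entry, a second
    # entity with the same unique_id is added and the entity registry
    # logs "ID ... already exists - ignoring media_player...".
    api = hass.data[DOMAIN][config_entry.entry_id]
    async_add_entities([HueSyncBoxMediaPlayer(api)])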
So now I know what is causing the issue, I just don't have a solution yet.
I first need to get a better understanding of why it is happening and then figure out what would be the proper way to solve it.
Over the past few weeks I've restarted HA multiple times for config changes and not once experienced this problem. This leads me to believe it's likely some sort of race condition related to the initial discovery/configuration of the device relative to HA startup events. Maybe the second setup occurs before HA saves the newly configured device, corrupting some data before it gets written to disk (or preventing the write entirely). Or perhaps I've stumbled on a timing bug in HA itself that keeps devices configured too early in HA init from being stored properly. (But I've never seen this with anything else.) Either way, it appears that once the devices survive the first HA restart, the problem never occurs again.
Thanks for the update. Good to hear that it keeps working once it works.
It is still on my radar to have a closer look at when I have some more time.
I have rebuilt the integration from scratch for v2.0.0, cleaning up some old issues along the way.
Please open a new issue if this is still relevant on v2.0.0.