Resubscribe real-time and cumulative usage after connection outage

Question

JacobWasFramed opened this issue 3 years ago · 4 comments

I'm currently running the beta with real-time energy usage. I notice that after an internet connection timeout, the real-time usage doesn’t resubscribe when the connection is restored. The cumulative usage only seems to fail if the outage overlaps a reading, that is, when an internet or Duke API outage lasts longer than the 15-minute interval. It seems like any outage requires a restart of HA to recover. I’m unsure whether this can be addressed here or should be addressed in the Duke Python repository.

I think a good solution would be to poll every 1-5 minutes after a connection is out until a successful response is received and at that point resubscribe or reinitialize the integration.
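To make the proposal concrete, here is a minimal sketch of the polling idea. The names `check_connection` and `resubscribe` are hypothetical placeholders, not functions from the integration or from pyduke-energy; the 60-second default is just the low end of the 1-5 minute range suggested above.

```python
import time
from typing import Callable


def wait_then_resubscribe(
    check_connection: Callable[[], bool],   # hypothetical: returns True once a request succeeds
    resubscribe: Callable[[], None],        # hypothetical: re-creates the real-time subscription
    poll_interval_s: float = 60.0,          # assumed default, within the suggested 1-5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the connection check succeeds, then resubscribe once.

    Returns the number of failed polls that happened before success.
    """
    failed_polls = 0
    while not check_connection():
        failed_polls += 1
        sleep(poll_interval_s)
    resubscribe()
    return failed_polls
```

Passing `sleep` as a parameter keeps the loop easy to exercise without actually waiting.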

Potentially related: the integration cannot be restarted without restarting HA.

Answer 1 · 2021-11-18T00:37:08.000Z

Thanks for bringing this up. When this happened, did you see the integration show a failure in the Integrations page? (it would show with a red outline and an error message).

Answer 2 · 2021-11-23T15:21:15.000Z

Honestly can’t recall. I “introduced” a network failure just now and the plug-in did not crash. The cumulative usage was able to resume, but real-time MQTT did not resubscribe.

The errors I saw originally are different from the ones produced by the self-introduced network failure.

The original errors I saw when posting this were as follows:

Error: (update_coordinator.py)
Error fetching homeassistant data: Error communicating with Duke Energy Usage API: Request failed with unexpected error [https://cust-api.duke-energy.com/gep/v2/auth/oauth2/token]: 400, message='Bad Request', url=URL('https://cust-api.duke-energy.com/gep/v2/auth/oauth2/token')

Warning: (realtime.py)
Error requesting smartmeter auth, will retry after 5 seconds.

Warning: (realtime.py)
Unexpected message: <paho.mqtt.client.MQTTMessage object at 0x7fe757d69c80>
Unexpected message: <paho.mqtt.client.MQTTMessage object at 0x7fe74f6274a0>

What I’ve just experienced from a self-introduced network failure is as follows:

Error: (update_coordinator.py)
Error fetching homeassistant data: Error communicating with Duke Energy Usage API: Request failed with unexpected error [https://app-core1.de-iot.io/rest/cloud/smartmeter/usageByHour]: 504, message='Gateway Time-out', url=URL('https://app-core1.de-iot.io/rest/cloud/smartmeter/usageByHour?startHourDt=2021-11-20T05:00&endHourDt=2021-11-21T05:00')
Error fetching homeassistant data: Error communicating with Duke Energy Usage API: Request failed with unexpected error [https://app-core1.de-iot.io/rest/cloud/smartmeter/usageByHour]: Cannot connect to host app-core1.de-iot.io:443 ssl:default [Network unreachable]

Error: (realtime.py)
MQTT disconnect error, result code: Out of memory. (This may not be accurate)

Either way, real-time MQTT doesn’t resubscribe, whether the cause is an API outage or an internet outage, and I have also had cases where the cumulative usage stopped updating until I restarted HA. I know Duke had a 1-2 week outage at the end of October/beginning of November, but this was after that time. Thanks for looking into this; I appreciate your time. If I see an outage again, I’ll make sure to update here with logs.

Answer 3 · 2021-11-23T15:31:30.000Z

I had a network outage over the weekend, so I experienced this myself. The MQTT code does have reconnect logic, but we have to make a separate API request to Duke to initiate the MQTT stream, and I think that API request is the one causing the issue, since it is not re-attempted after a failure. I'll look into it when I have some time.

Answer 4 · 2021-12-18T14:23:24.000Z

I believe this should be fixed by the updated pyduke-energy version in v0.1.0b9 (#53). In the event of a connection loss, the error will be caught and we will attempt to reconnect indefinitely with an exponential back-off strategy (starts at 1 minute, maxes out at 60 minutes). The times can be adjusted later if wanted.

A warning will be logged with the back-off time when this happens.
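For readers who want a feel for the schedule, here is a rough sketch of a capped exponential back-off that logs a warning with the delay. It is not the pyduke-energy implementation: the function names and the doubling factor are assumptions, and only the 1-minute start and 60-minute cap come from the description above.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def backoff_delay_s(
    attempt: int,
    initial_s: float = 60.0,   # starts at 1 minute (from the comment above)
    max_s: float = 3600.0,     # capped at 60 minutes (from the comment above)
    factor: float = 2.0,       # assumption: delay doubles on each attempt
) -> float:
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    return min(initial_s * factor ** attempt, max_s)


def log_backoff(attempt: int) -> float:
    """Log a warning with the back-off time and return that delay."""
    delay = backoff_delay_s(attempt)
    _LOGGER.warning("Connection lost, retrying in %d seconds", delay)
    return delay
```

With a doubling factor the delays run 60, 120, 240, 480, 960, 1920 seconds and then stay at 3600.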

I tested this by cutting the network to the Docker container, and it worked for that case. However, depending on where and when the internet drops, a different exception may be thrown. I believe I've caught all of them, but there is a chance I missed some. Any that I missed will be logged as an error and we will not retry. Please open an issue if you hit that case.
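The difference between caught and missed exceptions can be illustrated roughly as follows. This is a hedged sketch, not the library's code: `connect`, `run_with_retry`, and the `RETRYABLE` tuple are hypothetical, and the real set of exceptions that get retried may differ.

```python
import logging
import time
from typing import Callable, Iterable, Tuple, Type

_LOGGER = logging.getLogger(__name__)

# Assumption: which exception types count as "connection loss".
RETRYABLE: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def run_with_retry(
    connect: Callable[[], None],            # hypothetical: starts the real-time stream
    delays_s: Iterable[float],              # back-off delays; may be an endless iterator
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry `connect` on known connection errors; give up on anything else."""
    for delay in delays_s:
        try:
            connect()
            return True
        except RETRYABLE as err:
            # Known failure: warn with the back-off time, wait, then try again.
            _LOGGER.warning("Connection failed (%s), retrying in %s seconds", err, delay)
            sleep(delay)
        except Exception:
            # Unknown failure: log as an error and do not retry.
            _LOGGER.exception("Unexpected error, not retrying")
            return False
    return False  # delays exhausted (only possible with a finite iterable)
```

An exception that ends up in the second branch is the kind worth reporting in a new issue, along with the logged traceback.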