custom-components/weatheralerts

Timeouts and not pulling data

ttaidapos opened this issue · 8 comments

Version of the custom_component
0.1.1 with scan interval set to 120

Version of Hass
109.6

Describe the bug
Inconsistent alerts or not pulling data

Log

2020-05-14 15:48:05 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for weatheralerts which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-05-14 15:48:07 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 15:48:26 ERROR (MainThread) [homeassistant.helpers.entity] Update for sensor.weather_alerts fails
2020-05-14 15:50:18 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 15:50:18 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 15:52:19 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 15:54:20 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 15:56:21 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 15:58:18 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:00:23 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:00:23 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:02:14 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:03:20 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
2020-05-14 16:04:18 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:06:26 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:06:26 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:06:47 WARNING (MainThread) [homeassistant.helpers.entity] Update of weather.forecast is taking over 10 seconds
2020-05-14 16:08:27 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:10:28 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:12:29 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:14:30 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:16:31 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:18:32 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:20:23 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:22:34 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:22:34 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:24:35 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:26:34 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:28:30 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:29:27 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.2418367824] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds
2020-05-14 16:30:38 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:30:38 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:32:39 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:34:36 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:36:41 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:36:41 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:38:32 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 502
2020-05-14 16:40:30 ERROR (MainThread) [homeassistant.core] Error doing job: Fatal read error on socket transport
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 801, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2020-05-14 16:40:43 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:40:43 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:42:34 CRITICAL (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] weatheralert download failure - HTTP status code 500
2020-05-14 16:44:45 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:44:45 ERROR (MainThread) [custom_components.weatheralerts.sensor] [ILZ103,ILC031] Could not update the sensor (TimeoutError)
2020-05-14 16:46:46 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:48:47 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds
2020-05-14 16:50:48 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.northern_cook is taking over 10 seconds

Supporting information
image

I am also getting these errors in the logs, version 0.1.1 default scan interval on Home Assistant 109.6,

I am also getting these errors in the logs, version 0.1.1 default scan interval on Home Assistant 109.6,

Yup my main issue is I have notifications going out whenever the alert number changes, with the number going back to zero or not existent, I end up getting slammed with the same weather alert. Yesterday I got over 25 for 8 distinct alerts, blah. I looked through the packages to optimize but no code there accounts for this. Otherwise I can care less if my logs flud.

Thanks for the reports and info. I believe there are four different issues at play here so I'll address them separately.

  1. The timeout errors that are logged, such as 'Could not update the sensor (TimeoutError)', are confirming there is a problem with the weather.gov alerts feed server being slow to respond. There is a very slight chance the timeout errors could be a problem with your network or Internet connection, but it's almost always going to be a problem with the weather.gov alerts feed server not responding fast enough to the download request. To potentially address this issue, I will try to make the download timeout (currently hardcoded at 10 seconds) dynamic and based on the scan_interval. A longer scan_interval would then result in extra time to see if the alerts server eventually responds, but not such a long timeout that a previous request might overlap the next weatheralerts run. If a request times out, the previous successfully downloaded alerts are retained and will be updated during the next successful download.

  2. The download errors that are logged, such as 'weatheralert download failure - HTTP status code 502', are also confirming there is a problem with the weather.gov alerts feed server. Other than doing what weatheralerts is already doing, trying again after the scan_interval period, there is no workaround for this issue since these are weather.gov alerts feed server errors and are completely out of our control. As with the timeout errors, there is a very slight chance some download errors could be a problem with your network or Internet connection, but that doesn't appear to be the case here. The HTTP status code errors posted above indicate the weather.gov alerts feed server was returning Internal Server Error (status code 500) and Bad Gateway (status code 502) errors. As with the timeout errors, the previous successfully downloaded alerts are retained and will be updated during the next successful download.

  3. The multiple notifications issue appears to have multiple causes. I addressed one of the causes by sorting the alerts before storing them in the weatheralerts sensor. Since doing that I have also experienced one other bout of excessive notifications (over 20 notifications within 45 minute timespan) but didn't have enough logging to determine an exact cause. I will be adding some additional debug logging to capture some of the alerts properties so I can see what is happening to the alerts feed. I suspect the issue is either still related to the sorting of the alerts or the feed is returning inconsistent alerts (disappearing and reappearing alerts) that result in retriggering of notifications.

  4. Regarding the alert number changes, this also relates to issues 1 and 2 above. I had made a change to weatheralerts to bring it inline with how other integrations/components/platforms behave, by changing the main sensor state to unavailable if the alerts feed download fails due to timeout or server issues. If you are using the weatheralerts_*.yaml package files I posted (https://github.com/custom-components/weatheralerts/tree/master/packages), there is a template sensor in there that will contain the number of active alerts. If you aren't using the yaml package or if you don't want to use it, you can check out the first template sensor in weatheralerts_1.yaml package to see how I'm getting and retaining an alert count even when the main weatheralerts sensor changes to unavailable, if you want to implement your own alert count template sensor. Right now it doesn't take into account expired alerts that might still be retained if an alert expires while weatheralerts is unable to download new alerts from the alerts feed. I'm working on that issue too.

The problems with the weather.gov alerts feed server is really making things more difficult than it should be to have reliable alerts and notifications in Home Assistant. It works most of the time, but 'most of the time' isn't really good enough when it comes to severe weather alerts. I am considering making a significant change to weatheralerts that would have the weatheralerts platform download and cache the alerts feed locally. Doing this would allow weatheralerts to do some processing and buffering of the alerts feed data locally to ensure more consistency before populating the weatheralerts sensor with the data.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I will be pushing a new release in the next few days that simplifies the internal alerts sorting and adds some additional debug level logging to try to find the root cause of repeating notifications.

In the past two months I have not experienced any quickly repeating or rapid-fire notifications, so I don't yet have any solid clue to what the exact cause is. My current theory is the JSON alert feed is dropping an alert on one request, adding it back to the feed on a subsequent request, dropping the alert again, and repeating this behavior. It may also be an internal sorting issue with the alerts object. I simplified the alerts sorting, so that may have solved the issue if that was the cause.

I added some debug level logging to capture pre-sort and post-sort alert IDs and I will leave this logging in the code. After installing the next new version of weatheralerts, anybody with problems of constant repeated notifications for the same alerts should enable debug level logging for the weatheralerts component and post the log lines (from the home-assistant.log file) that contain the text weatheralerts from around the time of the observed failure.

I have also changed some of the current logging level of some of the 'errors' from error to info to better reflect the nature of the log messages. I will also add a Troubleshooting section to the documentation to help clarify what these log messages are and what they mean. Timeouts and download failures due to various HTTP status codes are not really errors with the weatheralerts component, they are mostly an indication that the weather.gov JSON alerts feed is being slow to respond, down for maintenance, or experiencing other technical problems that are out of our control. These 'errors' are generally very short-lived and resolve on their own. However, the weather.gov JSON alerts feed has been known to have lengthy outages of a few hours or more, from time to time.

So, yesterday I was finally able to capture an instance of repeated notification triggers where notifications were triggered on three consecutive weatheralerts download intervals. The cause, in this case, was due to a weather alert being added to the feed and then removed and then added again. The root of the problem is on the weather.gov side so I'm not sure I can fully fix this issue without causing vital notifications from being missed. I will investigate this particular cause for another day or two before pushing out another release.

I think I have fixed the repeated notifications issue caused by alerts that intermittently drop out of the alerts feed. The fix involves storing the alert IDs in an input_text entity when a notification automation is triggered and adding an automation condition that checks the first alert ID against the ID list stored in the input_text entity. I can see one potential cause for missed notifications though...if an alert is added to the weather.gov alert feed out of order. I'll work on that potential issue after I push out the next release. I just have to commit the YAML packages and finish the documentation update and it will be pushed out.

As I mentioned previously, the TimeoutError and download failure log messages are a result of weather.gov API outages and not true errors with the weatheralerts integration. I am looking into adding a backup source for alerts when the weather.gov JSON alerts API feed is experiencing an outage. I changed some log levels and messages to more clearly indicate why there was a problem. I also added a Troubleshooting section to the documentation with information regarding these common log messages.

The re-triggering issue should now be resolved (or at least greatly improved) if you are using the YAML packages from the packages directory of the weatheralerts repository. HACS doesn't install or update the YAML packages, so if you are using the YAML packages be sure to manually download and install them after updating the weatheralerts integration (and then restart Home Assistant). If you are using your own custom automations, check the YAML packages in this repository to see how I'm using input_text: entities with the notification automations to track the alerts that result in a notification trigger. If you are using your own custom automations and can't get it working without erroneous re-triggering, open a new Issue with the weatheralerts repository and include your complete automation settings/configuration and details of any relevant template sensors and entities you are using with the automation and I'll see if I can get it working.