jazzyisj/unavailable-entities-sensor

Improvement: expand or split project to an "Unavailable / Unknown *DEVICE* Monitoring - Template Sensor"

bcutter opened this issue · 2 comments

Context (Reason/initial issue):
Some device entities only switch to “unavailable” after a very, very long time: 24 hours in this case, very likely a default set by the integration/firmware (here a ZigBee coordinator). That is quite unacceptable, because the unavailable entity monitoring only notifies me about a problem a whole day after it started. I tend to call it the “flat line issue”:
[image]

That’s clearly the fault of the device integration (which will likely NOT change its behaviour, see https://forum.phoscon.de/t/inacceptable-huge-default-sensor-unavailable-time-of-24-hours/1592), but we could improve the monitoring as follows:

How about a second sensor, either as an option of the existing one or, better, as a separate sensor, which works exactly like the existing one except that the set of defined “problem states” is a bit different:

  • instead of checking for states like unavailable or unknown…
  • we check whether entities changed/updated their values within a certain period of time (a real sensor value change, meaning a higher/lower value than before; monitoring the last update time might not be sufficient because it is also touched by a HA restart etc.).

Simple example: if my sensors XYZ did not update once within the last 3 hours, turn on the “name to be defined” sensor (I’d propose something like “entities to be checked”).

Needed:

  1. a variable for the acceptable time slot, e.g. 3 hours
  2. a detection algorithm: “did (not) update within [value] time”
  3. maybe a mapping such as “only alert if all entities belonging to one device did not provide any value update”. This way we could monitor whether a certain device has a problem without having to ‘monitor’ many entities. That could turn this into an “Unavailable / Unknown Device Monitoring - Template Sensor”, which doesn’t exist at all currently because devices can’t be monitored directly, right?!

That’s basically what this project does today, but based on a kind of “dynamic” problem detection, so it could be considered an additional or advanced approach. In the end it’s all about monitoring our system’s components: entities and devices.

Source of this improvement/idea:
https://community.home-assistant.io/t/unavailable-unknown-entity-monitoring-template-sensor/147618/215

If I understand you correctly, this template, which monitors one device (I used a Shelly 3EM here), would meet your needs. You can create as many of these template sensors as you want; just give each one a unique name.

I used a Shelly house monitor as an example because I don't have any zigbee stuff. Do zigbee devices actually constantly update their state? A lot of Z-Wave doesn't unless the state actually changes.

This template returns the entities whose last_updated value is older than now minus the ignore_seconds value (1 hour in the examples).

If your sensors have something unique in common that you can search on, you can do something like this. In this case the device entities I want to monitor are all sensors, so we can narrow it down to that domain, and they all have "_monitor_channel" in the entity_id (and nothing else in my config does), so we can search on that.

{% set ignore_seconds = 3600 %}
{% set ignore_ts = (now().timestamp() - ignore_seconds)|as_datetime %}
{{ states.sensor
    |selectattr('entity_id','search','_monitor_channel')
    |rejectattr('last_updated','ge',ignore_ts)
    |map(attribute='entity_id')|list }}

If not, you can use groups, which is probably a more efficient and robust method anyway, as you're only iterating the entities in the group and not the whole states object. There is also no chance that an undesired entity sneaks in by mistake (e.g. an unrelated entity that has '_monitor_channel' in its name).

{% if state_attr('group.device_monitor','entity_id') != none %}
  {% set ignore_seconds = 3600 %}
  {% set ignore_ts = (now().timestamp() - ignore_seconds)|as_datetime %}
  {{ expand('group.device_monitor')
      |rejectattr('last_updated','ge',ignore_ts)
      |map(attribute='entity_id')|list }}
{% endif %}
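To turn the group-based template into an actual sensor, it could be wrapped roughly like this. This is only a sketch under a few assumptions: the sensor name, unique_id and the entity_id attribute name are made up for illustration, and the cutoff is computed with now() - timedelta(...) instead of the as_datetime expression above, which should give the same point in time.

```yaml
template:
  - sensor:
      # Hypothetical name and unique_id; pick your own, one per monitored device.
      - name: "Device Monitor Stale Entities"
        unique_id: device_monitor_stale_entities
        # State: number of group members not updated within the last hour.
        state: >
          {% set ignore_seconds = 3600 %}
          {% set ignore_ts = now() - timedelta(seconds=ignore_seconds) %}
          {{ expand('group.device_monitor')
              |rejectattr('last_updated','ge',ignore_ts)
              |list|count }}
        attributes:
          # Attribute: the entity_ids of those stale group members.
          entity_id: >
            {% set ignore_seconds = 3600 %}
            {% set ignore_ts = now() - timedelta(seconds=ignore_seconds) %}
            {{ expand('group.device_monitor')
                |rejectattr('last_updated','ge',ignore_ts)
                |map(attribute='entity_id')|list }}
```

Because the templates use now(), Home Assistant should re-evaluate them about once a minute, so the sensor can flag a stale entity even though nothing in the state machine changed. If the group does not exist, expand() returns an empty list and the state is simply 0.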

This is the device_monitor group definition.

group:
  device_monitor:
    entities:
      - sensor.house_energy_monitor_channel_a_current
      - sensor.house_energy_monitor_channel_a_energy
      - sensor.house_energy_monitor_channel_a_energy_returned
      - sensor.house_energy_monitor_channel_a_power
      - sensor.house_energy_monitor_channel_a_power_factor
      - sensor.house_energy_monitor_channel_a_voltage
      - sensor.house_energy_monitor_channel_b_current
      - sensor.house_energy_monitor_channel_b_energy
      - sensor.house_energy_monitor_channel_b_energy_returned
      - sensor.house_energy_monitor_channel_b_power
      - sensor.house_energy_monitor_channel_b_power_factor
      - sensor.house_energy_monitor_channel_b_voltage
      - sensor.house_energy_monitor_channel_c_current
      - sensor.house_energy_monitor_channel_c_energy
      - sensor.house_energy_monitor_channel_c_energy_returned
      - sensor.house_energy_monitor_channel_c_power
      - sensor.house_energy_monitor_channel_c_power_factor
      - sensor.house_energy_monitor_channel_c_voltage 
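
For point 3 of the request (alert per device rather than per entity), the hand-maintained group could probably be replaced by the device's own entity list. A rough sketch, assuming a Home Assistant version that provides the device_id() and device_entities() template functions; the entity used to look up the device is simply taken from the group above, and the 3 hour window is the value from the original example. It checks last_changed rather than last_updated to get closer to "the value actually changed", but that timestamp is most likely also reset by a HA restart, so it does not fully solve that concern.

```jinja
{# Sketch: true when no entity of the device has changed within the window. #}
{% set ignore_seconds = 10800 %}
{% set ignore_ts = now() - timedelta(seconds=ignore_seconds) %}
{# Look up the device through any one of its entities. #}
{% set dev = device_id('sensor.house_energy_monitor_channel_a_power') %}
{# State objects of all device entities that currently have a state. #}
{% set entities = expand(device_entities(dev)) if dev is not none else [] %}
{# Entities whose state value changed inside the window. #}
{% set fresh = entities|selectattr('last_changed','ge',ignore_ts)|list %}
{{ entities|count > 0 and fresh|count == 0 }}
```

This returns true only if the device was found and none of its entities changed within the window, so it could serve as the state of a template binary_sensor, one per device.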

No response. Closed.