Data Smoothing is probably not working as intended
Describe the bug
I am getting fast and erratic area changes, sometimes each second. To optimize this, I increased the "Smoothing Samples" value, which did not really change the behavior, which I found strange. I even tried values up to 60 and higher, which in my understanding should make the Area changes slow but more stable. This was not the case.
If I look at the "distance to bt_proxy" entities (not the unfiltered values), the distance sometimes jumps considerably in both directions. For example, in one second the distance of a device to a scanner is about 3 m, the next second it is 9 m, and then it jumps back to around 4 m. Looking at the logic of how the averaging is carried out, I can't understand how this behavior is possible. As I'm not a programmer, my interpretation may be flawed. But when I look at this code block, it should only be possible to spontaneously jump to a lower distance value and not to a higher value, as the logic of the iteration loop gives lower distance values "more weight":
dist_total: float = 0
dist_count: int = 0
# Start from the current raw reading (or "infinitely far" if there is none).
local_min: float = self.rssi_distance_raw or DISTANCE_INFINITE
for distance in self.hist_distance_by_interval:
    if distance <= local_min:
        # A closer reading is accepted immediately and becomes the new minimum.
        dist_total += distance
        local_min = distance
    else:
        # A more distant reading is clamped to the running minimum.
        dist_total += local_min
    dist_count += 1
if dist_count > 0:
    movavg = dist_total / dist_count
else:
    movavg = local_min
As the new raw distance values keep filling the list of historic data with higher distances, the average should go up slowly, right?
Would it be possible to check the logic and make it more robust so that erratic area changes are throttled?
Nevertheless, thank you for this awesome integration!
Hi, thanks for your report and digging into this!
TL/DR: It's working (mostly) as intended. You should use a helper template to apply smoothing to the Area sensor when you need to trade some responsiveness for extra stability, because the Area sensor (by design) snaps based on the smoothed "distance" values, every second.
I increased the "Smoothing Samples" value, which did not really change the behavior, which I found strange.
Yes, the smoothing buckets only smooth the distance measurement, and not the Area determination. That sounds a bit pedantic, but if two proxies alternately make sporadic, close readings, the smoothing won't prevent them from alternately winning the "Area". The area sensor's job is to answer the question "Which scanner do you think this device is closest to, right at this instant?".
I even tried values up to 60 and higher, which in my understanding should make the Area changes slow but more stable. This was not the case.
It should make the measurements more sluggish to rise, but won't directly affect how often the Area changes.
Taking screenshots of what you're observing via the history view is super-helpful, as it makes it easier to visualise and discuss what's going on. Enabling some of the "Unfiltered" sensors can make it easier still to understand. Here's a recent 5-minute slice from my watch tracking, just looking at data from the two proxies I have in my studio:
While these are both in the same "Area", we can look at the "Nearest Scanner" entity up top (mustard/peach) for what would have happened if the proxies were in different areas. We see the red line lose out to the blue at about 01:07:30, but it's really a "value judgement" at this stage. On a small time scale, you could argue that the teal line clearly shows prox-studio should not be winning this contest, and indeed the smoothing caused it to win for a few seconds when camfront was very much closer and more consistent. At a longer timeframe, from 01:07:00 to 01:08:40, you'd say that prox-studio was clearly the rightful winner. But expand the timeframe to the full 5 minutes, and "obviously" camfront is the only sensible choice.

All these assertions are value judgements, based on what's important for the individual user's use-case, on an automation-by-automation basis. I want lights to trigger instantly, because the cost of not doing so is me breaking my leg, and the cost of nuisance triggers is $0.00035 per hour of wasted electricity. But I want the air conditioner to start up only when it's sure I'm really there - the cost of a later start is negligible, the cost of false triggers is excessive power usage plus potentially a burnt-out compressor.
And note I was in the "same" place for those whole 5 minutes, but rssi is just noisy, especially when measured from a tiny transmitter strapped to a large sack of salty water that moves a little when typing :-)
So the only sensible choice is that Bermuda give you the most timely, most responsive data it can, based on what it can "prove". And then you can choose to discard information (by smoothing, delaying etc) on a per-application basis.
As I'm not a programmer, my interpretation may be flawed. But when I look at this code block, it should only be possible to spontaneously jump to a lower distance value and not to a higher value, as the logic of the iteration loop gives lower distance values "more weight":
That's spot-on: Instantly accept closer readings, reluctantly move toward distant readings. The idea is that noisiness in the rssi signal almost always (99.99%?) makes a distance longer, almost never shorter. So that filter looks back in time and averages the most optimistic readings, in a way. It's a bit ugly and weird, but sort-of-usually-works-ok-for-most-cases. My primary goal is for it to be quick to respond to "provable" changes, because folks can always add more smoothing if they want on a per-application basis.
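To make that concrete, here's the same idea pulled out into a tiny standalone snippet you can run (just an illustration with made-up readings and a placeholder sentinel value, not Bermuda's actual code path):

DISTANCE_INFINITE = 999.0  # placeholder for "no reading yet"

def smoothed_distance(raw, history):
    """Average the history, clamping any reading above the running minimum."""
    dist_total = 0.0
    dist_count = 0
    local_min = raw or DISTANCE_INFINITE
    for distance in history:
        if distance <= local_min:
            dist_total += distance
            local_min = distance
        else:
            dist_total += local_min
        dist_count += 1
    return dist_total / dist_count if dist_count else local_min

# Distant readings mixed in with close ones barely move the result:
print(smoothed_distance(9.0, [3.0, 9.0, 3.2, 8.5, 3.1]))  # 3.0
# A genuinely closer raw reading is accepted immediately:
print(smoothed_distance(2.0, [3.0, 9.0, 3.2, 8.5, 3.1]))  # 2.0
# Only once the close readings have aged out of the history does it follow the distant ones:
print(smoothed_distance(9.0, [8.8, 9.0, 8.5, 9.2, 8.9]))  # ~8.6

So distant readings only start to "win" once they dominate the whole history window.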
From my screenshot above, you can see that the blue and orange lines sort of "hug" the bottom of the yellow and teal lines, responding immediately to shorter readings, and slowly creeping up to meet the longer ones if they persist over time. I'm not anywhere near good enough at math to propose a proof, but eyeballing the graph, it looks "decent" to me for what I want to see out of distance measurements.

Actually, I have been thinking I want to decrease the number of buckets, as it hurts responsiveness a bit - but that's also related to having a good calibration (the only place this matters), because the velocity works over a range in "linear" space while rssi is in "logarithmic" space, so the scale of the conversion matters in that instance. For reference, my actual physical distance from camfront is about 0.5m and to prox-studio is about 2.8m, so those big distance changes would be a lot more subtle if I had calibrated my watch's ref_power more appropriately.
That said, the current Area algorithm is super basic and dumb. Every second, it looks to see which scanner has the closest distance measurement, and applies that. There is no built-in filtering on that, because filtering would discard information by adding latency, which is bad for some applications; for the others it can be added with template sensors - and no single amount of smoothing would suit everyone, every device, or every automation.
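In pseudocode terms, the selection is roughly this (a simplified sketch with illustrative attribute names, not the actual Bermuda source):

def nearest_area(scanners):
    """Return the area of whichever scanner currently reports the smallest smoothed distance."""
    best = min(
        (s for s in scanners if s.rssi_distance is not None),
        key=lambda s: s.rssi_distance,
        default=None,
    )
    return best.area_id if best else None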
The number of buckets sort of affects (inversely) the steepness of the rise. There is another parameter, max velocity, which also plays a hand here: it pre-filters the smoothing bucket values by completely ignoring readings that would imply the device had moved away at more than walking speed. This is (probably?) why you see the smoothed curve climbing faster toward higher readings, but not making steep changes due to super-distant readings (like those huge variations from 01:06:50 on teal, above).
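Roughly, that pre-filter does something like this (again just a sketch with made-up names and a made-up threshold, not the real implementation):

MAX_VELOCITY = 3.0  # metres per second, roughly a brisk walk (illustrative value)

def accept_reading(new_distance, last_distance, elapsed_seconds):
    """Ignore readings implying the device moved away faster than MAX_VELOCITY.

    Moving closer is always accepted, since noise essentially never makes a
    device look closer than it really is.
    """
    if new_distance <= last_distance:
        return True
    if elapsed_seconds <= 0:
        return True
    velocity = (new_distance - last_distance) / elapsed_seconds
    return velocity <= MAX_VELOCITY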
Anyway, all that to say... your analysis of how the smoothing works is bang on (which is impressive, because it still does my head in sometimes), but I don't think it's the cause of your issue, namely the rapid area changes.
There could be improvements made in Bermuda to offer a per-device area-smoothing setting, to control the trade-off between latency and stability, but I'd rather work on getting toward Trilateration, where we can compare the readings (and history) of all proxies to make decisions about where a device really is in relative space, rather than a simplistic binary room-by-room determination. I want to avoid spending time on things that can be done with the existing tools in HA (like template sensors) at the expense of progressing Bermuda's primary goals of trilateration, improving stability, base UX, etc. I hope that makes sense.
Hey thank you very much for the detailed reply! I understood it completely and the template sensor for smoothing the area detection is a good idea that should work sufficiently.
Of course, I am eagerly awaiting the 'holy grail' of trilateration. So it's the right priority to focus on that.
Hello agittins,
The problem with taking averages is that it includes the noise in the measurement.
In signal processing I often use a median filter to eliminate extremes (it's quite a standard filter for this kind of thing).
However, in your case you are always interested in the closest readings, as the more distant readings are likely to be noise/attenuation.
So have you tried applying a minimum filter to the distance readings?
This would be done by taking the minimum value of the last n readings (including readings where the device was out of range) - this will always favour the closer readings by discarding the most extreme distances as likely noise/attenuation, but it will slowly get larger if the device really is moving further away.
This seems like it would be a very responsive filter for this application, always favouring the closer readings as the real distance.
Does that make sense?
local_min: float = self.rssi_distance_raw or DISTANCE_INFINITE
# min() needs a default here, otherwise an empty history would raise a ValueError.
list_min: float = min(self.hist_distance_by_interval, default=DISTANCE_INFINITE)
if list_min < local_min:
    local_min = list_min
self.rssi_distance = local_min
Something like the above to replace lines 405 to 423 in bermuda_device_scanner.py (although I am not confident in Python so it might not be the exact syntax)
Wow, thanks for this! I was having some issues with switching areas; now I know it's not because of me not understanding the settings.
Did anyone make a good template sensor for this? Can you please share?
After agittins' tip, I implemented such a template sensor (quick & dirty). It uses trigger-based logic and only switches the Area if the original Area entity stays the same for X seconds. Additionally, I implemented logic which "remembers" the last Area if the value switches to "unknown". I did this because my phone sometimes goes to sleep when it is not moved, and the Bermuda Area thus becomes "unknown". In my opinion it's still valid to keep the last known Area in this case, because the phone was not moved. The Area is only switched to "not_home" if Home Assistant knows that I am away from home.
- trigger:
    - platform: state
      entity_id: sensor.martin_iphone_ble_area
      to:
      for:
        seconds: 3
  sensor:
    - name: "Martin iPhone BLE Area Filtered"
      unique_id: "sensor.martin_iphone_ble_area_filtered"
      state: >
        {% if not is_state('person.martin','home') %}
          {{ 'not_home' }}
        {% else %}
          {{ states('sensor.martin_iphone_ble_area') if not is_state('sensor.martin_iphone_ble_area','unknown') else states('sensor.martin_iphone_ble_area_filtered') }}
        {% endif %}
It has room for optimization, but even the three-second window improved the Area switching drastically. I also had good results with 5 seconds.
What could be optimized?
- Sometimes the value is constantly changing between two areas. In my implementation, whichever Area happens to keep its value for the specified amount of time "wins". A better approach could be to calculate some kind of "time average". Let's assume the entity constantly switches to "living room" for 2 seconds and then back to "kitchen" for one second. In this case it is more likely that I am in the living room, but if by chance "kitchen" happened to stay active for three seconds, the sensor would switch to that Area (not optimal). If it were possible to evaluate which Area was active for the longest amount of time within a specified time window (e.g. 10 seconds), it could make sense to assume that I'm most likely in that place - the idea is sketched below. But for this implementation, I'm lacking the skills. Also, as soon as trilateration is implemented, all of this will become obsolete.
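Roughly, that "longest time in a window" idea would look something like this (just an illustrative Python sketch of the majority-vote logic, with made-up names and window size - not something you can paste into Home Assistant):

from collections import Counter, deque

WINDOW = 10  # number of one-second samples to look back over (illustrative)
history = deque(maxlen=WINDOW)

def filtered_area(new_sample):
    """Feed in the raw Area once per second; return the area seen most often in the window."""
    history.append(new_sample)
    return Counter(history).most_common(1)[0][0]

Fed once per second, brief flips to another area get outvoted until that area actually dominates the window.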
Thanks a lot, I need to find some time to implement this as well. For now I switched back to the previous version, which worked quite well for me. There are quite a few automations depending on the correct area, so that was the quickest fix.