VKCOM/statshouse

Events count over unique value

vkaverin opened this issue · 1 comment

Currently, there is no way to answer the question "How many events were received with the same unique ID?" It would be nice to have the same functions for this (avg, min, max, percentiles) as are offered now for value events.

In pseudocode, it looks like this:

select
  avg(events) as avg,
  min(events) as min,
  max(events) as max,
  p25(events) as p25,
  ...
from (
  select unique_id, count(*) as events
  from my_metrics
  group by unique_id
)
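The intended semantics (first count events per unique ID, then aggregate over those per-ID counts) can be sketched in Python. This assumes direct access to the raw unique ID of every event, which is an illustration-only assumption; `per_id_count_stats` is a hypothetical helper, not a StatsHouse function.

```python
from collections import Counter
from statistics import mean, quantiles  # quantiles() needs Python 3.8+


def per_id_count_stats(unique_ids):
    """Count events per unique ID, then summarize the distribution of those counts.

    `unique_ids` is a hypothetical list holding one raw ID per event.
    """
    counts = list(Counter(unique_ids).values())  # number of events for each distinct ID
    # n=4 yields the three quartile cut points: p25, p50, p75
    p25, p50, p75 = quantiles(counts, n=4, method="inclusive")
    return {
        "avg": mean(counts),
        "min": min(counts),
        "max": max(counts),
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


# Six events from three users: u1 -> 3 events, u2 -> 1, u3 -> 2
stats = per_id_count_stats(["u1", "u1", "u1", "u2", "u3", "u3"])
# avg = 2, min = 1, max = 3, p25 = 1.5, p50 = 2.0, p75 = 2.5
```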

Use cases include:

  • When collecting user views, I want to see how many items a single user sees, on average and at various percentiles.
  • When collecting API error metrics, I want to see how many errors a single user receives.

Because we store only an aggregate used for cardinality estimation of unique values (think HyperLogLog), not the values themselves, it is impossible to get back the values or any information tied to them.
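To see why per-ID counts are lost, here is a minimal HyperLogLog sketch in Python. It is a generic textbook version (SHA-1 from `hashlib` as the hash, 2^10 registers) chosen for illustration; StatsHouse's actual estimator, hash, and parameters may differ. Each register keeps only the maximum "rank" it has seen, so adding the same ID again leaves the state unchanged: the sketch can estimate how many distinct IDs there were, but holds nothing about how often each one appeared.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog (illustrative; not StatsHouse's implementation)."""

    def __init__(self, p=10):
        self.p = p                 # number of hash bits used to pick a register
        self.m = 1 << p            # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")  # first 64 bits of the digest

    def add(self, value):
        x = self._hash64(value)
        width = 64 - self.p
        idx = x >> width                   # top p bits select the register
        rest = x & ((1 << width) - 1)      # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        # Only the maximum rank survives; the value itself is discarded.
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw


hll = HyperLogLog()
for i in range(1000):
    hll.add(f"user-{i}")
before = list(hll.registers)
for _ in range(500):
    hll.add("user-0")  # the same ID seen 500 more times
# hll.registers == before: repeated occurrences leave no trace in the sketch
```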

If you want to distinguish individual values, you have to use a normal tag for that (or a string-top _s tag if you are only interested in the top values). As always, beware of sampling if your data has very high cardinality.
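The sampling caveat can be illustrated with a generic sketch in Python: each event is kept with probability `keep_rate` and counted with weight `1 / keep_rate`. This textbook uniform-sampling scheme is assumed purely for illustration; it is not StatsHouse's actual sampling algorithm, and `sampled_counts` is a hypothetical helper.

```python
import random
from collections import Counter


def sampled_counts(tag_values, keep_rate, seed=0):
    """Estimate per-tag event counts from a uniformly sampled event stream.

    Generic row sampling with reweighting; not a model of StatsHouse internals.
    """
    rng = random.Random(seed)
    weight = 1.0 / keep_rate  # each kept event stands in for 1/keep_rate events
    estimate = Counter()
    for value in tag_values:
        if rng.random() < keep_rate:
            estimate[value] += weight
    return estimate


# One heavy ID with 1000 events plus 1000 rare IDs with one event each.
events = ["heavy"] * 1000 + [f"rare-{i}" for i in range(1000)]
est = sampled_counts(events, keep_rate=0.25)
```

With `keep_rate=0.25`, the heavy ID's estimate stays near 1000, but every rare ID is reported either as 4.0 or not at all, so per-ID statistics such as min or p25 computed over a sampled tag can be badly skewed.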