GridProtectionAlliance/openHistorian

Add aggregation functions to Grafana interface

ritchiecarroll opened this issue · 6 comments

This should be done GSF code base.

But adding AVG, MIN, MAX, STDEV over a window would be useful.

Some additional aggregations to consider:

  • Total (sum)
  • Range (max - min)
  • Count (number of recorded values, helpful for detecting dropouts)
  • Time based integration
  • Derivative (change since last value, or change since last time slice)*
  • Sample standard deviation & Population standard deviation (or at least clarify which is used by the implemented stddev function)
    • popstddev
      image

    • sample stddev
      image

*Derivative is not a true aggregation function, but a manipulation of the stream of data. It may not belong in the same category, but I wanted to include it as it can be useful in many cases when looking at PMU or statistical data.

@EEParker - naturally all of these functions would reduce a per-point series of values down to a single value. I think we could easily start supporting such methods in expressions that like that might look like:

Max(PPA:12; PPA:14), StdDev(STAT:15; STAT:35), StdDevSamp(PPA:20), Range(PPA:19)

or

Count(FILTER ActiveMeasurements WHERE SignalType='VPHA'), TimeIntegration(FILTER ActiveMeasurements WHERE SignalType='VPHA')

With the caveat that, like in a SQL expression, all values must either be aggregated or not - and - in the case of filter expressions, all returned series based on the expression would be subject to the specified outer aggregation operand.

Would this properly fulfill the desired need?

I worry a little that people might be a little confused by:

Max(PPA:12; PPA:14)

Since this means return a max of `PPA:12' and a max of 'PPA:14' - same with filter expressions, you will get one value per series. But I guess they'd figure it out rather quickly.

Of course, not to confuse last point, but something like this might be interesting:

Max(Count(FILTER ActiveMeasurements WHERE SignalType='VPHA'))

In this case you receive a Count for each series value in expression - then return Max over all values in set.

The difference here being there would be "data" in the actual point ID returned, i.e., from call to call the value source could change.

Also, exactly what did you have in mind on time-based integration? I assume you mean an integral over series values which are already restricted to a given time range - but in this case are you wanting a versatile expression for the function(valueAtTimeT)?

I agree with the expectation that you will get one result per series. If I ask for MAX(PPA:1, PPA:2, PPA:3) where Timestamp between start and end, I would expect 3 separate results.

Some additional requirements for things like MAX would be to have the ability to get the timestamp of when the max occured vs the timestamp of the time range you are looking at. For other aggregations such as StdDev I think a timestamp at end time of the range would make more sense.

For time based integration, I was thinking of a simple integration of (value * seconds) for each value in a time range. A specific examples would be to calculate the MWh value from a MW signal. In reality this might require a further level of expression to calculated the MW from phasors, but that is another topic.

image
Highly simplified example:

// simple time integration
var values = GetValuesBetweenTimestamp(start, end, interval); //example query
double integrated_value = 0;

for (int ix = 1; ix < values.Length; ix++) 
  integrated_value += values[ix].Value * (values[ix].Timestamp - values[ix-1].Timestamp).TotalSeconds;
}

Will be available in next nightly build per:
GridProtectionAlliance/gsf@e3e9219