pvlib/pvanalytics

Adding a cumulative energy check function

kperrynrel opened this issue · 5 comments

Many of the data streams I deal with on fleets are AC energy streams, and frequently they're cumulative and always increasing. I need to correct these streams by differencing them so they look like normal interval data. I wrote a simple little function that checks whether the data is always increasing (with a configurable threshold, by default increasing 95% of the time) and differences the data if so. Here it is:

import warnings

def cumulativeEnergyCheck(energy_series, pct_increase_threshold=95):
    """
    Check if an energy time series represents cumulative energy or not.

    Returns the series (differenced if judged cumulative) and a boolean flag.
    """
    # Default to False so the return value is defined even for empty input.
    cumulative_energy = False
    differenced_series = energy_series.diff().dropna()
    if len(differenced_series) == 0:
        warnings.warn("The energy time series has a length of zero and "
                      "cannot be run.")
    else:
        # If over X percent of the data is increasing (set via the
        # pct_increase_threshold), then assume that the column is cumulative.
        # The -.5 tolerance lets small downward ticks count as non-decreasing.
        differenced_series_positive_mask = (differenced_series >= -.5)
        # The mean of a boolean mask is the fraction of True values; unlike
        # value_counts()[True], it does not raise when no value is True.
        pct_over_zero = differenced_series_positive_mask.mean() * 100
        if pct_over_zero >= pct_increase_threshold:
            energy_series = energy_series.diff()
            cumulative_energy = True
    return energy_series, cumulative_energy

I'd like to adapt this and add it into PVAnalytics. @cwhanse and @kanderso-nrel what do you think?

I think inferring whether an energy series is cumulative is different enough from estimating the corresponding interval energy series that the two should be implemented in separate functions. I also think properly calculating the interval energy corresponding to a cumulative energy series can sometimes be complicated, or at least more complicated than just energy.diff() (happy to discuss more if you want).

I'm also a little unhappy about that magic -.5 value. Cumulative inverter time series typically have 0.0 at night, but system meter data may or may not include the effect of nighttime system self-consumption and can tick downwards at a unit- and timescale-dependent rate that may or may not be allowed by a hardcoded -.5. I suggest an optional parameter for that value rather than hardcoding it.
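To make the suggestion concrete, here is a minimal sketch of a detection-only function with the tolerance exposed as an optional parameter. The names `is_cumulative` and `min_step` are hypothetical, chosen for illustration only, and the default of -0.5 simply mirrors the value hardcoded in the original post; nothing here is an existing PVAnalytics API.

```python
import pandas as pd


def is_cumulative(energy_series, pct_increase_threshold=95.0, min_step=-0.5):
    """Return True if the series looks like cumulative energy.

    Hypothetical sketch: `min_step` is the smallest step between
    consecutive samples that still counts as non-decreasing.
    """
    steps = energy_series.diff().dropna()
    if steps.empty:
        # Nothing to compare, so make no claim that the data is cumulative.
        return False
    # Percentage of steps at or above the tolerance.
    pct_non_decreasing = (steps >= min_step).mean() * 100
    return bool(pct_non_decreasing >= pct_increase_threshold)


# Made-up sample data: a cumulative stream with one small downward tick,
# and an interval-style stream that rises and falls.
cumulative = pd.Series([0.0, 1.0, 2.5, 2.5, 2.4, 4.0])
interval = pd.Series([0.0, 1.0, 1.5, 0.0, 0.9, 1.6, 0.0, 0.2])
```

With the default tolerance, the small downward tick in `cumulative` is accepted; passing `min_step=0.0` makes the same series fail the check, which is the kind of unit- and timescale-dependent behavior a caller would need to tune.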

What module would this function be added to?

@kanderso-nrel I agree that we should turn this into two functions: one for determining whether a stream is cumulative, and one for correcting a stream that is cumulative when we don't want it to be. I'm totally open to suggestions other than the .diff() option, which was my quick-and-dirty implementation in fleets. I do like the idea of making that -.5 a parameter; it was empirically derived after I checked a bunch of data sets, but it shouldn't be set in stone.

I'm thinking this doesn't actually fit in an existing module right now, so we'd have to make one for it. Say energy.py in the quality folder?
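The correction half of that split could look something like the sketch below. It is only an illustration under stated assumptions: `cumulative_to_interval` and `min_step` are hypothetical names, and masking large negative steps as missing data is one possible way to handle counter resets, not an agreed approach. As noted earlier in the thread, a plain difference may not be adequate for every way cumulative energy is reported.

```python
import pandas as pd


def cumulative_to_interval(cumulative_energy, min_step=-0.5):
    """Estimate interval energy from a cumulative energy series.

    Hypothetical sketch: steps below `min_step` are assumed to be
    counter resets or bad data and are returned as NaN.
    """
    interval = cumulative_energy.diff()
    # Keep steps at or above the tolerance; everything else, including
    # the undefined first difference, becomes NaN.
    return interval.where(interval >= min_step)


# Made-up meter readings with a counter reset at the fourth sample.
meter = pd.Series([100.0, 101.0, 103.0, 0.5, 2.0])
interval_energy = cumulative_to_interval(meter)
```

Returning NaN at the reset keeps a large negative value out of downstream sums while leaving it visible that a sample is missing there.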

Here's a short example of appropriate treatment of two ways cumulative energy is often reported: https://gist.github.com/kanderso-nrel/672763bb0dc8c23432a2072ba011ce9f