ajcr/rolling

Handling of NaN


Short question: is it somehow possible to extend this to handle NaN, like numpy's nanmedian?

ajcr commented

Hi @kmuehlbauer, that's a good idea. I'll have to give some thought to how this could be implemented for each rolling iterator without affecting complexity.

For now, it should be straightforward to do this for some of the functions, just by using a generator with an appropriate fill-value. For example, Sum, filling NaN with 0:

>>> import math
>>> import rolling
>>> array = [1, 2, math.nan, 7, math.nan, 3, 2]
>>> array_fill_nan = (0 if math.isnan(x) else x for x in array) # generator, fills NaN values
>>> list(rolling.Sum(array_fill_nan, 3))
[3, 9, 7, 10, 5]

This approach doesn't work for Median, however, because the fill value required at each step is not necessarily constant. I'll see whether adding support for missing values is feasible here. FWIW I think pandas just considers the whole window to be NaN if it contains at least one NaN value.
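To make the difference concrete, here is a minimal sketch for a single window, using only the standard library rather than rolling or pandas. The variable names are illustrative, and the "propagating" case only mirrors the pandas behaviour as described above, not pandas itself:

```python
import math
from statistics import median

window = [2, math.nan, 7]

# Constant fill: replacing NaN with 0 pulls the median towards the fill value.
filled = median(0 if math.isnan(x) else x for x in window)  # median of [2, 0, 7]

# NaN-skipping (nanmedian-style): drop NaN values before taking the median.
skipped = median(x for x in window if not math.isnan(x))  # median of [2, 7]

# NaN-propagating: the whole window counts as NaN if any value is NaN.
propagated = math.nan if any(math.isnan(x) for x in window) else median(window)

print(filled, skipped, propagated)
```

The constant fill gives 2 where skipping the NaN gives 4.5, so no single fill value reproduces the NaN-skipping median in general.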

If your window size is small, rolling.Apply(array, window_size, operation=np.nanmedian) should still be quite fast.
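As a rough illustration of what that computes, here is a standard-library sketch. `rolling_apply` and `nan_median` are hypothetical stand-ins for rolling.Apply and np.nanmedian, written only to show the idea; they are not the library's implementation:

```python
import math
from collections import deque
from statistics import median


def nan_median(window):
    """Median of the non-NaN values in the window; NaN if none remain."""
    values = [x for x in window if not math.isnan(x)]
    return median(values) if values else math.nan


def rolling_apply(iterable, window_size, operation):
    """Hypothetical stand-in for rolling.Apply over fixed-size windows.

    Yields operation(window) once for each full window.
    """
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for item in iterable:
        window.append(item)
        if len(window) == window_size:
            yield operation(tuple(window))


array = [1, 2, math.nan, 7, math.nan, 3, 2]
result = list(rolling_apply(array, 3, nan_median))
print(result)  # [1.5, 4.5, 7, 5.0, 2.5]
```

Each step recomputes the median from scratch, so the cost per window grows with the window size, which is why this is only attractive for small windows.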

@ajcr Thanks for looking into this. I'll definitely try your suggestion using rolling.Apply(array, window_size, operation=np.nanmedian).