Decide on binning conventions for FLASC
paulf81 opened this issue · 6 comments
Description
I was hoping we could use this issue to make a decision on binning conventions in FLASC. Or maybe we have one already and I just need to stick to it! But just in case, I had a proposal that can be incorporated within PR #102 where we use the following conventions everywhere:
- Conventions apply to binning ws, wd, power, and time
- If you supply a minimum value, a maximum value, and a step size, the binning assumes you gave the leftmost and rightmost edges, and the labels will be the center of every bin (even in time)
- Or should time be an exception? If it is I prefer left-edge labels, although in FLASC I think we've used right-edge
- Every function that accepts a min, max and step should alternatively accept the bin_edges directly, but the bin_labels will still be created as before
- bins will not be an input because it can be more ambiguous than bin_edges
- overlapped bins accomplished by reducing the bin size to some fraction of the original request and using a rolling operation instead of a groupby. Amount of overlap specified using an input like percentage overlap
Interested to hear what you all think!
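To make the proposal concrete, here is a minimal sketch of the min/max/step convention with center labels, using NumPy and pandas. The helper names `edges_from_range` and `center_labels` are hypothetical and not part of FLASC, and the left-inclusive, right-exclusive bins (`right=False`) are an assumption, since the list above does not fix which side is closed.

```python
import numpy as np
import pandas as pd


def edges_from_range(v_min, v_max, v_step):
    """Treat v_min and v_max as the outermost bin edges (hypothetical helper)."""
    n_bins = int(round((v_max - v_min) / v_step))
    return v_min + v_step * np.arange(n_bins + 1)


def center_labels(bin_edges):
    """Label every bin by its center, whether edges came from a range or were passed in."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    return 0.5 * (bin_edges[:-1] + bin_edges[1:])


ws = pd.Series([0.2, 1.0, 1.9, 2.0, 3.7])

bin_edges = edges_from_range(0.0, 4.0, 1.0)  # 0, 1, 2, 3, 4
bin_labels = center_labels(bin_edges)        # 0.5, 1.5, 2.5, 3.5

# Assumption: bins are left-inclusive, right-exclusive
ws_bin = pd.cut(ws, bins=bin_edges, labels=bin_labels, right=False)
```

Passing `bin_edges` directly would skip `edges_from_range` and reuse `center_labels` unchanged, which is the "labels created as before" behavior described above.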
Thanks for starting this discussion! A lot of things have been implicit in the code and never properly formalized/thought out, so happy that we're doing it now.
Conventions apply to binning ws, wd, power, and time
Should we add TI as a variable to bin over?
If you supply a minimum value, a maximum value, and a step size, the binning assumes you gave the leftmost and rightmost edges, and the labels will be the center of every bin (even in time)
Agreed
Or should time be an exception? If it is I prefer left-edge labels, although in FLASC I think we've used right-edge
I don't have a preference here -- as long as we are consistent throughout.
Every function that accepts a min, max and step should alternatively accept the bin_edges directly, but the bin_labels will still be created as before. bins will not be an input because it can be more ambiguous than bin_edges.
Agreed
overlapped bins accomplished by reducing the bin size to some fraction of the original request and using a rolling operation instead of a groupby. Amount of overlap specified using an input like percentage overlap
I like this. So it'll default to 1.0 to use it normally and users can set it to 10, for example, to have a bin width of 30 deg for a wd bin step of 3 deg?
Thanks for these helpful responses @Bartdoekemeijer. @ejsimley, I thought I'd just quickly check whether you have any comments on conventions from OpenOA that we can match? Otherwise I think we can press on, maybe starting by opening an issue capturing the changes we intend to make.
Thank you Bart! These are helpful responses! A few more from me:
Conventions apply to binning ws, wd, power, and time
Should we add TI as a variable to bin over?
I think of TI (and other things like OL for stability, or wd_var) as ways you might bracket the data outside of the process, rather than controls within it.
If you supply a minimum value, a maximum value, and a step size, the binning assumes you gave the leftmost and rightmost edges, and the labels will be the center of every bin (even in time)
Agreed
Or should time be an exception? If it is I prefer left-edge labels, although in FLASC I think we've used right-edge
I don't have a preference here -- as long as we are consistent throughout.
I think I'm leaning here toward left edge labeling. I know it will change the convention in FLASC, and also won't have center everywhere, but left edge labeling is more comfortable for me. I think it also has the advantage that if you assume a bin is left-inclusive, right-exclusive, then the label is an included point.
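As an illustration of the time case, here is a small sketch using pandas `resample`, which exposes both choices through its `closed` and `label` arguments. The 5-minute sample data and the 10-minute averaging window are made up for the example and are not taken from FLASC.

```python
import pandas as pd

# Hypothetical 5-minute power samples
t = pd.date_range("2023-01-01 00:00", periods=6, freq="5min")
power = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=t)

# Left-inclusive, right-exclusive bins, labeled by their left edge:
# the label 00:00 is itself a timestamp inside the bin [00:00, 00:10)
avg_left = power.resample("10min", closed="left", label="left").mean()

# Same bins, labeled by their right edge:
# the label 00:10 is NOT inside the bin [00:00, 00:10)
avg_right = power.resample("10min", closed="left", label="right").mean()
```

Both results hold the same averages; only the index differs, so switching conventions changes how the bins are named rather than what they contain.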
Every function that accepts a min, max and step should alternatively accept the bin_edges directly, but the bin_labels will still be created as before. bins will not be an input because it can be more ambiguous than bin_edges.
Agreed
overlapped bins accomplished by reducing the bin size to some fraction of the original request and using a rolling operation instead of a groupby. Amount of overlap specified using an input like percentage overlap
I like this. So it'll default to 1.0 to use it normally and users can set it to 10, for example, to have a bin width of 30 deg for a wd bin step of 3 deg?
Right. That could be a nice way to do it, something like bin_step 1 deg, num_bins_per_group (defaults to 1) but could be 10?
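A rough sketch of how the rolling approach could look, assuming the `bin_step` and `num_bins_per_group` inputs floated above (these are proposed names, not an existing FLASC API). The function `overlapped_bin_mean` is hypothetical, it ignores wind-direction wrap-around at 0/360 deg, and it uses partial windows at the ends of the range.

```python
import numpy as np
import pandas as pd


def overlapped_bin_mean(x, y, x_min, x_max, bin_step, num_bins_per_group=1):
    """Mean of y in overlapping bins built from narrow bins of width bin_step."""
    n_bins = int(round((x_max - x_min) / bin_step))
    edges = x_min + bin_step * np.arange(n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Step 1: ordinary groupby on the narrow, non-overlapping bins
    df = pd.DataFrame({
        "bin": pd.cut(np.asarray(x, dtype=float), bins=edges,
                      labels=centers, right=False),
        "y": np.asarray(y, dtype=float),
    })
    stats = df.groupby("bin", observed=False)["y"].agg(["sum", "count"])
    stats = stats.reset_index(drop=True)

    # Step 2: rolling sum over neighboring narrow bins gives the wide bins
    rolled = stats.rolling(num_bins_per_group, center=True, min_periods=1).sum()

    result = rolled["sum"] / rolled["count"]
    result.index = pd.Index(centers, name="bin_center")
    return result


narrow = overlapped_bin_mean([0.5, 1.5, 2.5], [1.0, 2.0, 3.0], 0.0, 3.0, 1.0)
wide = overlapped_bin_mean([0.5, 1.5, 2.5], [1.0, 2.0, 3.0], 0.0, 3.0, 1.0,
                           num_bins_per_group=3)
```

Rolling over sums and counts, rather than over per-bin means, keeps the wide-bin average weighted by the number of samples in each narrow bin. An odd `num_bins_per_group` keeps each wide bin centered on its label.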
In the current version in #107 I'm using a convention of specifying the radius past the edge of each bin to include in the binning,
In the current version in #107 I'm using a convention of specifying the radius past the edge of each bin to include in the binning,
Sounds good to me!
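For reference, the radius convention could be read roughly as below. This is only an interpretation of the one-line description above, not the code in #107; the function `mask_with_radius` and its arguments are hypothetical, and the left-inclusive, right-exclusive bounds are an assumption.

```python
import numpy as np


def mask_with_radius(x, left_edge, right_edge, radius=0.0):
    """Select points in [left_edge - radius, right_edge + radius)."""
    x = np.asarray(x, dtype=float)
    return (x >= left_edge - radius) & (x < right_edge + radius)


wd = np.array([268.0, 270.5, 272.9, 274.0])

# radius = 0 reproduces the plain bin [270, 273)
plain = mask_with_radius(wd, 270.0, 273.0)

# radius = 1.5 widens the bin to [268.5, 274.5), overlapping its neighbors
widened = mask_with_radius(wd, 270.0, 273.0, radius=1.5)
```

With a radius of zero this reduces to ordinary non-overlapping binning, so a single input covers both cases.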