google/temporian

New operators: min, max

ianspektor opened this issue · 7 comments

New EventSet.max() and EventSet.min() operators.

They return a single-event EventSet with each feature's max value, associated to the last event's timestamp in the original EventSet.

To be implemented as aliases to evset.moving_max(np.inf).end() and evset.moving_min(np.inf).end().

See https://github.com/google/temporian/blob/main/CONTRIBUTING.md#developing-a-new-operator for guidance.

Questions or requests for additional guidance from possible contributors more than welcome!

Taking a look at this

Hey @SchadtJ! All yours :)

Note that not all steps mentioned in the "Developing a new operator" guide are necessary for these since they can be implemented as composition of existing operators (see description). I.e., you won't need to code actual Operator nor OperatorImplementation classes. See how set_index is defined in temporian/core/operators/add_index.py for an example of an operator defined this way.

Feel free to join our Discord for additional/faster help and answers: https://discord.gg/nT54yATCTy

Cheers!

@ianspektor Thank you!

End() does not seem to include the features for the last event, only the timestamp. Should I make a change so that the features are included?

The min/max of each feature are required for these new operators.

you're right :)

@achoum @javiber thoughts on making end() (and begin()) return feature values too instead of the timestamps only? don't think so but could this create future leakage?

Hey 😃,

Do we have any more info on the question above?

hi @SchadtJ Sorry for the late response, this issue flew under my radar after the handoff from Ian.

We discussed this internally and we think it's a better solution to calculate the min and max using the sampling parameter like this:

evset.moving_min(np.inf, sampling=evset.end())

With that said, we are unsure about the names of these operators, we feel that users might expect min/max to return just the python value instead of a single-event EventSet. I'll get back to you with a decision on changing the return type or renaming the operators (or leaving everything as defined originally) soon. You can start working on the implementation in the meantime if you want to or wait for the decision.
Also would love to hear your thoughts on this problem, do you think EventSet.max() should return an EventSet or the python value?

We decided to change the names to global_min and global_max to contrast moving min/max even more.

The implementation would be done with the moving_min and moving_max, sampling the end of the EvenSet as explained above.

Finally, this operator should return an EventSet this will make it consistent with other operators and preserve the indexes. We will, in another issue include an easy way to extract the Python value of single-event EventSets