google/temporian

New operators: string operations

Opened this issue · 7 comments

New operators to operate on strings:

  • add
  • multiply
  • lower
  • upper
  • str_len

Implementation should be factored with base classes so that each operation only needs to define the np.char method it needs to call.

See https://github.com/google/temporian/blob/main/CONTRIBUTING.md#developing-a-new-operator for guidance.

Questions or requests for additional guidance from possible contributors more than welcome!

@ianspektor I will take this next, can you please assign this to me?

@tanaysd for sure! Thanks :)

Some notes:

  • Read and pay close attention to the operator development guide (link in the issue's description)
  • Check how logical operators (i.e. and, or and xor) are implemented for reference, since those only accept and return bool types, which is similar to these that accept and return strings
    • Key files: temporian/core/operators/binary/logical.py, temporian/implementation/numpy/operators/binary/logical.py, temporian/core/operators/test/test_logical.py
    • Although note that these string ops shouldn't modify the feature names they return (logical ops do because they receive 2 EventSets)
    • Please make tests a bit more thorough than the ones in test_logical, which only test success cases (e.g. test that the expected error is returned when passing an incorrect type)
  • Make sure you add the new operators to the docs (as described in the guide) and you can see them when running the docs locally

Also, let's start out with just 2 of the operations in the list (let's say upper() and lower()) and only implement the remaining ones after we've validated the design, so you don't need to be making changes to all 9 on every comment, if that sounds alright

Sounds good, thank you, can you share the link to discord server when you've a chance?

Hi @ianspektor

  1. should string operators get their own folder similar to calendar

found this snippet at the end of the instructions Groups of operator with a similar implementation as grouped together. For instance, temporian/core/operators/window contains moving window operators

  1. in your last comment you mention 9 but the original description lists five operators, which one is correct?

It's 5, sorry for the confusion, there were some other ones that I moved to this issue since they need to be extended from existing functions to work with strings (vs. created from scratch like these): #369