google/temporian

New operator: z-score normalization

New EventSet.z_score_normalize() (name TBD) operator.

See here for how to compute it.

See https://github.com/google/temporian/blob/main/CONTRIBUTING.md#developing-a-new-operator for guidance.

Questions or requests for additional guidance from possible contributors are more than welcome!

Hey @ianspektor, I have a few questions about putting this into action:

Q1) Will this be a Python-only operator or a C++ one?

Q2) As far as I understand, we can't use scipy, so we can't call scipy.stats.zscore directly. Should we keep the same arguments as scipy.stats.zscore? I'm also interested in how we should deal with NaNs.

Q3) What data types will this operator support? All numeric?

Tagging @javiber, he's the go-to person from now on for all things contributing :)

Hi @akshatvishu, I think that we can implement this one using numpy's mean and std without going down to C++.

Scipy's implementation for future reference: https://github.com/scipy/scipy/blob/v1.13.0/scipy/stats/_stats_py.py#L3021
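
To make the numpy-only idea concrete, here is a rough sketch of the computation on a plain array. It is not Temporian code: `z_score` is a hypothetical standalone helper, and the `ddof` and `omit_nan` arguments are assumptions loosely modeled on the `ddof` and `nan_policy` parameters of scipy.stats.zscore. How this would be wired into an EventSet operator is still open.

```python
import numpy as np


def z_score(values, ddof=0, omit_nan=False):
    """Hypothetical helper (not part of Temporian): (x - mean) / std.

    ddof: delta degrees of freedom passed to the standard deviation
        (0 = population std, 1 = sample std).
    omit_nan: if True, NaNs are ignored when computing mean and std
        (they remain NaN in the output); if False, a single NaN makes
        the whole output NaN.
    """
    values = np.asarray(values, dtype=np.float64)
    if omit_nan:
        mean = np.nanmean(values)
        std = np.nanstd(values, ddof=ddof)
    else:
        mean = np.mean(values)
        std = np.std(values, ddof=ddof)
    # Note: a constant input has std == 0, so this division yields NaN
    # (with a numpy RuntimeWarning). A real operator would need to decide
    # how to handle that case.
    return (values - mean) / std


scores = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
scores_with_gap = z_score([1.0, np.nan, 3.0], omit_nan=True)
```

With the default arguments the result has mean 0 and population standard deviation 1. With `omit_nan=True`, the NaN entry is skipped in the statistics and stays NaN in the output, which is one possible answer to the NaN question in Q2.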