skrub-data/skrub

Transforming auxiliary tables


Problem Description

The Joiners join the main table (X) to an auxiliary table. The main table is pushed through a pipeline, so we can transform it however we want before and after the join.
However, we may also need to transform the auxiliary table itself.
As a concrete example, consider applying minhash to the auxiliary table before joining it with the AggJoiner using the min aggregation function (a use case from @Vincent-Maladiere). Here the vectorization cannot be done in the "main" pipeline because it needs to happen before aggregation.

At the moment this has to be done separately, before creating the Joiner. It would be nice if those transformations could be packaged with the rest of the pipeline somehow. Moreover, as long as the auxiliary table transformations have to be done "manually" outside of the main pipeline, we cannot run a hyperparameter search over them.

One option would be to give the joiners an aux_preprocessor parameter (passthrough by default).
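To make the idea concrete, here is a rough sketch of what such a parameter could look like. It does not use skrub's actual joiners: `PreprocessingAggJoiner` is a hypothetical class built only on pandas and scikit-learn, and `text_length` is a crude stand-in for a real vectorizer such as minhash. The sketch also assumes the preprocessor returns a DataFrame that keeps the auxiliary table's index.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import FunctionTransformer


class PreprocessingAggJoiner(TransformerMixin, BaseEstimator):
    """Hypothetical joiner (not part of skrub): preprocess the aux table,
    aggregate it per key, then left-join the result onto the main table."""

    def __init__(self, aux_table, key, operation="min",
                 aux_preprocessor="passthrough"):
        self.aux_table = aux_table
        self.key = key
        self.operation = operation
        self.aux_preprocessor = aux_preprocessor

    def fit(self, X, y=None):
        # Only the non-key columns go through the preprocessor.
        features = self.aux_table.drop(columns=[self.key])
        if not isinstance(self.aux_preprocessor, str):  # not "passthrough"
            self.aux_preprocessor_ = clone(self.aux_preprocessor)
            # Assumption: the output is a DataFrame with the same index.
            features = self.aux_preprocessor_.fit_transform(features)
        # Put the key back and aggregate: one row per key value.
        keyed = features.assign(**{self.key: self.aux_table[self.key]})
        self.aggregated_ = (
            keyed.groupby(self.key).agg(self.operation).reset_index()
        )
        return self

    def transform(self, X):
        return X.merge(self.aggregated_, on=self.key, how="left")


def text_length(df):
    # Stand-in for a real vectorizer: map each string to its length.
    return df.apply(lambda col: col.str.len())


main = pd.DataFrame({"user_id": [1, 2]})
aux = pd.DataFrame(
    {"user_id": [1, 1, 2], "product": ["tea", "coffee", "milk"]}
)

joiner = PreprocessingAggJoiner(
    aux,
    key="user_id",
    operation="min",
    aux_preprocessor=FunctionTransformer(text_length),
)
result = joiner.fit_transform(main)
```

Because the preprocessor is an ordinary constructor parameter, scikit-learn exposes its settings as nested parameters (names starting with `aux_preprocessor__`), so a search such as `GridSearchCV` could tune them together with the rest of the pipeline.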

Feature Description

_

Alternative Solutions

No response

Additional Context

No response