systematically handling column names and indexes of transformed dataframes
jeromedockes commented
When we transform a dataframe, we want to make sure that in the output:
- the column names are always the same (and unique)
- if it is a pandas dataframe, the index is preserved
- possibly other checks performed by `CheckInputDataFrame`
see for example this comment
I'm opening this now just so we don't forget about it
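To make the first two checks concrete, here is a minimal sketch. `check_transform_output` and its `expected_columns` parameter are hypothetical names for illustration and are not part of skrub. It only relies on the `columns` attribute (available in pandas and polars) and the `index` attribute (pandas only), and the index comparison is deliberately simplified.

```python
from collections import Counter


def check_transform_output(input_df, output_df, expected_columns):
    """Hypothetical helper: validate the output of a dataframe transformer.

    Duck-typed: uses ``columns`` (pandas and polars) and ``index`` (pandas).
    """
    columns = list(output_df.columns)

    # Column names must be unique.
    duplicates = [name for name, count in Counter(columns).items() if count > 1]
    if duplicates:
        raise ValueError(f"Duplicate column names in output: {duplicates}")

    # Column names must match the ones recorded at fit time.
    expected = list(expected_columns)
    if columns != expected:
        raise ValueError(f"Unexpected columns: got {columns}, expected {expected}")

    # If the input has an index (pandas), the output must carry the same one.
    # Simplified comparison: element-wise equality of the index labels.
    input_index = getattr(input_df, "index", None)
    if input_index is not None:
        output_index = getattr(output_df, "index", None)
        if output_index is None or list(output_index) != list(input_index):
            raise ValueError("The index of the input was not preserved.")

    return output_df
```

A transformer could call something like this at the end of `transform`, passing the column names it stored during `fit`.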
TheooJ commented
Agreed!
It would also be useful to check dataframe types between `main` and `aux`. For now, I believe only `AggJoiner` checks that both have the same type in `X, self._aux_table = self._check_dataframes(X, self.aux_table)`, but we probably want this in the other joiners too.
We could use something like:
self._aux_check_input = CheckInputDataFrame()
self._aux_table = self._aux_check_input.fit_transform(self.aux_table)
self._main_check_input = CheckInputDataFrame()
main = self._main_check_input.fit_transform(main)
if self._main_check_input.module_name_ != self._aux_check_input.module_name_:
    ...
For now,
- the `Joiner` uses `CheckInputDataFrame` for `main` and `aux`, but doesn't check the type.
- the `InterpolationJoiner` doesn't use `CheckInputDataFrame`. Note that here, `main` might not be known at the time of fitting.
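For the case where `main` is not known at fit time, one option might be to record the library of `aux` during `fit` and defer the comparison until `main` is available. The sketch below illustrates the idea with plain Python. `DeferredTypeCheck`, `dataframe_module_name` and `aux_module_name_` are hypothetical names, not skrub API, and the library is inferred from the class's module rather than through `CheckInputDataFrame`.

```python
def dataframe_module_name(df):
    """Return the top-level package defining ``df``'s class, e.g. "pandas"."""
    return type(df).__module__.split(".")[0]


class DeferredTypeCheck:
    """Hypothetical helper, not part of skrub.

    Records the dataframe library of the auxiliary table at fit time and
    compares it with the main table once the latter is available.
    """

    def fit(self, aux_table):
        # Only the auxiliary table is needed here.
        self.aux_module_name_ = dataframe_module_name(aux_table)
        return self

    def check_main(self, main):
        # Called later (e.g. in transform), when the main table is known.
        main_module_name = dataframe_module_name(main)
        if main_module_name != self.aux_module_name_:
            raise TypeError(
                "main and aux must come from the same dataframe library, got "
                f"{main_module_name!r} and {self.aux_module_name_!r}."
            )
        return main
```

The same comparison could instead use the `module_name_` attributes from the snippet above. The point here is only that the check on `main` can be moved out of `fit`.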