skrub-data/skrub

InterpolationJoiner - polars


Describe the bug

Tried a simple join as follows:

joiner = InterpolationJoiner(
    data_store["depth_0"][0],
    key=["case_id"],
    suffix="_depth_0",
).fit(data_store["df_base"])
join = joiner.transform(data_store["df_base"])
join.head()

--

data_store["depth_0"][0]: polars DataFrame
data_store["df_base"]: polars DataFrame

--

Steps/Code to Reproduce

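from skrub import InterpolationJoiner

# data_store["depth_0"][0] and data_store["df_base"] are both polars DataFrames
joiner = InterpolationJoiner(
    data_store["depth_0"][0],
    key=["case_id"],
    suffix="_depth_0",
).fit(data_store["df_base"])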
join = joiner.transform(data_store["df_base"])
join.head()

Expected Results

I expected to see the joined table, as in this example: https://skrub-data.org/stable/auto_examples/09_interpolation_join.html

Actual Results


KeyError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/polars/_utils/deprecation.py:95, in deprecate_parameter_as_positional.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
94 try:
---> 95 param_args = kwargs.pop(old_name)
96 except KeyError:

KeyError: 'columns'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
Cell In[8], line 5
1 joiner = InterpolationJoiner(
2 data_store["depth_0"][0],
3 key=["case_id"],
4 suffix="_depth_0",
----> 5 ).fit(data_store["df_base"])
6 join = joiner.transform(data_store["df_base"])
7 join.head()

File /opt/conda/lib/python3.10/site-packages/skrub/_interpolation_joiner.py:225, in InterpolationJoiner.fit(failed resolving arguments)
223 _join_utils.check_missing_columns(X, self.main_key, "'X' (the main table)")
224 key_values = self.vectorizer_.fit_transform(self.aux_table[self._aux_key])
--> 225 estimators = self._get_estimator_assignments()
226 fit_results = joblib.Parallel(self.n_jobs)(
227 joblib.delayed(_fit)(
228 key_values,
(...)
233 for assignment in estimators
234 )
235 fit_results = self._check_fit_results(fit_results)

File /opt/conda/lib/python3.10/site-packages/skrub/_interpolation_joiner.py:356, in InterpolationJoiner._get_estimator_assignments(self)
339 def _get_estimator_assignments(self):
340 """Identify column groups to be predicted together and assign them an estimator.
341
342 In many cases, a single estimator cannot handle all the target columns.
(...)
354 separately to each column.
355 """
--> 356 aux_table = self.aux_table.drop(self._aux_key, axis=1)
357 assignments = []
358 regression_table = aux_table.select_dtypes("number")

File /opt/conda/lib/python3.10/site-packages/polars/_utils/deprecation.py:97, in deprecate_parameter_as_positional.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
95 param_args = kwargs.pop(old_name)
96 except KeyError:
---> 97 return function(*args, **kwargs)
99 issue_deprecation_warning(
100 f"named {old_name} param is deprecated; use positional *args instead.",
101 version=version,
102 )
103 if not isinstance(param_args, Sequence) or isinstance(param_args, str):

TypeError: DataFrame.drop() got an unexpected keyword argument 'axis'

Versions

'0.1.0'

Thanks for reporting this bug. Indeed, InterpolationJoiner does not yet support polars, although that should be added soon. In the meantime, the limitation should be documented and the class should raise a clearer error message.

  • document the fact that InterpolationJoiner does not currently support polars
  • add polars support
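
Until polars support is added, a possible workaround (not suggested in this thread, so treat it as an assumption) is to convert the polars tables to pandas before handing them to InterpolationJoiner. The sketch below uses a hypothetical helper, to_pandas_if_polars, which is not part of skrub. It relies on polars.DataFrame.to_pandas(), which needs pandas and pyarrow installed. Lazy frames have no to_pandas() method, so they are returned unchanged and would need a collect() first.

```python
def to_pandas_if_polars(table):
    """Hypothetical helper (not a skrub function).

    Return a pandas copy of `table` when it is an eager polars object,
    and return any other object unchanged.
    """
    # Detect polars through the type's module name so that this helper
    # does not need to import polars itself.
    root_module = type(table).__module__.split(".")[0]
    if root_module == "polars" and hasattr(table, "to_pandas"):
        # polars.DataFrame.to_pandas() requires pandas and pyarrow.
        return table.to_pandas()
    return table


# Usage with the objects from the report (shown as comments because
# `data_store` is only defined in the reporter's session):
#
# from skrub import InterpolationJoiner
#
# aux = to_pandas_if_polars(data_store["depth_0"][0])
# main = to_pandas_if_polars(data_store["df_base"])
# joiner = InterpolationJoiner(aux, key=["case_id"], suffix="_depth_0").fit(main)
# join = joiner.transform(main)
```

With this approach the result of transform is a pandas DataFrame. If a polars table is needed downstream, it can be converted back with polars.from_pandas.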