awslabs/datawig

Error running simpleimputer_intro.py in the example

CLWcynthia opened this issue · 11 comments

When I ran simpleimputer_intro.py from the examples, the following error occurred:
Traceback (most recent call last):
  File "/Users/chen/PycharmProjects/test2/datamissing/examples/simpleimputer_intro.py", line 41, in <module>
    predictions = imputer.predict(df_test)
  File "/usr/local/lib/python3.7/site-packages/datawig/simple_imputer.py", line 420, in predict
    score_suffix, inplace=inplace)
  File "/usr/local/lib/python3.7/site-packages/datawig/imputer.py", line 822, in predict
    if data_frame.columns.contains(imputation_col):
AttributeError: 'Index' object has no attribute 'contains'
It could be a data processing error in the predict function.

This was due to a backwards-incompatible API change in pandas 1.0 and was addressed in this PR. Sorry that we have not released a new version yet.

A quick fix could be to pip install pandas==0.25.0 manually before installing datawig.

In a jupyter notebook this could be done like:

!pip install pandas==0.25.0
!pip install datawig

Does this solve the problem?

It works for the moment, but there are warnings about future changes. I found that modifying the following two lines in imputer.py also works:
if data_frame.columns.str.contains(imputation_col).any(): (line 822)
if data_frame.columns.str.contains(imputation_proba_col).any(): (line 829)
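
One caveat on the two-line change suggested above: `.str.contains(...)` is a substring (by default regex) match over the column names, while the removed `Index.contains` was an exact membership test. pandas deprecated `Index.contains` in 0.25 in favor of `key in index` and removed it in 1.0. The sketch below uses made-up column names (not taken from datawig's code) to show where the two checks can disagree; it is an illustration, not the patch that went into the library.

```python
import pandas as pd

# Hypothetical frame: a "label_imputed_proba" column exists,
# but there is no column named exactly "label_imputed".
df = pd.DataFrame({"label_imputed_proba": [0.9], "text": ["a"]})
imputation_col = "label_imputed"

# Substring match: succeeds because "label_imputed" occurs inside
# "label_imputed_proba", even though no such column exists.
substring_hit = df.columns.str.contains(imputation_col).any()

# Exact membership test, the replacement pandas recommends for Index.contains.
exact_hit = imputation_col in df.columns

print("substring match:", bool(substring_hit))
print("exact membership:", exact_hit)
```

With the substring check, a column whose name merely contains the imputation column name would trigger the overwrite error, so `imputation_col in data_frame.columns` is likely the closer equivalent of the original check.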

You're right, and we've fixed those lines in the PR I mentioned earlier; in particular, the lines you mentioned are now compliant with the new pandas API, see for instance here.

While the source code has been fixed for some time, that commit has not been released on pip yet. We'll make sure that this and some other mxnet-related fixes are released asap.

Thanks for noticing this!

Should be solved with the latest release; please reopen if the problem persists.

Hey, @felixbiessmann

It seems that this issue still persists. I get the initial error when trying to do:

imputer = datawig.SimpleImputer(
    input_columns = ["advice", "reason", "reason_id"],
    output_column = "advice_id"
)

imputer.fit(train_df = df3_train)
predictions = imputer.predict(df3_test)

Followed by error:

AttributeError                            Traceback (most recent call last)
<ipython-input-56-89a3fce1d6ab> in <module>
----> 1 predictions = imputer.predict(df3_test)

~/opt/anaconda3/lib/python3.8/site-packages/datawig/simple_imputer.py in predict(self, data_frame, precision_threshold, imputation_suffix, score_suffix, inplace)
    417         :return: data_frame original dataframe with imputations and likelihood in additional column
    418         """
--> 419         imputations = self.imputer.predict(data_frame, precision_threshold, imputation_suffix,
    420                                            score_suffix, inplace=inplace)
    421 

~/opt/anaconda3/lib/python3.8/site-packages/datawig/imputer.py in predict(self, data_frame, precision_threshold, imputation_suffix, score_suffix, inplace)
    820         for label, imputations in predictions:
    821             imputation_col = label + imputation_suffix
--> 822             if data_frame.columns.contains(imputation_col):
    823                 raise ColumnOverwriteException(
    824                     "DataFrame contains column {}; remove column and try again".format(

AttributeError: 'Index' object has no attribute 'contains'

Following the advice from the thread above, I attempt to do locally in a Notebook:

!pip install pandas==0.25.0
!pip install datawig

However, installing that pandas version seems to fail, as it seems to have been deprecated. Is there another way to get around this? Thanks!

Hey,

Thanks for the heads up. We're currently in the process of refactoring the package and there's a pending PR that should solve some of these problems, but it's in a preliminary stage. Until the next release I'd recommend using the old package versions.

Thanks
Felix

Okay, thank you @felixbiessmann - in the meantime, do you recommend a particular old package version?

Same problem. I am facing it too :(

I am still facing the same issue. Any advice?

Hi,

Is there an update on this issue? I am facing the same issue -
AttributeError: 'Index' object has no attribute 'contains'

I have pandas 1.2.4 and going back to 0.25.0 is not an option since it's been deprecated.
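
For anyone who cannot downgrade pandas, one possible stopgap is to restore the missing method at runtime before calling `imputer.predict`. This is an unofficial sketch, not something the datawig maintainers suggested in this thread: the helper name `_index_contains` is made up, the column names are borrowed from the example above purely for the demo, and it only addresses the `Index.contains` error shown in the traceback. Other incompatibilities with newer pandas may still surface.

```python
import pandas as pd


def _index_contains(self, key):
    """Hypothetical stand-in for the removed Index.contains (exact membership)."""
    return key in self


# Patch only when the method is really missing (pandas >= 1.0),
# so older installations keep their built-in implementation.
if not hasattr(pd.Index, "contains"):
    pd.Index.contains = _index_contains

# Quick check on a toy frame that the call datawig makes now works.
columns = pd.DataFrame({"advice": ["x"], "reason": ["y"]}).columns
found = columns.contains("advice")
missing = columns.contains("advice_id")
print(found, missing)
```

Run this once per session (for example in the first notebook cell) before `imputer.predict(...)`. Monkey-patching a third-party class is fragile, so it is best treated as temporary until a fixed datawig release is installed.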