Issue with explain method

Question


5agado opened this issue 4 years ago · 3 comments

I trained an Imputer model with a mix of categorical, numerical, and BoW encoders (and the associated featurizers), but when I run the explain method on it I get this error:

~/opt/anaconda3/envs/extract/lib/python3.6/site-packages/datawig/imputer.py in explain(self, label, k, label_column)
    390         # for each data encoder extract (token_idx, token_idx_correlation_with_label), extract and apply idx2token map.
    391         feature_dict = dict(explained_label = label)
--> 392         for encoder, pattern in self.__class_patterns:
    393             # extract idx2token mappings
    394             if isinstance(encoder, CategoricalEncoder):

TypeError: 'NoneType' object is not iterable

I tried a dummy setup and explain works there, so I would like to know if you have any idea what exactly is causing this in my more complex model.

Answer 1 · 2020-11-02T13:27:00.000Z

How did you train the imputer? If you used the SimpleImputer, make sure you set the is_explainable flag in the constructor. If you're using the Imputer, that flag is set automatically, depending on the featurizers provided.

Answer 2 · 2020-11-02T13:29:40.000Z

I used the full Imputer and was expecting to get explanations for all the categorical input columns.

Answer 3 · 2020-11-02T13:50:13.000Z

We only implemented a very simple explanation method (which in our experience works better, and considerably faster, than other methods, including LIME, on simple tasks; see some of our experiments here). For the Imputer, this method currently only works with Categorical and TfIdf inputs and a single categorical output column.
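
To make that restriction concrete, here is a rough sketch of a pre-flight check you could run on your own column setup before calling explain. It is not datawig code: the kind labels ("categorical", "tfidf", "bow", "numerical") and the helper `explainable_inputs` are hypothetical names made up for illustration, and the rule it encodes is only one reading of the sentence above.

```python
from typing import Dict, List

# Hypothetical labels for encoder kinds; these are NOT datawig class names.
EXPLAINABLE_INPUT_KINDS = {"categorical", "tfidf"}


def explainable_inputs(input_kinds: Dict[str, str],
                       output_kinds: Dict[str, str]) -> List[str]:
    """Return the input columns an explanation could cover.

    Assumed rule (a reading of the thread, not the library's logic):
    there must be exactly one output column and it must be categorical;
    only categorical and TF-IDF inputs are then covered.
    """
    if len(output_kinds) != 1:
        return []
    (output_kind,) = output_kinds.values()
    if output_kind != "categorical":
        return []
    return [col for col, kind in input_kinds.items()
            if kind in EXPLAINABLE_INPUT_KINDS]


# A mixed setup similar to the one described in the question.
setup = {"brand": "categorical", "price": "numerical", "title": "bow"}
covered = explainable_inputs(setup, {"category": "categorical"})
print(covered)
```

Under this reading, the numerical and BoW columns would simply not be covered, so a model that mixes encoder types should not be expected to produce explanations for every input.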

What works well in most cases we've encountered (in terms of imputation/classification performance) is to concatenate all categorical and text columns on the input side into one text column (maybe with prefixes specific to each column) and use a TfIdf vectorizer with char n-grams. This is what SimpleImputer does by default.
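
A minimal sketch of the concatenation idea, in plain Python so it runs without any dependencies. The column names, the `column=value` prefix format, and both helper functions are assumptions chosen for illustration; this is not how SimpleImputer is implemented internally, and the TF-IDF weighting step is left out (only the character n-gram counting is shown).

```python
from collections import Counter
from typing import Dict, List, Optional


def concat_with_prefixes(row: Dict[str, Optional[str]],
                         columns: List[str]) -> str:
    """Join the selected column values into one string, prefixing each
    value with its column name so identical values stay distinguishable."""
    parts = []
    for col in columns:
        value = row.get(col)
        if value is None:
            continue  # skip missing values instead of emitting "col=None"
        parts.append(f"{col}={value}")
    return " ".join(parts)


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count every character n-gram with n_min <= n <= n_max."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Made-up example rows; the second one has a missing value.
rows = [
    {"brand": "acme", "color": "red", "title": "red mug"},
    {"brand": "zenith", "color": None, "title": "steel bottle"},
]
combined = [concat_with_prefixes(r, ["brand", "color", "title"]) for r in rows]
features = [char_ngrams(text) for text in combined]
print(combined)
```

The prefixes matter because a value like "red" can appear in both a color column and a title; tagging it with the column name lets the character n-grams keep the two apart. In a real pipeline the combined column would then go to a TF-IDF vectorizer configured for character n-grams rather than to the raw counter used here.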

Closing this for now, feel free to reopen if the explanation does not work with the encoders it should work with.