awslabs/datawig

Issue with explain method

5agado opened this issue · 3 comments

I trained an Imputer model with a mix of categorical, numerical, and bag-of-words (BoW) encoders and their associated featurizers, but when I run the explain method on it I get this error:

```
~/opt/anaconda3/envs/extract/lib/python3.6/site-packages/datawig/imputer.py in explain(self, label, k, label_column)
    390         # for each data encoder extract (token_idx, token_idx_correlation_with_label), extract and apply idx2token map.
    391         feature_dict = dict(explained_label = label)
--> 392         for encoder, pattern in self.__class_patterns:
    393             # extract idx2token mappings
    394             if isinstance(encoder, CategoricalEncoder):

TypeError: 'NoneType' object is not iterable
```

I tried a dummy setup and explain works there, so I would like to know whether you have any clue about what exactly is causing this in my more complex model.

How did you train the imputer? If you used the SimpleImputer, make sure you set the is_explainable flag in the constructor. If you're using the Imputer, that flag is set automatically, depending on the featurizers provided.
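For the SimpleImputer case, a minimal sketch of setting the flag might look like the following. The helper name `train_explainable_imputer` and its arguments are made up for illustration; only `SimpleImputer`, its `is_explainable` constructor argument, `fit`, and `explain` come from datawig, and the exact behaviour may differ between versions.

```python
def train_explainable_imputer(train_df, input_columns, output_column, output_path):
    """Hypothetical helper: fit a SimpleImputer with explanations enabled.

    train_df is assumed to be a pandas DataFrame containing both the
    input columns and the output column.
    """
    # Imported lazily so that merely defining this helper does not
    # require datawig (and its MXNet dependency) to be installed.
    from datawig import SimpleImputer

    imputer = SimpleImputer(
        input_columns=input_columns,
        output_column=output_column,
        output_path=output_path,
        is_explainable=True,  # the flag mentioned above; off by default
    )
    imputer.fit(train_df=train_df)
    return imputer
```

After fitting, something like `imputer.explain("some_label", k=10)` should return the input tokens most associated with that label, where `"some_label"` stands for one of the values in your output column.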

I used the full Imputer and was expecting to get explanations for all the categorical input columns.

We only implemented a very simple explanation method (which in our experience works better and considerably faster on simple tasks than other methods, including LIME; see some of our experiments here). For the Imputer, this method currently only works with Categorical and Tfidf inputs and a single categorical output column.
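As a rough illustration of that restriction, the check below encodes one reading of it: at least one Categorical or Tfidf input encoder, and exactly one output encoder, which must be categorical. The classes are local stand-ins named after datawig's encoders, not imports from the library, and `explain_is_supported` is a hypothetical helper. Whether additional numerical or BoW inputs are tolerated alongside the supported ones is an assumption here, not something this thread settles.

```python
# Local stand-in classes (NOT imported from datawig), used only to
# illustrate the rule described above.
class CategoricalEncoder: ...
class TfIdfEncoder: ...
class NumericalEncoder: ...
class BowEncoder: ...


def explain_is_supported(data_encoders, label_encoders):
    """Hypothetical check: would the simple explanation method apply?"""
    # Assumption: one Categorical or Tfidf input is enough, even if
    # other encoder types are present as well.
    has_explainable_input = any(
        isinstance(enc, (CategoricalEncoder, TfIdfEncoder))
        for enc in data_encoders
    )
    # Exactly one output column, and it must be categorical.
    single_categorical_output = (
        len(label_encoders) == 1
        and isinstance(label_encoders[0], CategoricalEncoder)
    )
    return has_explainable_input and single_categorical_output


# A numerical output column is not covered by the explanation method.
mixed_inputs = [CategoricalEncoder(), NumericalEncoder(), BowEncoder()]
print(explain_is_supported(mixed_inputs, [NumericalEncoder()]))
print(explain_is_supported(mixed_inputs, [CategoricalEncoder()]))
```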

What works well in most cases we've encountered (in terms of imputation/classification performance) is to concatenate all categorical and text columns on the input side into one text column (maybe with prefixes specific to each column) and use a TfIdf vectorizer with char n-grams. This is what SimpleImputer does by default.
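The concatenation step could be sketched roughly as below. This is not datawig's own implementation; `concat_with_prefixes`, the `column:token` prefix format, and the example column names are all assumptions chosen for illustration.

```python
from typing import Iterable, Mapping


def concat_with_prefixes(row: Mapping[str, object], columns: Iterable[str]) -> str:
    """Join the given columns of one row into a single text string.

    Every whitespace-separated token is prefixed with its column name,
    so identical values from different columns stay distinguishable.
    """
    parts = []
    for col in columns:
        value = row.get(col)
        if value is None:
            continue  # skip missing values instead of emitting "None"
        for token in str(value).split():
            parts.append(f"{col}:{token}")
    return " ".join(parts)


# Hypothetical row with two filled columns and one missing value.
row = {"color": "red", "title": "cotton shirt", "size": None}
print(concat_with_prefixes(row, ["color", "title", "size"]))
# prints: color:red title:cotton title:shirt
```

Applied to every row, this yields one text column that can serve as the only input column, which keeps the model within the Tfidf-input case the explanation method supports.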

Closing this for now, feel free to reopen if the explanation does not work with the encoders it should work with.