Running IntegratedGradients with text input
Closed this issue · 6 comments
Hi authors,
Thanks a lot for the integrated gradients technique and code. I am trying to use it for a text classification task I am working on. Unfortunately, the tf.gradients function returns None when I try to get the gradient op of the output w.r.t. the input. After spending a fair amount of time debugging, I realized that my input layer is just the ids of the words, which are converted to word vectors only after passing through an embedding layer. However, the paper reports results for text inputs too, so I wanted to confirm that this is indeed the issue before changing the implementation to handle text inputs. Thanks in advance for your help.
Hi,
Just to be clear, I am not the author of the paper. I implemented this code to use in one of my projects.
Anyway, to answer your question: yes, this method should work on text inputs. The reason you are getting None from tf.gradients() is that your input is discrete, and the Keras embedding layer is non-differentiable w.r.t. its integer inputs. However, you should be able to differentiate w.r.t. the vector representations of the input words.
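To make this concrete, here is a minimal sketch (assuming the standalone, TF1-era Keras API; the toy model is just for illustration):

```python
from keras import backend as K
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

# Toy classifier over sequences of 20 token ids from a 500-word vocab.
model = Sequential()
model.add(Embedding(500, 32, input_length=20))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

# Differentiating w.r.t. the integer ids fails: the embedding lookup
# has no gradient towards its indices, so this prints [None].
print(K.gradients(model.output, model.input))

# Differentiating w.r.t. the embedding layer's output works fine.
print(K.gradients(model.output, model.layers[0].output))
```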
I'm also running into a similar issue and wondering about the best way to approach this.
e.g. if I wanted to use a model such as:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(1000, 100, input_length=200))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
Do you know the easiest way to modify the IG implementation to support attributions on such a network? @aman313, have you been able to fix this yet?
Thanks!
@b-carter Well, it ain't very pretty, but I hacked around this by extending the model class with a property that I set to point to the output tensors of the embedding layer. Then I pass that as the second parameter in the get_gradients call in the IG code. A better way would probably be to name the layer and use Keras's built-in functions to get the layer's output by name. Cheers.
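For anyone reading along, a rough sketch of that hack (the layer name is something you would set yourself when building the model; get_gradients refers to the call inside this repo's IG code):

```python
from keras import backend as K

# Give the embedding layer a name when building the model, e.g.
#   model.add(Embedding(1000, 100, input_length=200, name='embedding'))
# then fetch its output tensor by name:
embedding_output = model.get_layer('embedding').output

# Inside the IG code, take gradients against this tensor instead of
# the (integer) model input:
grads = K.gradients(model.output, embedding_output)
```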
Thanks! I'm trying to get that to work. Did you have to make any changes down by the for loop on lines 141-143? I'm getting a broadcast error from the np.multiply call down there.
For reference, I had changed the second param in the get_gradients call to self.model.layers[1].output (where layers[1] is the embeddings layer).
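(That broadcast error is expected with this change: the gradients now have the embedding output's shape, while the inputs in the IG loop still have the raw id shape. A hypothetical illustration with placeholder arrays, not the repo's actual variables:)

```python
import numpy as np

# Placeholder shapes matching the model above (batch=1, seq=200, dim=100):
embedded = np.random.rand(1, 200, 100)       # embedding layer output
baseline_embedded = np.zeros_like(embedded)  # all-zero baseline
avg_grads = np.random.rand(1, 200, 100)      # path-averaged gradients

# Multiplying id-shaped arrays of shape (1, 200) against gradients of
# shape (1, 200, 100) cannot broadcast, hence the error. Do the IG
# product in embedding space instead, then collapse the embedding axis
# to get one attribution score per token:
attributions = np.multiply(embedded - baseline_embedded, avg_grads)
token_scores = attributions.sum(axis=-1)     # shape (1, 200)
```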
@aman313 - I am not sure I understand how you did that. If we get gradients for the embedding layer, how do you attribute them back to each input token? Could you point me to the changes you made? Thanks
An embedding layer is mathematically equivalent to a dense layer applied to 1-hot encoded inputs. People use it to save space, because you cannot possibly materialize 1-hot vectors when there are millions of words to encode.
Each row of the dense weight matrix corresponds to an embedding vector. So instead you can take the partial gradient with respect to the embedding layer's output; by the chain rule, the attribution for the single active 1-hot entry is the dot product of that gradient with the word's embedding vector (i.e., a sum over the embedding dimensions), and you should get what you want.
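To make that concrete, here is a minimal sketch of integrated gradients over the embedding vectors (assuming the TF1-era Keras backend, whose K.function can feed values for intermediate tensors; all function and variable names here are illustrative, not from this repo):

```python
import numpy as np
from keras import backend as K

def text_integrated_gradients(model, emb_layer, ids, steps=50):
    """Sketch: IG over embedding vectors with an all-zero baseline;
    per-token scores come from summing over the embedding axis."""
    emb_tensor = emb_layer.output

    # Map token ids to their embedding vectors.
    embed_fn = K.function([model.input], [emb_tensor])
    # Map embedding vectors to gradients of the model output.
    grad_fn = K.function([emb_tensor], K.gradients(model.output, emb_tensor))

    embedded = embed_fn([ids])[0]        # (batch, seq_len, emb_dim)
    baseline = np.zeros_like(embedded)   # zero-embedding baseline

    # Average the gradients along the straight-line path.
    avg_grads = np.zeros_like(embedded)
    for alpha in np.linspace(0, 1, steps):
        avg_grads += grad_fn([baseline + alpha * (embedded - baseline)])[0]
    avg_grads /= steps

    # (x - baseline) * avg_grads, summed over embedding dims.
    return ((embedded - baseline) * avg_grads).sum(axis=-1)

# Usage (hypothetical): per-token scores for a batch of id sequences.
# scores = text_integrated_gradients(model, model.get_layer('embedding'), ids)
```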