hila-chefer/Transformer-Explainability

Threshold for Attention Vector during comparison

jaydebsarker opened this issue · 2 comments

I am quoting two lines from https://github.com/hila-chefer/Transformer-Explainability/blob/main/BERT_explainability.ipynb:

i) expl = explanations.generate_LRP(input_ids=input_ids, attention_mask=attention_mask, start_layer=0)[0]

# normalize scores

ii) expl = (expl - expl.min()) / (expl.max() - expl.min())
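For context, here is a minimal sketch of how these two lines fit together, assuming `expl` is a 1-D tensor of per-token relevance scores and `tokens` is the corresponding list of word pieces (the example values and token names are hypothetical, not from the notebook):

import torch

# Hypothetical per-token relevance scores, e.g. the output of generate_LRP(...)[0]
expl = torch.tensor([0.02, 0.91, 0.33, 0.75, 0.10])
tokens = ["[CLS]", "terrible", "movie", "awful", "[SEP]"]  # hypothetical tokens

# Line ii): min-max normalization rescales the scores into [0, 1] for visualization
expl = (expl - expl.min()) / (expl.max() - expl.min())

for tok, score in zip(tokens, expl.tolist()):
    print(f"{tok:>10s}  {score:.2f}")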

You normalize the explanation scores in line ii). After normalization, the most significant token(s) get a score of 1.0, while other tokens may get, say, 0.7 or 0.6. In that case, which tokens are considered to indicate the predicted class (e.g., negative sentiment) in the model output? Specifically, did you apply a threshold (e.g., >= 0.5) to each token's score to assign it to a class?

I was wondering if you would clarify this for me.

Hi @jaydebsarker, thanks for your interest!
I apologize for the delay in my response.
The normalization sets the values between 0 and 1 for visualization purposes.
I do not set a threshold, but if you wish to set one, I recommend using Otsu’s method for that (as done in our second paper).
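For reference, a minimal sketch of thresholding the normalized scores with Otsu's method, using scikit-image's `threshold_otsu` (the scores and tokens below are illustrative, not from the notebook, and this is only one possible way to apply the idea):

import numpy as np
from skimage.filters import threshold_otsu

# Normalized per-token relevance scores in [0, 1] (illustrative values)
scores = np.array([0.0, 1.0, 0.35, 0.82, 0.11])
tokens = ["[CLS]", "terrible", "movie", "awful", "[SEP]"]

# Otsu's method picks a data-driven threshold separating the scores into two groups
t = threshold_otsu(scores)

highlighted = [tok for tok, s in zip(tokens, scores) if s >= t]
print(f"threshold = {t:.2f}, highlighted tokens: {highlighted}")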

Best,
Hila.

Hi @hila-chefer ,

Thank you so much for the reference for your paper.

Best,
Jaydeb