jind11/TextFooler

Quality of adversaries and authenticity of results

SachJbp opened this issue · 4 comments

There seems to be an issue with a few of the adversaries.

For example, a claimed adversarial pair from mr_bert.txt is:
orig sent (0): to portray modern women the way director davis has done is just unthinkable
adv sent (1): to portray modern women the way director davis has done is just imaginable

"unthinkable" and "imaginable" are antonyms, but they erroneously have a high cosine similarity, which suggests they are synonyms. I suggest that such examples not be counted when evaluating the attack success rate, since a human evaluator would clearly label the adversarial sentence as positive (1), not negative.
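To illustrate the problem, here is a minimal sketch of the cosine-similarity check that TextFooler-style attacks use to select substitute words. The 3-d vectors below are toy assumptions, not the real counter-fitted embeddings; they show how an antonym pair that shares most of its distributional context can still score far above a typical similarity threshold, while an unrelated word does not.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (assumptions): antonyms that appear in nearly the
# same contexts end up almost parallel, so their cosine similarity is high.
emb = {
    "unthinkable": [0.90, 0.40, 0.10],
    "imaginable":  [0.85, 0.45, 0.15],
    "table":       [0.10, 0.20, 0.90],
}

sim_antonyms = cosine_similarity(emb["unthinkable"], emb["imaginable"])
sim_unrelated = cosine_similarity(emb["unthinkable"], emb["table"])
# sim_antonyms clears a typical ~0.7 threshold; sim_unrelated does not.
```

A threshold on cosine similarity alone cannot distinguish "close in embedding space" from "same polarity", which is exactly the failure mode above.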

Yes, the human evaluation on polarity is not 100% due to these errors.

The reported ~13% after-attack accuracy counts such examples as successes, which they actually are not. I think a human-evaluation filter should ultimately govern the after-attack accuracy. Please correct me if I am wrong. Thanks.
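A hedged sketch of what such a filter would mean numerically (not from the paper): claimed attack successes that humans judge to preserve the original label are treated as invalid, so those examples count as still correctly classified. The function name and the numbers below are illustrative assumptions.

```python
def adjusted_after_attack_accuracy(total, model_correct_after_attack,
                                   invalid_successes):
    """total: number of evaluated examples.
    model_correct_after_attack: examples the model still classifies correctly.
    invalid_successes: claimed successful attacks that human evaluation
    judges to preserve the original label (i.e. not real successes)."""
    return (model_correct_after_attack + invalid_successes) / total

# Illustrative numbers (assumptions): 1000 examples, a raw after-attack
# accuracy of 13%, and 50 "successful" attacks judged invalid by humans.
adjusted = adjusted_after_attack_accuracy(1000, 130, 50)  # 0.18
```

Under these assumed numbers the after-attack accuracy would rise from 13% to 18%, i.e. the attack looks weaker once invalid adversaries are filtered out.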

Where is the emdding.npz file, please? Or how is it generated?
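In case it helps, here is a hedged guess at how such a file could be generated: load the counter-fitted word vectors as text, L2-normalize the rows, and save the matrix with numpy so that cosine similarity reduces to a dot product. The function name, the input path, and the output filename below are placeholders, not necessarily the repo's actual ones.

```python
import numpy as np

def build_embedding_matrix(path):
    """Parse a word2vec-style text file ("word v1 v2 ...") into a
    unit-normalized embedding matrix plus the vocabulary list."""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vecs.append([float(x) for x in parts[1:]])
    emb = np.array(vecs, dtype=np.float32)
    # Normalize rows so emb @ emb.T gives pairwise cosine similarities.
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    return words, emb

# Hypothetical usage (paths are assumptions):
# words, emb = build_embedding_matrix("counter-fitted-vectors.txt")
# np.savez("embedding.npz", embeddings=emb)
```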