Another GitHub repo that demonstrates different interpretation methods for image classification: https://github.com/albermax/innvestigate
Interpretation for NLP
- Towards a Deep and Unified Understanding of Deep Neural Models in NLP (ICML19)
- Interpretable Adversarial Perturbation in Input Embedding Space for Text (IJCAI18)
- Attention is not Explanation (NAACL19)
- Attention is not not Explanation (EMNLP19)
- Is Attention Interpretable? (ACL19)
Interpretation & Adversary
- Interpretation of Neural Networks is Fragile (AAAI19)
- Interpreting Adversarially Trained Convolutional Neural Networks (ICML19)