1. Adversarial Attacks and Robustness

2023

【ACL】

  • How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks[PDF]

  • Randomized Smoothing with Masked Inference for Adversarially Robust Text Classifications[PDF]

  • Text Adversarial Purification as Defense against Adversarial Attacks[PDF] (defense)

  • White-Box Multi-Objective Adversarial Attack on Dialogue Generation[PDF]

  • Contrastive Learning with Adversarial Examples for Alleviating Pathology of Language Model[PDF]

【AAAI】

  • SSPAttack: A Simple and Sweet Paradigm for Black-Box Hard-Label Textual Adversarial Attack (synonym substitution)

2022

【ACL】

  • Adversarial Authorship Attribution for Deobfuscation[PDF][Code]

  • Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis[PDF]

  • Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning[PDF]

  • Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost[PDF][Code]

  • ParaDetox: Detoxification with Parallel Data[PDF][Code]

  • Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models[PDF][Code]

  • SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher[PDF][Code]

  • Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation[PDF][Code]

【EMNLP】

  • Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution[PDF][Code]
  • TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack[PDF][Code]
  • Textual Manifold-based Defense Against Natural Language Adversarial Examples[PDF][Code]
  • Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP[PDF]

【COLING】

  • Semantic-Preserving Adversarial Code Comprehension[PDF]
  • PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks[PDF]
  • PAEG: Phrase-level Adversarial Example Generation for Neural Machine Translation[PDF]
  • Rare but Severe Neural Machine Translation Errors Induced by Minimal Deletion: An Empirical Study on Chinese and English[PDF]

【NAACL】

  • ValCAT: Variable-Length Contextualized Adversarial Transformations Using Encoder-Decoder Language Model[PDF]
  • SHARP: Search-Based Adversarial Attack for Structured Prediction
  • Phrase-level Textual Adversarial Attack with Label Preservation
  • Adversarial Text Normalization[PDF]

【AAAI】

  • Word Level Robustness Enhancement: Fight Perturbation with Perturbation

2021

【ACL】

  • Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder[PDF]

  • Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

  • A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger’s Adversarial Attacks

  • Crafting Adversarial Examples for Neural Machine Translation

  • Adversarial Learning for Discourse Rhetorical Structure Parsing

  • Reliability Testing for Natural Language Processing Systems

  • Robust Knowledge Graph Completion with Stacked Convolutions and a Student Re-Ranking Network

  • Towards Robustness of Text-to-SQL Models against Synonym Substitution

  • Improving Paraphrase Detection with the Adversarial Paraphrasing Task

  • MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation

  • On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study

  • WARP: Word-level Adversarial ReProgramming

  • Improving Arabic Diacritization with Regularized Decoding and Adversarial Training

  • An Empirical Study on Adversarial Attack on NMT: Languages and Positions Matter

  • Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models

  • OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack

【EMNLP】

  • Achieving Model Robustness through Discrete Adversarial Training[PDF]
  • Multi-granularity Textual Adversarial Attack with Behavior Cloning[PDF]
  • Evaluating the Robustness of Neural Language Models to Input Perturbations[PDF]
  • Adversarial Attack against Cross-lingual Knowledge Graph Alignment
  • FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations
  • Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods
  • A Strong Baseline for Query Efficient Attacks in a Black Box Setting
  • Gradient-based Adversarial Attacks against Text Transformers[PDF]
  • Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution[PDF]
  • On the Transferability of Adversarial Attacks against Neural Text Classifier[PDF]
  • Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification[PDF]

【NAACL】

  • Universal Adversarial Attacks with Natural Triggers for Text Classification
  • Contextualized Perturbation for Textual Adversarial Attack

【AAAI】

  • Generating Natural Language Attacks in a Hard Label Black Box Setting
  • Adversarial Training with Fast Gradient Projection Method against Synonym Substitution Based Text Attacks
  • Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

1.2 Neural Machine Translation

2022

【NAACL】

  • Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation[PDF]

1.3 Sentiment Classification

2021

【NAACL】

  • Grey-box Adversarial Attack And Defence For Sentiment Classification

2. Applications of Adversarial Attacks

2022

【NAACL】

  • A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction[PDF]

2021

【NAACL】

  • Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack
  • BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification

3. Detection and Defense against Adversarial Attacks

2022

【NAACL】

  • Don’t sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks[PDF]
  • Residue-Based Natural Language Adversarial Attack Detection[PDF]
  • Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks[PDF]

【AAAI】

  • Improved Text Classification via Contrastive Adversarial Training
  • KATG: Keyword-Bias-Aware Adversarial Text Generation for Text Classification

4. Interpretability and Analysis of NLP

2023

【ACL】

  • Entity Tracking in Language Models

5. Robustness Improving Fairness

2021

【ACL】

  • Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification[PDF]

Chinese NLP

2021

【ACL】

  • Correcting Chinese Spelling Errors with Phonetic Pre-training
  • Dynamic Connected Networks for Chinese Spelling Check

Attacks and Defenses on Language Models

2023

【ACL】

  • Language model acceptability judgements are not always robust to context[PDF]

2021

【ACL】

  • BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks[PDF]
  • Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice[PDF]

Optimizers

2023

【ACL】

  • CAME: Confidence-guided Adaptive Memory Efficient Optimization[PDF][Code]

Classifiers

2023

【ACL】

  • Linear Classifier: An Often-Forgotten Baseline for Text Classification