This is a list of BERT-related papers. Any feedback is welcome.
- Downstream task
- Generation
- Modification (multi-task, masking strategy, etc.)
- Probe
- Inside BERT
- Multi-lingual
- Domain specific
- Multi-modal
- Model compression
- Misc.
- A BERT Baseline for the Natural Questions
- MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
- A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
- SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
- Multi-hop Question Answering via Reasoning Chains
- End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
- Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
- Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
- Unsupervised Question Answering by Cloze Translation (ACL2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
- Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
- Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
- SG-Net: Syntax-Guided Machine Reading Comprehension
- MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
- A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
- FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
- BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
- GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
- Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
- Cross-Lingual Machine Reading Comprehension (EMNLP2019)
- Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
- Multilingual Question Answering from Formatted Text applied to Conversational Agents
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
- BERT for Joint Intent Classification and Slot Filling
- Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
- BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
- Dialog State Tracking: A Neural Reading Comprehension Approach
- Domain Adaptive Training BERT for Response Selection
- Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)
- Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)
- Using BERT for Word Sense Disambiguation
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
- BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
- Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
- Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification
- An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
- "Mask and Infill": Applying Masked Language Model to Sentiment Transfer
- Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)
- Assessing BERT’s Syntactic Abilities
- Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
- Simple BERT Models for Relation Extraction and Semantic Role Labeling
- A Simple BERT-Based Approach for Lexical Simplification
- Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
- KG-BERT: BERT for Knowledge Graph Completion
- Language Models as Knowledge Bases? (EMNLP2019) [github]
- BERT-Based Arabic Social Media Author Profiling
- BERT Meets Chinese Word Segmentation
- Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
- Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding
- Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
- Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
- Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
- Portuguese Named Entity Recognition using BERT-CRF
- Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)
- Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)
- Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)
- MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)
- Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)
- On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)
- Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)
- BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)
- BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]
- WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)
- Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
- BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
- Enriching Pre-trained Language Model with Entity Information for Relation Classification
- Span-based Joint Entity and Relation Extraction with Transformer Pre-training
- Fine-tune Bert for DocRED with Two-step Process
- Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text
- How to Fine-Tune BERT for Text Classification?
- X-BERT: eXtreme Multi-label Text Classification with BERT
- DocBERT: BERT for Document Classification
- Enriching BERT with Knowledge Graph Embeddings for Document Classification
- Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
- BERT for Evidence Retrieval and Claim Verification
- Conditional BERT Contextual Augmentation
- Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge
- A Surprisingly Robust Trick for the Winograd Schema Challenge
- Improving Natural Language Inference with a Pretrained Parser
- HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]
- Story Ending Prediction by Transferable BERT (IJCAI2019)
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)
- Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
- Informing Unsupervised Pretraining with External Linguistic Knowledge
- Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
- Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
- Passage Re-ranking with BERT
- Investigating the Successes and Failures of BERT for Passage Re-Ranking
- Understanding the Behaviors of BERT in Ranking
- Document Expansion by Query Prediction
- CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)
- Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)
- FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
- Pretraining-Based Natural Language Generation for Text Summarization
- Text Summarization with Pretrained Encoders (EMNLP2019)
- Multi-stage Pretraining for Abstractive Summarization
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]
- Unified Language Model Pre-training for Natural Language Understanding and Generation (NeurIPS2019)
- Towards Making the Most of BERT in Neural Machine Translation
- Improving Neural Machine Translation with Pre-trained Representation
- On the use of BERT for Neural Machine Translation
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
- Unifying Question Answering and Text Classification via Span Extraction
- ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
- ERNIE: Enhanced Representation through Knowledge Integration
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
- SpanBERT: Improving Pre-training by Representing and Predicting Spans [github]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- KERMIT: Generative Insertion-Based Modeling for Sequences
- DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
- SenseBERT: Driving Some Sense into BERT
- Semantics-aware BERT for Language Understanding
- K-BERT: Enabling Language Representation with Knowledge Graph
- Knowledge Enhanced Contextual Word Representations (EMNLP2019)
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
- Symmetric Regularization based BERT for Pair-wise Semantic Reasoning
- Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
- Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- SesameBERT: Attention for Anywhere
- A Structural Probe for Finding Syntax in Word Representations (NAACL2019)
- Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]
- Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)
- BERT Rediscovers the Classical NLP Pipeline (ACL2019)
- Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- What does BERT learn about the structure of language? (ACL2019)
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models [github]
- Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)
- Cross-lingual Language Model Pretraining (NeurIPS2019) [github]
- 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)
- How multilingual is Multilingual BERT? (ACL2019)
- Is Multilingual BERT Fluent in Language Generation?
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)
- BERT-based Ranking for Biomedical Entity Normalization
- PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)
- Pre-trained Language Model for Biomedical Question Answering
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
- Publicly Available Clinical BERT Embeddings (NAACL2019 WS)
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text [github]
- PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Selfie: Self-supervised Pretraining for Image Embedding
- Contrastive Bidirectional Transformer for Temporal Representation Learning
- M-BERT: Injecting Multimodal Information in the BERT Structure
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
- Unified Vision-Language Pre-Training for Image Captioning and VQA [github]
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- UNITER: Learning UNiversal Image-TExt Representations
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
- Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
- TinyBERT: Distilling BERT for Natural Language Understanding
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
- Extreme Language Model Compression with Optimal Subwords and Shared Projections
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Cloze-driven Pretraining of Self-attention Networks
- Learning and Evaluating General Linguistic Intelligence
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)
- BERTScore: Evaluating Text Generation with BERT
- Machine Translation Evaluation with BERT Regressor
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
- Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment
- Glyce: Glyph-vectors for Chinese Character Representations
- Back to the Future -- Sequential Alignment of Text Representations
- Improving Cuneiform Language Identification with BERT (NAACL2019 WS)
- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)
- Transformers: State-of-the-art Natural Language Processing
- Evolution of transfer learning in natural language processing