- Attention Is All You Need
- GPT: Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- What Does BERT Look At? An Analysis of BERT's Attention
- GPT-2: Language Models are Unsupervised Multitask Learners
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- XLM: Cross-lingual Language Model Pretraining
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- DistilBERT: a distilled version of BERT: smaller, faster, cheaper and lighter
- CTRL: A Conditional Transformer Language Model for Controllable Generation
- CamemBERT: a Tasty French Language Model
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- XLM-RoBERTa: Unsupervised Cross-lingual Representation Learning at Scale
- MMBT: Supervised Multimodal Bitransformers for Classifying Images and Text
- FlauBERT: Unsupervised Language Model Pre-training for French
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
- Reformer: The Efficient Transformer
- Longformer: The Long-Document Transformer
- GPT-3: Language Models are Few-Shot Learners
- Big Bird: Transformers for Longer Sequences
- MARGE: Pre-training via Paraphrasing
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- A Survey on Contextual Embeddings
- SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
- Maximum-Likelihood Augmented Discrete Generative Adversarial Networks
- Adversarial Ranking for Language Generation
- Adversarial Feature Matching for Text Generation
- MaskGAN: Better Text Generation via Filling in the ______
- Long Text Generation via Adversarial Training with Leaked Information
- Generating Sentences from a Continuous Space
- Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
- Variational Lossy Autoencoder
- Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
- A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text
- Spherical Latent Spaces for Stable Variational Autoencoders
- Adversarially Regularized Autoencoders
- Implicit Deep Latent Variable Models for Text Generation
- Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- Professor Forcing: A New Algorithm for Training Recurrent Networks
- Toward Controlled Generation of Text
- Wasserstein Auto-Encoders
- InfoVAE: Balancing Learning and Inference in Variational Autoencoders
- A Neural Conversational Model
- A Knowledge-Grounded Neural Conversation Model
- Neural Approaches to Conversational AI
- Wizard of Wikipedia: Knowledge-Powered Conversational Agents
- The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Reading Wikipedia to Answer Open-Domain Questions
- A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
- A Survey on Deep Learning for Named Entity Recognition
- Named Entity Recognition With Parallel Recurrent Neural Networks
- Evaluating the Utility of Hand-crafted Features in Sequence Labelling
- Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
- Neural Adaptation Layers for Cross-domain Named Entity Recognition
- Neural Architectures for Named Entity Recognition
- Word Translation Without Parallel Data
- Unsupervised Machine Translation Using Monolingual Corpora Only
- Semi-Supervised Classification with Graph Convolutional Networks
- Graph Convolutional Networks for Text Classification
- Adversarial Training Methods for Semi-Supervised Text Classification
- Unsupervised Data Augmentation for Consistency Training
- Uncertainty-aware Self-training for Text Classification with Few Labels
- add paper name and link
- Back-translation: Improving Neural Machine Translation Models with Monolingual Data
- Understanding Back-Translation at Scale
- Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
- EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- add paper name and link