Paper List

A curated paper list for NLP, RecSys, Chatbot, and CV.

NLP

RecSys

Chatbot

CV

NLP

Title Notes
A Neural Probabilistic Language Model NNLM
Efficient Estimation of Word Representations in Vector Space Word2vec
Distributed Representations of Words and Phrases and their Compositionality Word2vec
Neural Machine Translation by Jointly Learning to Align and Translate Attention
Attention Is All You Need Transformer (see the attention sketch after this table)
Deep contextualized word representations ELMo
Improving Language Understanding by Generative Pre-Training GPT
BERT - Pre-training of Deep Bidirectional Transformers for Language Understanding BERT
RoBERTa - A Robustly Optimized BERT Pretraining Approach RoBERTa
ALBERT - A Lite BERT for Self-supervised Learning of Language Representations ALBERT
ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators ELECTRA
ERNIE - Enhanced Representation through Knowledge Integration ERNIE (Baidu)
ERNIE 2.0 - A Continual Pre-training Framework for Language Understanding ERNIE 2.0
ERNIE-GEN - An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation ERNIE-GEN
ERNIE - Enhanced Language Representation with Informative Entities ERNIE (Tsinghua)
Multi-Task Deep Neural Networks for Natural Language Understanding MT-DNN
NEZHA - Neural Contextualized Representation for Chinese Language Understanding NEZHA
Pre-Training with Whole Word Masking for Chinese BERT Chinese-BERT-wwm
Revisiting Pre-Trained Models for Chinese Natural Language Processing MacBERT
SpanBERT - Improving Pre-training by Representing and Predicting Spans SpanBERT
Don't Stop Pretraining - Adapt Language Models to Domains and Tasks continued pretraining
How to Fine-Tune BERT for Text Classification? fine-tuning tips
Train No Evil - Selective Masking for Task-Guided Pre-Training continued pretraining
Layer Normalization Layer Normalization
Batch Normalization - Accelerating Deep Network Training by Reducing Internal Covariate Shift Batch Normalization
A Frustratingly Easy Approach for Joint Entity and Relation Extraction NER & RE: Typed entity markers
A Span-Extraction Dataset for Chinese Machine Reading Comprehension CMRC 2018 dataset
A Unified MRC Framework for Named Entity Recognition NER: MRC method
BERT for Joint Intent Classification and Slot Filling Text classification & NER jointly
BERT-of-Theseus - Compressing BERT by Progressive Module Replacing Distillation: BERT-of-Theseus
CLUE - A Chinese Language Understanding Evaluation Benchmark CLUE Benchmark
CLUECorpus2020 - A Large-scale Chinese Corpus for Pre-training Language Model CLUE corpus
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks Distillation: distill BERT into BiLSTM
Distilling the Knowledge in a Neural Network Distillation: Hinton
Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning MRC: single-choice model by Tencent
Language Models are Few-Shot Learners GPT-3
Language Models are Unsupervised Multitask Learners GPT-2
Neural Architectures for Named Entity Recognition NER: BiLSTM
RACE - Large-scale ReAding Comprehension Dataset From Examinations RACE dataset
TPLinker - Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking NER & RE: TPLinker
TextBrewer - An Open-Source Knowledge Distillation Toolkit for Natural Language Processing Distillation: distillation toolkit by HFL
Two are Better than One - Joint Entity and Relation Extraction with Table-Sequence Encoders NER & RE: Two are Better than One
A Survey on Knowledge Graphs - Representation, Acquisition and Applications Review of KG
Adversarial Training for Large Neural Language Models ALUM
Augmented SBERT - Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks Augmented SBERT
BART - Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension BART
Bag of Tricks for Efficient Text Classification fastText
CTRL - A Conditional Transformer Language Model for Controllable Generation CTRL
Channel Pruning for Accelerating Very Deep Neural Networks Pruning
Chinese NER Using Lattice LSTM Lattice LSTM
Compressing Deep Convolutional Networks using Vector Quantization Quantization
Conditional Random Fields - Probabilistic Models for Segmenting and Labeling Sequence Data CRF
Cross-lingual Language Model Pretraining XLM
DeBERTa - Decoding-enhanced BERT with Disentangled Attention DeBERTa
DeFormer - Decomposing Pre-trained Transformers for Faster Question Answering DeFormer
Deep Compression - Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding Quantization
DistilBERT - a distilled version of BERT - smaller, faster, cheaper and lighter DistilBERT
Do Deep Nets Really Need to be Deep? Model Compression
Do Transformer Modifications Transfer Across Implementations and Applications? Evaluate transformer modifications
Dropout - a simple way to prevent neural networks from overfitting Dropout
DynaBERT - Dynamic BERT with Adaptive Width and Depth DynaBERT
Efficient Transformers - A Survey Review of transformers
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Evaluate GRU
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer T5
FLAT - Chinese NER Using Flat-Lattice Transformer FLAT
FastBERT - a Self-distilling BERT with Adaptive Inference Time FastBERT
Finetuning Pretrained Transformers into RNNs T2R
FitNets - Hints for Thin Deep Nets FitNets
GPT Understands, Too P-tuning
GloVe - Global Vectors for Word Representation GloVe
Informer - Beyond Efficient Transformer for Long Sequence Time-Series Forecasting Informer
K-BERT - Enabling Language Representation with Knowledge Graph K-BERT
Knowledge Distillation - A Survey Review of KD
Knowledge Distillation via Route Constrained Optimization RCO
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks Pre-trained Checkpoints for NLG
Lex-BERT - Enhancing BERT based NER with lexicons Lex-BERT
Longformer - The Long-Document Transformer Longformer
Megatron-LM - Training Multi-Billion Parameter Language Models Using Model Parallelism Megatron-LM
MiniLM - Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers MiniLM
Mixed Precision Training Mixed Precision Training
MobileBERT - a Compact Task-Agnostic BERT for Resource-Limited Devices MobileBERT
Model Compression Earliest paper on KD
Neural Turing Machines NTM
On the Sentence Embeddings from Pre-trained Language Models BERT-flow
Optimal Subarchitecture Extraction For BERT Bort
PRADO - Projection Attention Networks for Document Classification On-Device PRADO
Patient Knowledge Distillation for BERT Model Compression BERT-PKD
Pre-trained Models for Natural Language Processing - A Survey Review of pretrained models
Reformer - The Efficient Transformer Reformer
Self-Attention with Relative Position Representations relative position self-attention
Sentence-BERT - Sentence Embeddings using Siamese BERT-Networks SBERT
StructBERT - Incorporating Language Structures into Pre-training for Deep Language Understanding StructBERT
Switch Transformers - Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Switch Transformers
TENER - Adapting Transformer Encoder for Named Entity Recognition TENER
TinyBERT - Distilling BERT for Natural Language Understanding TinyBERT
Transformer-XL - Attentive Language Models Beyond a Fixed-Length Context Transformer-XL
Unified Language Model Pre-training for Natural Language Understanding and Generation UniLM
Well-Read Students Learn Better - On the Importance of Pre-training Compact Models Pre-trained Distillation
XLNet - Generalized Autoregressive Pretraining for Language Understanding XLNet
ZeRO-Offload - Democratizing Billion-Scale Model Training ZeRO-Offload
word2vec Explained - deriving Mikolov et al.'s negative-sampling word-embedding method Explain word2vec
word2vec Parameter Learning Explained Explain word2vec
MASS - Masked Sequence to Sequence Pre-training for Language Generation MASS
Semi-supervised Sequence Learning Pretraining and finetuning LSTM
Universal Language Model Fine-tuning for Text Classification ULMFiT
Whitening Sentence Representations for Better Semantics and Faster Retrieval BERT-whitening
A Joint Neural Model for Information Extraction with Global Features
A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
A Self-Training Approach for Short Text Clustering
A Simple Framework for Contrastive Learning of Visual Representations
A Survey of Deep Learning Methods for Relation Extraction
A Survey on Contextual Embeddings
A Survey on Deep Learning for Named Entity Recognition
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
A Survey on Text Classification - From Shallow to Deep Learning
An overview of gradient descent optimization algorithms
CNN-Based Chinese NER with Lexicon Rethinking
Complex Relation Extraction - Challenges and Opportunities
ConSERT - A Contrastive Framework for Self-Supervised Sentence Representation Transfer Contrastive learning: ConSERT
Convolutional Neural Networks for Sentence Classification
Decoupled Weight Decay Regularization
Deep Learning Based Text Classification - A Comprehensive Review
ERNIE-GEN - An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Enhancement of Short Text Clustering by Iterative Classification
Enriching Word Vectors with Subword Information
Extract then Distill - Efficient and Effective Task-Agnostic BERT Distillation
FastText.zip - Compressing text classification models
Generating Long Sequences with Sparse Transformers
Hierarchical Multi-Label Classification Networks
Hierarchically-Refined Label Attention Network for Sequence Labeling
I-BERT - Integer-only BERT Quantization I-BERT
Incremental Joint Extraction of Entity Mentions and Relations
Joint Entity and Relation Extraction with Set Prediction Networks
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Knowledge Graphs
Large Batch Optimization for Deep Learning - Training BERT in 76 minutes
Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter
More Data, More Relations, More Context and More Openness - A Review and Outlook for Relation Extraction
Poor Man's BERT - Smaller and Faster Transformer Models
Pre-training with Meta Learning for Chinese Word Segmentation
Q8BERT - Quantized 8Bit BERT
Recent Advances and Challenges in Task-oriented Dialog System
RethinkCWS - Is Chinese Word Segmentation a Solved Task?
Self-Taught Convolutional Neural Networks for Short Text Clustering
SimCSE - Simple Contrastive Learning of Sentence Embeddings Contrastive learning: SimCSE
Simplify the Usage of Lexicon in Chinese NER
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Supporting Clustering with Contrastive Learning
Transformers are RNNs - Fast Autoregressive Transformers with Linear Attention
Universal Sentence Encoder
ZEN - Pre-training Chinese Text Encoder Enhanced by N-gram Representations
fastHan - A BERT-based Joint Many-Task Toolkit for Chinese NLP Chinese NLP toolkit: fastHan
ERNIE-Gram - Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding Pretrained model: ERNIE-Gram
MPNet - Masked and Permuted Pre-training for Language Understanding Pretrained model: MPNet
A Survey of Event Extraction From Text Survey of event extraction
A Survey of Transformers Survey of Transformers
Applying Deep Learning to Answer Selection - A Study and An Open Task Text matching: SiamCNN
Big Bird - Transformers for Longer Sequences Long-sequence modeling: Big Bird
CLEVE - Contrastive Pre-training for Event Extraction Event extraction: CLEVE
ERICA - Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning Relation extraction: ERICA
ERNIE 3.0 - Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation Pretrained model: ERNIE 3.0
ERNIE-Doc - A Retrospective Long-Document Modeling Transformer Pretrained model: ERNIE-Doc
Enhanced LSTM for Natural Language Inference Text matching: ESIM (Enhanced Sequential Inference Model)
Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks Event extraction: CNN
Graph Neural Networks for Natural Language Processing - A Survey Survey of GNNs for NLP
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data Text matching: DSSM
Linformer - Self-Attention with Linear Complexity Long-sequence modeling: Linformer
M6 - A Chinese Multimodal Pretrainer Multimodal pretrained model: M6
Multi-passage BERT - A Globally Normalized BERT Model for Open-domain Question Answering Question answering: Multi-passage BERT
PanGu-α - Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation Large-scale pretrained model: PanGu-α
RoFormer - Enhanced Transformer with Rotary Position Embedding Long-sequence modeling: RoFormer
Unsupervised Deep Embedding for Clustering Analysis Text clustering: embedding-based method
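
For orientation, here is a minimal NumPy sketch of the scaled dot-product attention from "Attention Is All You Need" (the Transformer entry above), Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes, names, and toy inputs are illustrative assumptions, not reference code.

```python
# A minimal sketch of scaled dot-product attention; all shapes are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v) -> (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) logits
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # attention-weighted values

# Toy usage: 4 tokens, d_k = d_v = 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```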

RecSys

Title Notes
Using Collaborative Filtering to Weave an Information Tapestry CF
Amazon.com Recommendations Item-to-Item Collaborative Filtering ItemCF
Matrix Factorization Techniques for Recommender Systems MF
Factorization Machines FM (see the FM sketch after this table)
Field-aware Factorization Machines for CTR Prediction FFM
Practical Lessons from Predicting Clicks on Ads at Facebook GBDT+LR
Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction LS-PLM
AutoRec - Autoencoders Meet Collaborative Filtering AutoRec
Deep Neural Networks for YouTube Recommendations YoutubeDNN
A Contextual-Bandit Approach to Personalized News Article Recommendation contextual bandit
A survey of active learning in collaborative filtering recommender systems active learning
Ad click prediction - a view from the trenches FTRL
An empirical evaluation of thompson sampling thompson sampling
Artwork personalization at netflix Artwork Personalization
Attentional Factorization Machines - Learning the Weight of Feature Interactions via Attention Networks AFM
Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba EGES
CAN - Revisiting Feature Co-Action for Click-Through Rate Prediction CAN
Computation of the Singular Value Decomposition Singular Value Decomposition
DRN - A Deep Reinforcement Learning Framework for News Recommendation DRN
Deep & Cross Network for Ad Click Predictions DCN
Deep Crossing - Web-Scale Modeling without Manually Crafted Combinatorial Features Deep Crossing
Deep Interest Evolution Network for Click-Through Rate Prediction DIEN
Deep Interest Network for Click-Through Rate Prediction DIN
Deep Learning over Multi-field Categorical Data - A Case Study on User Response Prediction FNN
Deep Retrieval - An End-to-End Learnable Structure Model for Large-Scale Recommendations DR
Deep Session Interest Network for Click-Through Rate Prediction DSIN
DeepFM - A Factorization-Machine based Neural Network for CTR Prediction DeepFM
DeepWalk - Online Learning of Social Representations DeepWalk
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization RDA
Entire Space Multi-Task Model - An Effective Approach for Estimating Post-Click Conversion Rate ESMM
Finite-time Analysis of the Multiarmed Bandit Problem UCB
Item2Vec - Neural Item Embedding for Collaborative Filtering Item2Vec
LINE - Large-scale Information Network Embedding LINE
Locality-Sensitive Hashing for Finding Nearest Neighbors LSH
Neural Collaborative Filtering NCF
Neural Factorization Machines for Sparse Predictive Analytics NFM
Overlapping Experiment Infrastructure - More, Better, Faster Experimentation A/B test
Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction MIMN
Product-based Neural Networks for User Response Prediction PNN
Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction SIM
Structural Deep Network Embedding SDNE
Wide & Deep Learning for Recommender Systems Wide & Deep
node2vec - Scalable Feature Learning for Networks node2vec
Embedding-based Retrieval in Facebook Search
Ensembled CTR Prediction via Knowledge Distillation
Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
Privileged Features Distillation at Taobao Recommendations
Ranking Distillation - Learning Compact Ranking Models With High Performance for Recommender System
Rocket Launching - A Universal and Efficient Framework for Training Well-performing Light Net
XGBoost - A Scalable Tree Boosting System XGBoost
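
For orientation, a minimal NumPy sketch of the Factorization Machines prediction equation (the FM entry above), y(x) = w0 + <w, x> + sum over i<j of <v_i, v_j> x_i x_j, using the O(kn) pairwise-term reformulation from Rendle's paper; the random parameters below are placeholders, not trained weights.

```python
# A minimal sketch of the FM model equation; parameters are random stand-ins.
import numpy as np

def fm_predict(x, w0, w, V):
    """x: (n,) features; w0: bias; w: (n,) linear weights; V: (n, k) factors."""
    linear = w0 + w @ x
    s = V.T @ x                           # (k,) per-factor sum_i v_{i,f} x_i
    s_sq = (V ** 2).T @ (x ** 2)          # (k,) per-factor sum_i v_{i,f}^2 x_i^2
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise

rng = np.random.default_rng(0)
n, k = 10, 4
x = rng.random(n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(scale=0.1, size=(n, k))
print(fm_predict(x, w0, w, V))
```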

Chatbot

Title Notes
Rasa - Open Source Language Understanding and Dialogue Management Rasa
DIET - Lightweight Language Understanding for Dialogue Systems DIET (Dual Intent and Entity Transformer)
Dialogue Transformers TED (Transformer Embedding Dialogue)
Evaluating Natural Language Understanding Services for Conversational Question Answering Systems Evaluating NLU systems
Few-Shot Generalization Across Dialogue Tasks REDP (Recurrent Embedding Dialogue Policy)
Going Beyond T-SNE - Exposing whatlies in Text Embeddings whatlies
Where is the context? A critique of recent dialogue datasets critique of recent dialogue datasets
Demonstration of interactive teaching for end-to-end dialog control with hybrid code networks HCN interactive
DialoGPT - Large-Scale Generative Pre-training for Conversational Response Generation DialoGPT
Hybrid Code Networks - practical and efficient end-to-end dialog control with supervised and reinforcement learning HCN
OpenDial - A Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules OpenDial
PLATO - Pre-trained Dialogue Generation Model with Discrete Latent Variable PLATO
PLATO-2 - Towards Building an Open-Domain Chatbot via Curriculum Learning PLATO-2
Recipes for building an open-domain chatbot Blender
Towards a Human-like Open-Domain Chatbot Meena
A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking DSTC5
A Survey on Dialogue Systems - Recent Advances and New Frontiers Survey
CrossWOZ - A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset CrossWOZ
Memory-augmented Dialogue Management for Task-oriented Dialogue Systems MAD
BERT-DST - Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer BERT-DST (see the toy tracking sketch after this table)
SUMBT - Slot-Utterance Matching for Universal and Scalable Belief Tracking SUMBT
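
For orientation, a toy sketch of the slot-filling belief-state update that the dialogue state tracking papers above (e.g. BERT-DST, SUMBT) learn with neural models; everything here, including the hand-fed per-turn slot extractions, is a simplified stand-in, not any paper's method.

```python
# A toy DST loop: per-turn (slot, value) extractions are merged into the
# running belief state, later values overwriting earlier ones. The fixed
# per-turn extractions below stand in for a learned NLU/DST model.
def track(state, turn_slots):
    """Merge newly extracted (slot, value) pairs into the dialogue state."""
    new_state = dict(state)
    new_state.update({s: v for s, v in turn_slots.items() if v is not None})
    return new_state

state = {}
turns = [{"cuisine": "thai"}, {"area": "centre"}, {"cuisine": "chinese"}]
for turn_slots in turns:
    state = track(state, turn_slots)
print(state)  # {'cuisine': 'chinese', 'area': 'centre'}
```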

CV

Title Notes
ImageNet Classification with Deep Convolutional Neural Networks AlexNet
A Simple Framework for Contrastive Learning of Visual Representations SimCLR
A Survey on Visual Transformer Review
An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale ViT (see the patch-embedding sketch after this table)
CvT - Introducing Convolutions to Vision Transformers CvT
Deep Residual Learning for Image Recognition ResNet
End-to-End Object Detection with Transformers DETR
Image Transformer Image Transformer
Is Space-Time Attention All You Need for Video Understanding? TimeSformer
Learning Transferable Visual Models From Natural Language Supervision CLIP
Momentum Contrast for Unsupervised Visual Representation Learning MoCo
Pre-Trained Image Processing Transformer IPT
Pyramid Vision Transformer - A Versatile Backbone for Dense Prediction without Convolutions PVT
Swin Transformer - Hierarchical Vision Transformer using Shifted Windows Swin Transformer
TransGAN - Two Transformers Can Make One Strong GAN TransGAN
Transformer in Transformer TNT
ViViT - A Video Vision Transformer ViViT
Zero-Shot Text-to-Image Generation DALL-E
Beyond Self-attention - External Attention using Two Linear Layers for Visual Tasks
Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet
MLP-Mixer - An all-MLP Architecture for Vision MLP-Mixer
RepMLP - Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition RepMLP
ResMLP - Feedforward networks for image classification with data-efficient training ResMLP
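
For orientation, a minimal NumPy sketch of the patch-embedding step from "An Image is Worth 16x16 Words" (the ViT entry above): the image is cut into 16x16 patches, each patch is flattened, and a linear projection turns it into a token; the random projection matrix below is a stand-in for the learned one, and all shapes are illustrative.

```python
# A minimal sketch of ViT patch embedding; the projection is a random stand-in.
import numpy as np

def patchify(img, patch=16):
    """img: (H, W, C) with H, W divisible by patch -> (num_patches, patch*patch*C)."""
    H, W, C = img.shape
    h, w = H // patch, W // patch
    patches = img.reshape(h, patch, w, patch, C).swapaxes(1, 2)  # (h, w, p, p, C)
    return patches.reshape(h * w, patch * patch * C)             # flatten patches

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img)                              # (196, 768) flattened patches
W_proj = rng.normal(scale=0.02, size=(768, 512))    # stand-in for a learned projection
print((tokens @ W_proj).shape)                      # (196, 512) patch embeddings
```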