Paper List

A curated paper list for NLP, RecSys, Chatbot, and CV.

NLP

RecSys

Chatbot

CV

NLP

Title Notes
A Neural Probabilistic Language Model NNLM
Efficient Estimation of Word Representations in Vector Space Word2vec
Distributed Representations of Words and Phrases and their Compositionality Word2vec
Neural Machine Translation by Jointly Learning to Align and Translate Attention
Attention Is All You Need Transformer (see the attention sketch after this table)
Deep contextualized word representations ELMo
Improving Language Understanding by Generative Pre-Training GPT
BERT - Pre-training of Deep Bidirectional Transformers for Language Understanding BERT
RoBERTa - A Robustly Optimized BERT Pretraining Approach RoBERTa
ALBERT - A Lite BERT for Self-supervised Learning of Language Representations ALBERT
ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators ELECTRA
ERNIE - Enhanced Representation through Knowledge Integration ERNIE (Baidu)
ERNIE 2.0 - A Continual Pre-training Framework for Language Understanding ERNIE 2.0
ERNIE-GEN - An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation ERNIE-GEN
ERNIE - Enhanced Language Representation with Informative Entities ERNIE (Tsinghua)
Multi-Task Deep Neural Networks for Natural Language Understanding MT-DNN
NEZHA - Neural Contextualized Representation for Chinese Language Understanding NEZHA
Pre-Training with Whole Word Masking for Chinese BERT Chinese-BERT-wwm
Revisiting Pre-Trained Models for Chinese Natural Language Processing MacBERT
SpanBERT - Improving Pre-training by Representing and Predicting Spans SpanBERT
Don't Stop Pretraining - Adapt Language Models to Domains and Tasks continued pretraining
How to Fine-Tune BERT for Text Classification? fine-tuning tips
Train No Evil - Selective Masking for Task-Guided Pre-Training continued pretraining
Layer Normalization Layer Normalization
Batch Normalization - Accelerating Deep Network Training by Reducing Internal Covariate Shift Batch Normalization
A Frustratingly Easy Approach for Joint Entity and Relation Extraction NER & RE: Typed entity markers
A Span-Extraction Dataset for Chinese Machine Reading Comprehension CMRC 2018 dataset
A Unified MRC Framework for Named Entity Recognition NER: MRC method
BERT for Joint Intent Classification and Slot Filling Text classification & NER jointly
BERT-of-Theseus - Compressing BERT by Progressive Module Replacing Distillation: BERT-of-Theseus
CLUE - A Chinese Language Understanding Evaluation Benchmark CLUE Benchmark
CLUECorpus2020 - A Large-scale Chinese Corpus for Pre-training Language Model CLUE corpus
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks Distillation: distill BERT into BiLSTM
Distilling the Knowledge in a Neural Network Distillation: Hinton
Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning MRC: single-choice model by Tencent
Language Models are Few-Shot Learners GPT-3
Language Models are Unsupervised Multitask Learners GPT-2
Neural Architectures for Named Entity Recognition NER: BiLSTM
RACE - Large-scale ReAding Comprehension Dataset From Examinations RACE dataset
TPLinker - Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking NER & RE: TPLinker
TextBrewer - An Open-Source Knowledge Distillation Toolkit for Natural Language Processing Distillation: distillation toolkit by HFL
Two are Better than One - Joint Entity and Relation Extraction with Table-Sequence Encoders NER & RE: Two are Better than One
A Survey on Knowledge Graphs - Representation, Acquisition and Applications Review of KG
Adversarial Training for Large Neural Language Models ALUM
Augmented SBERT - Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks Augmented SBERT
BART - Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension BART
Bag of Tricks for Efficient Text Classification fastText
CTRL - A Conditional Transformer Language Model for Controllable Generation CTRL
Channel Pruning for Accelerating Very Deep Neural Networks Pruning
Chinese NER Using Lattice LSTM Lattice LSTM
Compressing Deep Convolutional Networks using Vector Quantization Quantization
Conditional Random Fields - Probabilistic Models for Segmenting and Labeling Sequence Data CRF
Cross-lingual Language Model Pretraining XLM
DeBERTa - Decoding-enhanced BERT with Disentangled Attention DeBERTa
DeFormer - Decomposing Pre-trained Transformers for Faster Question Answering DeFormer
Deep Compression - Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding Quantization
DistilBERT - a distilled version of BERT - smaller, faster, cheaper and lighter DistilBERT
Do Deep Nets Really Need to be Deep? Model Compression
Do Transformer Modifications Transfer Across Implementations and Applications? Evaluate transformer modifications
Dropout - a simple way to prevent neural networks from overfitting Dropout
DynaBERT - Dynamic BERT with Adaptive Width and Depth DynaBERT
Efficient Transformers - A Survey Review of transformers
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Evaluate GRU
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer T5
FLAT - Chinese NER Using Flat-Lattice Transformer FLAT
FastBERT - a Self-distilling BERT with Adaptive Inference Time FastBERT
Finetuning Pretrained Transformers into RNNs T2R
FitNets - Hints for Thin Deep Nets FitNets
GPT Understands, Too P-tuning
GloVe - Global Vectors for Word Representation GloVe
Informer - Beyond Efficient Transformer for Long Sequence Time-Series Forecasting Informer
K-BERT - Enabling Language Representation with Knowledge Graph K-BERT
Knowledge Distillation - A Survey Review of KD
Knowledge Distillation via Route Constrained Optimization RCO
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks Pre-trained Checkpoints for NLG
Lex-BERT - Enhancing BERT based NER with lexicons Lex-BERT
Longformer - The Long-Document Transformer Longformer
Megatron-LM - Training Multi-Billion Parameter Language Models Using Model Parallelism Megatron-LM
MiniLM - Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers MiniLM
Mixed Precision Training Mixed Precision Training
MobileBERT - a Compact Task-Agnostic BERT for Resource-Limited Devices MobileBERT
Model Compression Earliest paper on KD
Neural Turing Machines NTM
On the Sentence Embeddings from Pre-trained Language Models BERT-flow
Optimal Subarchitecture Extraction For BERT Bort
PRADO - Projection Attention Networks for Document Classification On-Device PRADO
Patient Knowledge Distillation for BERT Model Compression BERT-PKD
Pre-trained Models for Natural Language Processing - A Survey Review of pretrained models
Reformer - The Efficient Transformer Reformer
Self-Attention with Relative Position Representations relative position self-attention
Sentence-BERT - Sentence Embeddings using Siamese BERT-Networks SBERT
StructBERT - Incorporating Language Structures into Pre-training for Deep Language Understanding StructBERT
Switch Transformers - Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Switch Transformers
TENER - Adapting Transformer Encoder for Named Entity Recognition TENER
TinyBERT - Distilling BERT for Natural Language Understanding TinyBERT
Transformer-XL - Attentive Language Models Beyond a Fixed-Length Context Transformer-XL
Unified Language Model Pre-training for Natural Language Understanding and Generation UniLM
Well-Read Students Learn Better - On the Importance of Pre-training Compact Models Pre-trained Distillation
XLNet - Generalized Autoregressive Pretraining for Language Understanding XLNet
ZeRO-Offload - Democratizing Billion-Scale Model Training ZeRO-Offload
word2vec Explained - deriving Mikolov et al.'s negative-sampling word-embedding method Explain word2vec
word2vec Parameter Learning Explained Explain word2vec
MASS - Masked Sequence to Sequence Pre-training for Language Generation MASS
Semi-supervised Sequence Learning Pretraining and finetuning LSTM
Universal Language Model Fine-tuning for Text Classification ULMFiT
Whitening Sentence Representations for Better Semantics and Faster Retrieval BERT-whitening
A Joint Neural Model for Information Extraction with Global Features
A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
A Self-Training Approach for Short Text Clustering
A Simple Framework for Contrastive Learning of Visual Representations
A Survey of Deep Learning Methods for Relation Extraction
A Survey on Contextual Embeddings
A Survey on Deep Learning for Named Entity Recognition
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
A Survey on Text Classification - From Shallow to Deep Learning
An overview of gradient descent optimization algorithms
CNN-Based Chinese NER with Lexicon Rethinking
Complex Relation Extraction - Challenges and Opportunities
ConSERT - A Contrastive Framework for Self-Supervised Sentence Representation Transfer Contrastive learning: ConSERT
Convolutional Neural Networks for Sentence Classification
Decoupled Weight Decay Regularization
Deep Learning Based Text Classification - A Comprehensive Review
ERNIE-GEN - An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Enhancement of Short Text Clustering by Iterative Classification
Enriching Word Vectors with Subword Information
Extract then Distill - Efficient and Effective Task-Agnostic BERT Distillation
FastText.zip - Compressing text classification models
Generating Long Sequences with Sparse Transformers
Hierarchical Multi-Label Classification Networks
Hierarchically-Refined Label Attention Network for Sequence Labeling
I-BERT - Integer-only BERT Quantization I-BERT
Incremental Joint Extraction of Entity Mentions and Relations
Joint Entity and Relation Extraction with Set Prediction Networks
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Knowledge Graphs
Large Batch Optimization for Deep Learning - Training BERT in 76 minutes
Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter
More Data, More Relations, More Context and More Openness - A Review and Outlook for Relation Extraction
Poor Man's BERT - Smaller and Faster Transformer Models
Pre-training with Meta Learning for Chinese Word Segmentation
Q8BERT - Quantized 8Bit BERT
Recent Advances and Challenges in Task-oriented Dialog System
RethinkCWS - Is Chinese Word Segmentation a Solved Task?
Self-Taught Convolutional Neural Networks for Short Text Clustering
SimCSE - Simple Contrastive Learning of Sentence Embeddings Contrastive learning: SimCSE
Simplify the Usage of Lexicon in Chinese NER
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Supporting Clustering with Contrastive Learning
Transformers are RNNs - Fast Autoregressive Transformers with Linear Attention
Universal Sentence Encoder
ZEN - Pre-training Chinese Text Encoder Enhanced by N-gram Representations
fastHan - A BERT-based Joint Many-Task Toolkit for Chinese NLP Chinese NLP toolkit: fastHan
ERNIE-Gram - Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding Pretrained model: ERNIE-Gram
MPNet - Masked and Permuted Pre-training for Language Understanding Pretrained model: MPNet
A Survey of Event Extraction From Text Survey of event extraction
A Survey of Transformers Survey of Transformers
Applying Deep Learning to Answer Selection - A Study and An Open Task Text matching: SiamCNN
Big Bird - Transformers for Longer Sequences Long-sequence modeling: Big Bird
CLEVE - Contrastive Pre-training for Event Extraction Event extraction: CLEVE
ERICA - Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning Relation extraction: ERICA
ERNIE 3.0 - Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation Pretrained model: ERNIE 3.0
ERNIE-Doc - A Retrospective Long-Document Modeling Transformer Pretrained model: ERNIE-Doc
Enhanced LSTM for Natural Language Inference Text matching: ESIM (Enhanced Sequential Inference Model)
Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks Event extraction: CNN
Graph Neural Networks for Natural Language Processing - A Survey Survey of GNNs for NLP
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data Text matching: DSSM
Linformer - Self-Attention with Linear Complexity Long-sequence modeling: Linformer
M6 - A Chinese Multimodal Pretrainer Multimodal pretrained model: M6
Multi-passage BERT - A Globally Normalized BERT Model for Open-domain Question Answering Question answering: Multi-passage BERT
PanGu-α - Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation Large-scale pretrained model: PanGu-α
RoFormer - Enhanced Transformer with Rotary Position Embedding Long-sequence modeling: RoFormer
Unsupervised Deep Embedding for Clustering Analysis Text clustering: embedding-based method
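
For orientation, here is a minimal NumPy sketch of the scaled dot-product attention from "Attention Is All You Need" (the Transformer entry above), Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes, names, and toy inputs are illustrative assumptions, not reference code.

```python
# A minimal sketch of scaled dot-product attention; all shapes are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v) -> (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) logits
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # attention-weighted values

# Toy usage: 4 tokens, d_k = d_v = 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```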

RecSys

Title Notes
Using Collaborative Filtering to Weave an Information Tapestry CF
Amazon.com Recommendations Item-to-Item Collaborative Filtering ItemCF
Matrix Factorization Techniques for Recommender Systems MF
Factorization Machines FM (see the FM sketch after this table)
Field-aware Factorization Machines for CTR Prediction FFM
Practical Lessons from Predicting Clicks on Ads at Facebook GBDT+LR
Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction LS-PLM
AutoRec - Autoencoders Meet Collaborative Filtering AutoRec
Deep Neural Networks for YouTube Recommendations YoutubeDNN
A Contextual-Bandit Approach to Personalized News Article Recommendation contextual bandit
A survey of active learning in collaborative filtering recommender systems active learning
Ad click prediction - a view from the trenches FTRL
An empirical evaluation of thompson sampling thompson sampling
Artwork personalization at netflix Artwork Personalization
Attentional Factorization Machines - Learning the Weight of Feature Interactions via Attention Networks AFM
Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba EGES
CAN - Revisiting Feature Co-Action for Click-Through Rate Prediction CAN
Computation of the Singular Value Decomposition Singular Value Decomposition
DRN - A Deep Reinforcement Learning Framework for News Recommendation DRN
Deep & Cross Network for Ad Click Predictions DCN
Deep Crossing - Web-Scale Modeling without Manually Crafted Combinatorial Features Deep Crossing
Deep Interest Evolution Network for Click-Through Rate Prediction DIEN
Deep Interest Network for Click-Through Rate Prediction DIN
Deep Learning over Multi-field Categorical Data - A Case Study on User Response Prediction FNN
Deep Retrieval - An End-to-End Learnable Structure Model for Large-Scale Recommendations DR
Deep Session Interest Network for Click-Through Rate Prediction DSIN
DeepFM - A Factorization-Machine based Neural Network for CTR Prediction DeepFM
DeepWalk - Online Learning of Social Representations DeepWalk
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization RDA
Entire Space Multi-Task Model - An Effective Approach for Estimating Post-Click Conversion Rate ESMM
Finite-time Analysis of the Multiarmed Bandit Problem UCB
Item2Vec - Neural Item Embedding for Collaborative Filtering Item2Vec
LINE - Large-scale Information Network Embedding LINE
Locality-Sensitive Hashing for Finding Nearest Neighbors LSH
Neural Collaborative Filtering NCF
Neural Factorization Machines for Sparse Predictive Analytics NFM
Overlapping Experiment Infrastructure - More, Better, Faster Experimentation A/B test
Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction MIMN
Product-based Neural Networks for User Response Prediction PNN
Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction SIM
Structural Deep Network Embedding SDNE
Wide & Deep Learning for Recommender Systems Wide & Deep
node2vec - Scalable Feature Learning for Networks node2vec
Embedding-based Retrieval in Facebook Search
Ensembled CTR Prediction via Knowledge Distillation
Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
Privileged Features Distillation at Taobao Recommendations
Ranking Distillation - Learning Compact Ranking Models With High Performance for Recommender System
Rocket Launching - A Universal and Efficient Framework for Training Well-performing Light Net
XGBoost - A Scalable Tree Boosting System XGBoost
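
For orientation, a minimal NumPy sketch of the Factorization Machines prediction equation (the FM entry above), y(x) = w0 + <w, x> + sum over i<j of <v_i, v_j> x_i x_j, using the O(kn) pairwise-term reformulation from Rendle's paper; the random parameters below are placeholders, not trained weights.

```python
# A minimal sketch of the FM model equation; parameters are random stand-ins.
import numpy as np

def fm_predict(x, w0, w, V):
    """x: (n,) features; w0: bias; w: (n,) linear weights; V: (n, k) factors."""
    linear = w0 + w @ x
    s = V.T @ x                           # (k,) per-factor sum_i v_{i,f} x_i
    s_sq = (V ** 2).T @ (x ** 2)          # (k,) per-factor sum_i v_{i,f}^2 x_i^2
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise

rng = np.random.default_rng(0)
n, k = 10, 4
x = rng.random(n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(scale=0.1, size=(n, k))
print(fm_predict(x, w0, w, V))
```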

Chatbot

Title Notes
Rasa - Open Source Language Understanding and Dialogue Management Rasa
DIET - Lightweight Language Understanding for Dialogue Systems DIET (Dual Intent and Entity Transformer)
Dialogue Transformers TED (Transformer Embedding Dialogue)
Evaluating Natural Language Understanding Services for Conversational Question Answering Systems Evaluating NLU systems
Few-Shot Generalization Across Dialogue Tasks REDP (Recurrent Embedding Dialogue Policy)
Going Beyond T-SNE - Exposing whatlies in Text Embeddings whatlies
Where is the context? A critique of recent dialogue datasets critique of recent dialogue datasets
Demonstration of interactive teaching for end-to-end dialog control with hybrid code networks HCN interactive
DialoGPT - Large-Scale Generative Pre-training for Conversational Response Generation DialoGPT
Hybrid Code Networks - practical and efficient end-to-end dialog control with supervised and reinforcement learning HCN
OpenDial - A Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules OpenDial
PLATO - Pre-trained Dialogue Generation Model with Discrete Latent Variable PLATO
PLATO-2 - Towards Building an Open-Domain Chatbot via Curriculum Learning PLATO-2
Recipes for building an open-domain chatbot Blender
Towards a Human-like Open-Domain Chatbot Meena
A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking DSTC5
A Survey on Dialogue Systems - Recent Advances and New Frontiers Survey
CrossWOZ - A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset CrossWOZ
Memory-augmented Dialogue Management for Task-oriented Dialogue Systems MAD
BERT-DST - Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer BERT-DST (see the toy tracking sketch after this table)
SUMBT - Slot-Utterance Matching for Universal and Scalable Belief Tracking SUMBT
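
For orientation, a toy sketch of the slot-filling belief-state update that the dialogue state tracking papers above (e.g. BERT-DST, SUMBT) learn with neural models; everything here, including the hand-fed per-turn slot extractions, is a simplified stand-in, not any paper's method.

```python
# A toy DST loop: per-turn (slot, value) extractions are merged into the
# running belief state, later values overwriting earlier ones. The fixed
# per-turn extractions below stand in for a learned NLU/DST model.
def track(state, turn_slots):
    """Merge newly extracted (slot, value) pairs into the dialogue state."""
    new_state = dict(state)
    new_state.update({s: v for s, v in turn_slots.items() if v is not None})
    return new_state

state = {}
turns = [{"cuisine": "thai"}, {"area": "centre"}, {"cuisine": "chinese"}]
for turn_slots in turns:
    state = track(state, turn_slots)
print(state)  # {'cuisine': 'chinese', 'area': 'centre'}
```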

CV

Title Notes
ImageNet Classification with Deep Convolutional Neural Networks AlexNet
A Simple Framework for Contrastive Learning of Visual Representations SimCLR
A Survey on Visual Transformer Review
An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale ViT (see the patch-embedding sketch after this table)
CvT - Introducing Convolutions to Vision Transformers CvT
Deep Residual Learning for Image Recognition ResNet
End-to-End Object Detection with Transformers DETR
Image Transformer Image Transformer
Is Space-Time Attention All You Need for Video Understanding? TimeSformer
Learning Transferable Visual Models From Natural Language Supervision CLIP
Momentum Contrast for Unsupervised Visual Representation Learning MoCo
Pre-Trained Image Processing Transformer IPT
Pyramid Vision Transformer - A Versatile Backbone for Dense Prediction without Convolutions PVT
Swin Transformer - Hierarchical Vision Transformer using Shifted Windows Swin Transformer
TransGAN - Two Transformers Can Make One Strong GAN TransGAN
Transformer in Transformer TNT
ViViT - A Video Vision Transformer ViViT
Zero-Shot Text-to-Image Generation DALL-E
Beyond Self-attention - External Attention using Two Linear Layers for Visual Tasks
Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet
MLP-Mixer - An all-MLP Architecture for Vision MLP-Mixer
RepMLP - Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition RepMLP
ResMLP - Feedforward networks for image classification with data-efficient training ResMLP
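
For orientation, a minimal NumPy sketch of the patch-embedding step from "An Image is Worth 16x16 Words" (the ViT entry above): the image is cut into 16x16 patches, each patch is flattened, and a linear projection turns it into a token; the random projection matrix below is a stand-in for the learned one, and all shapes are illustrative.

```python
# A minimal sketch of ViT patch embedding; the projection is a random stand-in.
import numpy as np

def patchify(img, patch=16):
    """img: (H, W, C) with H, W divisible by patch -> (num_patches, patch*patch*C)."""
    H, W, C = img.shape
    h, w = H // patch, W // patch
    patches = img.reshape(h, patch, w, patch, C).swapaxes(1, 2)  # (h, w, p, p, C)
    return patches.reshape(h * w, patch * patch * C)             # flatten patches

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img)                              # (196, 768) flattened patches
W_proj = rng.normal(scale=0.02, size=(768, 512))    # stand-in for a learned projection
print((tokens @ W_proj).shape)                      # (196, 512) patch embeddings
```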