I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.
I did my best to cover as many NLP tasks as possible, but admittedly this list is far from exhaustive, purely due to my lack of knowledge. The selected references are also biased towards recent deep learning accomplishments. I expect them to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!
Oct. 13, 2017.
by Kyubyong
[PAPER] Automatic Text Scoring Using Neural Networks
[PAPER] A Neural Approach to Automated Essay Scoring
[CHALLENGE] Kaggle: The Hewlett Foundation: Automated Essay Scoring
[PROJECT] EASE (Enhanced AI Scoring Engine)

[WIKI] Speech recognition
[PAPER] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
[PAPER] WaveNet: A Generative Model for Raw Audio
[PROJECT] A TensorFlow implementation of Baidu's DeepSpeech architecture
[PROJECT] Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
[CHALLENGE] The 5th CHiME Speech Separation and Recognition Challenge
[DATA] The 5th CHiME Speech Separation and Recognition Challenge
[DATA] CSTR VCTK Corpus
[DATA] LibriSpeech ASR corpus
[DATA] Switchboard-1 Telephone Speech Corpus
[DATA] TED-LIUM Corpus

[WIKI] Automatic summarization
[BOOK] Automatic Text Summarization
[PAPER] Text Summarization Using Neural Networks
[PAPER] Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
[DATA] Text Analytics Conferences (TAC)
[DATA] Document Understanding Conferences (DUC)

[INFO] Coreference Resolution
[PAPER] Deep Reinforcement Learning for Mention-Ranking Coreference Models
[PAPER] Improving Coreference Resolution by Learning Entity-Level Distributed Representations
[CHALLENGE] CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
[CHALLENGE] CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes

[PAPER] Neural Network Translation Models for Grammatical Error Correction
[CHALLENGE] CoNLL-2013 Shared Task: Grammatical Error Correction
[CHALLENGE] CoNLL-2014 Shared Task: Grammatical Error Correction
[DATA] NUS Non-commercial research/trial corpus license
[DATA] Lang-8 Learner Corpora
[DATA] Cornell Movie--Dialogs Corpus
[PROJECT] Deep Text Corrector
[PRODUCT] deep grammar

[PAPER] Grapheme-to-Phoneme Models for (Almost) Any Language
[PAPER] Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
[PAPER] Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
[PROJECT] Sequence-to-Sequence G2P toolkit
[DATA] Multilingual Pronunciation Data

[WIKI] Language identification
[PAPER] AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS
[CHALLENGE] 2015 Language Recognition Evaluation

[WIKI] Language model
[TOOLKIT] KenLM Language Model Toolkit
[PAPER] Distributed Representations of Words and Phrases and their Compositionality
[PAPER] Character-Aware Neural Language Models
[DATA] Penn Treebank

[WIKI] Lemmatisation
[PAPER] Joint Lemmatization and Morphological Tagging with LEMMING
[TOOLKIT] WordNet Lemmatizer
[DATA] Treebank-3

[WIKI] Lip reading
[PAPER] Lip Reading Sentences in the Wild
[PAPER] 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
[PROJECT] Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
[DATA] The GRID audiovisual sentence corpus

[PAPER] Neural Machine Translation by Jointly Learning to Align and Translate
[PAPER] Neural Machine Translation in Linear Time
[PAPER] Attention Is All You Need
[CHALLENGE] ACL 2014 NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
[CHALLENGE] EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)
[DATA] OpenSubtitles2016
[DATA] WIT3: Web Inventory of Transcribed and Translated Talks
[DATA] The QCRI Educational Domain (QED) Corpus

[WIKI] Inflection
[PAPER] Morphological Inflection Generation Using Character Sequence to Sequence Learning
[CHALLENGE] SIGMORPHON 2016 Shared Task: Morphological Reinflection
[DATA] sigmorphon2016

[WIKI] Named-entity recognition
[PAPER] Neural Architectures for Named Entity Recognition
[PROJECT] OSU Twitter NLP Tools
[CHALLENGE] Named Entity Recognition in Twitter
[CHALLENGE] CoNLL 2002 Language-Independent Named Entity Recognition
[CHALLENGE] Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
[DATA] CoNLL-2002 NER corpus
[DATA] CoNLL-2003 NER corpus
[DATA] NUT Named Entity Recognition in Twitter Shared task

[PAPER] Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
[PROJECT] Paralex: Paraphrase-Driven Learning for Open Question Answering
[DATA] Microsoft Research Paraphrase Corpus
[DATA] Microsoft Research Video Description Corpus
[DATA] Pascal Dataset
[DATA] Flickr Dataset
[DATA] The SICK data set
[DATA] PPDB: The Paraphrase Database
[DATA] WikiAnswers Paraphrase Corpus

[PAPER] Neural Paraphrase Generation with Stacked Residual LSTM Networks
[PAPER] A Deep Generative Framework for Paraphrase Generation
[PAPER] Paraphrasing Revisited with Neural Machine Translation

[WIKI] Parsing
[TOOLKIT] The Stanford Parser: A statistical parser
[TOOLKIT] spaCy parser
[PAPER] A fast and accurate dependency parser using neural networks
[CHALLENGE] CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
[CHALLENGE] CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
[CHALLENGE] CoNLL 2015 Shared Task: Shallow Discourse Parsing
[CHALLENGE] SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!

[WIKI] Part-of-speech tagging
[PAPER] Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
[PAPER] Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
[DATA] Treebank-3
[TOOLKIT] nltk.tag package

[PAPER] Neural Network Language Model for Chinese Pinyin Input Method Engine
[PROJECT] Neural Chinese Transliterator

[WIKI] Question answering
[PAPER] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
[PAPER] Dynamic Memory Networks for Visual and Textual Question Answering
[CHALLENGE] TREC Question Answering Task
[CHALLENGE] NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
[CHALLENGE] CLEF Question Answering Track
[CHALLENGE] SemEval-2017 Task 3: Community Question Answering
[DATA] MS MARCO: Microsoft MAchine Reading COmprehension Dataset
[DATA] Maluuba NewsQA
[DATA] SQuAD: 100,000+ Questions for Machine Comprehension of Text
[DATA] GraphQuestions: A Characteristic-rich Question Answering Dataset
[DATA] Story Cloze Test and ROCStories Corpora
[DATA] Microsoft Research WikiQA Corpus
[DATA] DeepMind Q&A Dataset
[DATA] QASent

[WIKI] Relationship extraction
[PAPER] A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm

[WIKI] Semantic role labeling
[BOOK] Semantic Role Labeling
[PAPER] End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
[PAPER] Neural Semantic Role Labeling with Dependency Path Embeddings
[PAPER] Deep Semantic Role Labeling: What Works and What's Next
[CHALLENGE] CoNLL-2005 Shared Task: Semantic Role Labeling
[CHALLENGE] CoNLL-2004 Shared Task: Semantic Role Labeling
[TOOLKIT] Illinois Semantic Role Labeler (SRL)
[DATA] CoNLL-2005 Shared Task: Semantic Role Labeling

[WIKI] Sentence boundary disambiguation
[PAPER] A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
[TOOLKIT] NLTK Tokenizers
[DATA] The British National Corpus
[DATA] Switchboard-1 Telephone Speech Corpus

[WIKI] Sentiment analysis
[INFO] Awesome Sentiment Analysis
[CHALLENGE] Kaggle: UMICH SI650 - Sentiment Classification
[CHALLENGE] SemEval-2017 Task 4: Sentiment Analysis in Twitter
[CHALLENGE] SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
[PROJECT] SenticNet
[DATA] Multi-Domain Sentiment Dataset (version 2.0)
[DATA] Stanford Sentiment Treebank
[DATA] Twitter Sentiment Corpus
[DATA] Twitter Sentiment Analysis Training Corpus
[DATA] AFINN: List of English words rated for valence

[WIKI] Source separation
[PAPER] From Blind to Guided Audio Source Separation
[PAPER] Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
[CHALLENGE] Signal Separation Evaluation Campaign (SiSEC)
[CHALLENGE] CHiME Speech Separation and Recognition Challenge

[WIKI] Speaker diarisation
[PAPER] DNN-based speaker clustering for speaker diarisation
[PAPER] Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
[PAPER] Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
[CHALLENGE] Rich Transcription Evaluation

[WIKI] Speaker recognition
[PAPER] A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK
[PAPER] DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
[CHALLENGE] NIST Speaker Recognition Evaluation (SRE)
[INFO] Are there any suggestions for free databases for speaker recognition?

- See Lip-reading
[WIKI] Speech segmentation
[PAPER] Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
[PAPER] Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
[PAPER] Unsupervised Lexicon Discovery from Acoustic Input
[PAPER] Weakly supervised spoken term discovery using cross-lingual side information
[DATA] CALLHOME Spanish Speech

[WIKI] Speech synthesis
[PAPER] WaveNet: A Generative Model for Raw Audio
[PAPER] Tacotron: Towards End-to-End Speech Synthesis
[PAPER] Deep Voice 2: Multi-Speaker Neural Text-to-Speech
[DATA] The World English Bible
[DATA] LJ Speech Dataset
[DATA] Lessac Data
[CHALLENGE] Blizzard Challenge 2017
[PRODUCT] Lyrebird
[PROJECT] The Festvox project
[TOOLKIT] Merlin: The Neural Network (NN) based Speech Synthesis System

[WIKI] Speech enhancement
[BOOK] Speech enhancement: theory and practice
[PAPER] An Experimental Study on Speech Enhancement Based on Deep Neural Network
[PAPER] A Regression Approach to Speech Enhancement Based on Deep Neural Networks
[PAPER] Speech Enhancement Based on Deep Denoising Autoencoder

[WIKI] Stemming
[PAPER] A BACKPROPAGATION NEURAL NETWORK TO IMPROVE ARABIC STEMMING
[TOOLKIT] NLTK Stemmers

[WIKI] Terminology extraction
[PAPER] Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

[WIKI] Text simplification
[PAPER] Aligning Sentences from Standard Wikipedia to Simple Wikipedia
[PAPER] Problems in Current Text Simplification Research: New Data Can Help
[DATA] Newsela Data

- See Speech Synthesis
[WIKI] Textual entailment
[PROJECT] Textual Entailment with TensorFlow
[PAPER] Textual Entailment with Structured Attentions and Composition
[CHALLENGE] SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
[CHALLENGE] SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

[PAPER] PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING
[PROJECT] An implementation of voice conversion system utilizing phonetic posteriorgrams
[CHALLENGE] Voice Conversion Challenge 2016
[CHALLENGE] Voice Conversion Challenge 2018
[DATA] CMU_ARCTIC speech synthesis databases
[DATA] TIMIT Acoustic-Phonetic Continuous Speech Corpus

[WIKI] Word embedding
[TOOLKIT] Gensim: word2vec
[TOOLKIT] fastText
[TOOLKIT] GloVe: Global Vectors for Word Representation
[INFO] Where to get a pretrained model
[PROJECT] Pre-trained word vectors of 30+ languages
[PROJECT] Polyglot: Distributed word representations for multilingual NLP

[INFO] What is Word Prediction?
[PAPER] The prediction of character based on recurrent neural network language model
[PAPER] An Embedded Deep Learning based Word Prediction
[PAPER] Evaluating Word Prediction: Framing Keystroke Savings
[DATA] An Embedded Deep Learning based Word Prediction
[PROJECT] Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?