This pandect (πανδέκτης is Ancient Greek for encyclopedia) was created to help you find almost anything related to Natural Language Processing that is available online.
- Awesome NLP by keon [GitHub ~10k stars]
- Speech and Natural Language Processing Awesome List by edobashira [GitHub ~2k stars]
- Awesome Deep Learning for Natural Language Processing (NLP) [GitHub ~1k stars]
- Text Mining and Natural Language Processing Resources by stepthom [GitHub ~300 stars]
- Made with ML List by madewithml.com
- Brainsources for #NLP enthusiasts by Philip Vollet
- Awesome AI/ML/DL - NLP Section [GitHub ~600 stars]
- NLP top 10 conferences Compendium by soulbliss [GitHub ~300 stars]
- NLP Paper Summaries by dair-ai [GitHub ~1k stars]
- Curated collection of papers for the NLP practitioner [GitHub ~1k stars]
- Papers on Textual Adversarial Attack and Defense [GitHub ~500 stars]
- NLP Conferences Calendar
- ICLR 2020 Trends
- The Most Influential NLP Research of 2019
- Recent Deep Learning papers in NLU and RL by Valentin Malykh [GitHub ~300 stars]
- NLP Progress by sebastianruder [GitHub ~16k stars]
- NLP Tasks by Kyubyong [GitHub ~3k stars]
- Reading list for Awesome Sentiment Analysis papers by declare-lab [GitHub ~100 stars]
- Awesome Sentiment Analysis by xiamx [GitHub ~800 stars]
- NLP Datasets by niderhoff [GitHub ~4k stars]
- Big Bad NLP Database
- 25 Best Parallel Text Datasets for Machine Translation Training
- UWA Unambiguous Word Annotations - Word Sense Disambiguation Dataset
- 20 Best German Language Datasets for Machine Learning
- Awesome Embedding Models by Hironsan [GitHub ~1.3k stars]
- Awesome list of Sentence Embeddings by Separius [GitHub ~1.5k stars]
- Awesome BERT by Jiakui [GitHub ~1.5k stars]
- The Super Duper NLP Repo [Website, 2020]
- NLP Resources for Bahasa Indonesia [GitHub ~100 stars]
- Pre-trained language models for Vietnamese [GitHub ~200 stars]
- List of pre-trained NLP models [GitHub ~100 stars]
- NLP Highlights [Years: 2017 - now, Status: active]
- TWIML AI [Years: 2016 - now, Status: active]
- Data Hack Radio [Years: 2018 - now, Status: active]
- The Super Data Science Podcast [Years: 2016 - now, Status: active]
- AI Game Changers [Years: 2020 - now, Status: active]
- NLP News by Sebastian Ruder
- dair.ai Newsletter by dair.ai
- This Week in NLP by Robert Dale
- Papers with Code
- The Batch by deeplearning.ai
- Paper Digest by PaperDigest
- NLP Cypher by QuantumStat
- Yannic Kilcher
- HuggingFace
- Kaggle Reading Group
- Rasa Paper Reading
- Stanford CS224N: NLP with Deep Learning
- NLPxing
- ML Explained - A.I. Socratic Circles - AISC
- Deeplearning.ai
- Machine Learning Street Talk
- SQuAD - Stanford Question Answering Dataset (SQuAD)
- GLUE - General Language Understanding Evaluation (GLUE) benchmark
- SuperGLUE - benchmark styled after GLUE with a new set of more difficult language understanding tasks
- XTREME - Massively Multilingual Multi-task Benchmark
- decaNLP - The Natural Language Decathlon (decaNLP) for studying general NLP models
- RACE - ReAding Comprehension dataset collected from English Examinations
- A Recipe for Training Neural Networks by Andrej Karpathy [Keywords: research, training, 2019]
- Pre-trained ELMo Representations for Many Languages [GitHub ~1k stars]
- sense2vec - Contextually-keyed word vectors [GitHub ~1k stars]
- wikipedia2vec [GitHub ~500 stars]
- StarSpace [GitHub ~3k stars]
- fastText [GitHub ~21k stars]
- Language Models and Contextualised Word Embeddings by David S. Batista [Blog, 2018]
- An Essential Guide to Pretrained Word Embeddings for NLP Practitioners by AnalyticsVidhya [Blog, 2020]
- Polyglot Word Embeddings Discover Language Clusters [Blog, 2020]
- The Illustrated Word2vec by Jay Alammar [Blog, 2019]
- bpemb - Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) [GitHub ~800 stars]
- subword-nmt - Unsupervised Word Segmentation for Neural Machine Translation and Text Generation [GitHub ~1.5k stars]
- python-bpe - Byte Pair Encoding for Python [GitHub ~100 stars]
- The Transformer Family by Lilian Weng [Blog, 2020]
- Keeping up with the BERTs: a review of the main NLP benchmarks by Manuel Tonneau [Blog, 2020]
- Playing the lottery with rewards and multiple languages - about the effect of random initialization [ICLR 2020 Paper]
- Attention? Attention! by Lilian Weng [Blog, 2018]
- the transformer … “explained”? [Blog, 2019]
- Attention is all you need; Attentional Neural Network Models by Łukasz Kaiser [Talk, 2017]
- Understanding and Applying Self-Attention for NLP [Talk, 2018]
- The Annotated Transformer by Harvard NLP [Blog, 2018]
- The Illustrated Transformer by Jay Alammar [Blog, 2018]
- Illustrated Guide to Transformers by Hong Jing [Blog, 2020]
- Sequential Transformer with Adaptive Attention Span by Facebook [Blog, 2019]
- Evolution of Representations in the Transformer by Lena Voita [Blog, 2019]
- Reformer: The Efficient Transformer [Blog, 2020]
- Longformer — The Long-Document Transformer by Viktor Karlsson [Blog, 2020]
- Transformers from Scratch [Blog, 2019]
- Universal Transformers by Mostafa Dehghani [Blog, 2019]
- Transformers in Natural Language Processing — A Brief Survey by George Ho [Blog, May 2020]
- Lite Transformer - Lite Transformer with Long-Short Range Attention [GitHub ~300 stars]
- A Visual Guide to Using BERT for the First Time by Jay Alammar [Blog, 2019]
- The Dark Secrets of BERT by Anna Rogers [Blog, 2020]
- Understanding searches better than ever before [Blog, 2019]
- Demystifying BERT: A Comprehensive Guide to the Groundbreaking NLP Framework [Blog, 2019]
- SemBERT - Semantics-aware BERT for Language Understanding [GitHub ~100 stars]
- BERTweet - A pre-trained language model for English Tweets [GitHub ~200 stars]
- T5: Understanding Transformer-Based Self-Supervised Architectures [Blog, August 2020]
- T5: the Text-To-Text Transfer Transformer [Blog, 2020]
- The Illustrated GPT-2 by Jay Alammar [Blog, 2019]
- The Annotated GPT-2 by Aman Arora
- OpenAI’s GPT-2: the model, the hype, and the controversy by Ryan Lowe [Blog, 2019]
- How to generate text by Patrick von Platen [Blog, 2020]
- Awesome GPT-3 - a curated list of resources related to GPT-3 [GitHub ~1.5k stars]
- Zero Shot Learning for Text Classification by Amit Chaudhary [Blog, 2020]
- GPT-3 A Brief Summary by Leo Gao [Blog, 2020]
- GPT-3, a Giant Step for Deep Learning And NLP by Yoel Zeldes [Blog, June 2020]
- GPT-3 Language Model: A Technical Overview by Chuan Li [Blog, June 2020]
- OpenAI API - API Demo to use GPT-3 for commercial applications
- Big Bird: Transformers for Longer Sequences - original paper by Google Research [Paper, July 2020]
- What is Two-Stream Self-Attention in XLNet by Xu LIANG [Blog, 2019]
- Visual Paper Summary: ALBERT (A Lite BERT) by Amit Chaudhary [Blog, 2020]
- Turing NLG by Microsoft
- Multi-Label Text Classification with XLNet by Josh Xin Jie Lee [Blog, 2019]
- ELECTRA [GitHub ~1k stars]
- Distilling knowledge from Neural Networks to build smaller and faster models by FloydHub [Blog, 2019]
- David over Goliath: towards smaller models for cheaper, faster, and greener NLP by Manuel Tonneau [Blog, 2020]
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization by Google AI [Blog, June 2020]
- Why BERT Fails in Commercial Environments by Intel AI [Blog, 2020]
- Fine Tuning BERT for Text Classification with FARM by Sebastian Guggisberg [Blog, 2020]
- Practical NLP for the Real World [Presentation, 2019]
- From Paper to Product – How we implemented BERT by Christoph Henkelmann [Talk, 2020]
- embedding-as-service [GitHub ~100 stars]
- Bert-as-service [GitHub ~8k stars]
- NLP Recipes by microsoft [GitHub ~5k stars]
- NLP with Python by susanli2016 [GitHub ~1.5k stars]
- Basic Utilities for PyTorch NLP by PetrochukM [GitHub ~2k stars]
- Blackstone - A spaCy pipeline and model for NLP on unstructured legal text [GitHub ~300 stars]
- scispaCy - spaCy pipeline and models for scientific/biomedical documents [GitHub ~600 stars]
- FinBERT: Pre-Trained on SEC Filings for Financial NLP Tasks [GitHub ~100 stars]
- LexNLP - Information retrieval and extraction for real, unstructured legal text [GitHub ~400 stars]
- wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]
- DeepSpeech - Mozilla's implementation of Baidu's DeepSpeech architecture [GitHub ~14k stars]
- Acoustic Word Embeddings by Maria Obedkova [Blog, 2020]
- kaldi - Kaldi is a toolkit for speech recognition [GitHub ~9k stars]
- awesome-kaldi - resources for using Kaldi [GitHub ~300 stars]
- FastSpeech - An implementation of FastSpeech based on PyTorch [GitHub ~500 stars]
- Topic Modelling with PySpark and Spark NLP by Maria Obedkova [Spark, Blog, 2020]
- Anchored Correlation Explanation Topic Modeling [GitHub ~300 stars]
- Topic Modeling in Embedding Spaces [GitHub ~200 stars, Paper]
- TopicNet - A high-level interface for BigARTM library [GitHub ~100 stars]
- spaCy by Explosion AI [GitHub ~17k stars]
- flair by Zalando [GitHub ~9k stars]
- AllenNLP by AI2 [GitHub ~9k stars]
- stanza (formerly StanfordNLP) [GitHub ~4k stars]
- spaCy stanza [GitHub ~400 stars]
- nltk [GitHub ~9k stars]
- gensim - framework for topic modeling [GitHub ~11k stars]
- NLP Architect - A Deep Learning NLP/NLU library by Intel® AI Lab [GitHub ~2.5k stars]
- polyglot - Multilingual NLP Framework [GitHub ~2k stars]
- FARM [GitHub ~1k stars]
- gobbli by RTI International [GitHub ~200 stars]
- headliner - training and deployment of seq2seq models [GitHub ~200 stars]
- SyferText - A privacy preserving NLP framework [GitHub ~100 stars]
- DeText - Text Understanding Framework for Ranking and Classification Tasks [GitHub ~600 stars]
- TextHero - Text preprocessing, representation and visualization [GitHub ~2k stars]
- textblob - TextBlob: Simplified Text Processing [GitHub ~7k stars]
- AdaptNLP - A high level framework and library for NLP [GitHub ~200 stars]
- TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub ~800 stars]
- textblob-de - TextBlob: Simplified Text Processing for German [GitHub ~100 stars]
- Kashgari - Transfer learning with a focus on Chinese [GitHub ~2k stars]
- Underthesea - Vietnamese NLP Toolkit [GitHub ~800 stars]
- transformers by HuggingFace [GitHub ~28k stars]
- Adapter Hub and its documentation - Adapter modules for Transformers [GitHub ~150 stars]
- DeepPavlov by MIPT [GitHub ~4k stars]
- ParlAI by FAIR [GitHub ~6k stars]
- rasa - Framework for Conversational Agents [GitHub ~9k stars]
- wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]
- Spark NLP [GitHub ~1k stars]
- NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks by HuggingFace [GitHub ~2k stars]
- tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production [GitHub ~3k stars]
- SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation [GitHub ~4k stars]
- SoMaJo - A tokenizer and sentence splitter for German and English web and social media texts [GitHub ~100 stars]
- A Visual Survey of Data Augmentation in NLP [Blog, 2020]
- Data augmentation for NLP [GitHub ~1k stars]
- snorkel - Framework to generate training data [GitHub ~4k stars]
- Weak Supervision: A New Programming Paradigm for Machine Learning [Blog, March 2019]
- Language Interpretability Tool (LIT) [GitHub ~150 stars]
- Computational Ethics for NLP - course resources from Carnegie Mellon University [Lecture Notes, Spring 2020]
- Ethics in NLP - resources from ACL's Ethics in NLP track
- Dive into Deep Learning - An interactive deep learning book with code, math, and discussions
- Natural Language Processing and Computational Linguistics - Speech, Morphology and Syntax (Cognitive Science)
- Choosing the right course for a Practical NLP Engineer
- 12 Best Natural Language Processing Courses & Tutorials to Learn Online
- nlp-tutorial - A list of NLP (Natural Language Processing) tutorials built on PyTorch [GitHub ~1k stars]
- Hands-On NLTK Tutorial [GitHub ~300 stars]
- r/LanguageTechnology - NLP Reddit forum
License CC0
- All linked resources belong to original authors
- Akropolis by parkjisun from the Noun Project
- Book of Ester by Gilad Sotil from the Noun Project
- quill by Juan Pablo Bravo from the Noun Project
- acting by Flatart from the Noun Project
- olympic by supalerk laipawat from the Noun Project
- aristocracy by Eucalyp from the Noun Project
- Horn by Eucalyp from the Noun Project
- temple by Eucalyp from the Noun Project
- constellation by Eucalyp from the Noun Project
- ancient greek round pattern by Olena Panasovska from the Noun Project
- Harp by Vectors Point from the Noun Project
- Atlas by parkjisun from the Noun Project
- Parthenon by Eucalyp from the Noun Project