This repository shows how NLP can tackle real-world problems. It includes source code, datasets, and the state of the art in NLP.

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Augmentation | Data Augmentation in NLP | | Medium | |
| Augmentation | Data Augmentation library for Text | | Medium | |
| Augmentation | Does your NLP model able to prevent adversarial attack? | | Medium | |
| Augmentation | Data Augmentation library for Speech Recognition | | Medium | |
| Augmentation | Data Augmentation library for Audio | | Medium | |
| Augmentation | Unsupervised Data Augmentation | | Medium | |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Tokenization | Subword Tokenization | | Medium |
| Tokenization | Word Tokenization | | Medium, Github |
| Tokenization | Sentence Tokenization | | Medium, Github |
| Part of Speech | | | Medium, Github |
| Lemmatization | | | Medium, Github |
| Stemming | | | Medium, Github |
| Stop Words | | | Medium, Github |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium, Github |
| | Lexicon-based | Symspell | Medium, Github |
| Machine Translation | Statistical Machine Translation | | Medium |
| Machine Translation | Attention | | Medium |
| String Matching | Fuzzywuzzy | | Medium, Github |

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Pattern-based Recognition | | | Medium | |
| Lexicon-based Recognition | | | Medium | |
| Pre-trained NER | Spacy | | Medium, Github | |
| Custom NER | | | | |

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Printed Text | Google Cloud Vision API | | Medium | Paper |
| Handwriting | LSTM | | Medium | Paper |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Extractive Approach | | | Medium, Github |
| Abstractive Approach | | | |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Audio, Text, Visual | | 3 Multimodals for Emotion Recognition | Medium |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Feature Representation | Unsupervised Learning | Introduction to Audio Feature Learning | Medium, Paper 1, Paper 2, Paper 3 |
| Speech-to-text | | Introduction to Speech-to-text | Medium |

| Section | Sub-Section | Description | Link | Paper |
|---|---|---|---|---|
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | Medium, Github | |
| Edit Distance | Levenshtein Distance | | Medium, Github | |
| Word Mover's Distance (WMD) | | | Medium, Github | |
| Supervised Word Mover's Distance (S-WMD) | | | Medium | |
| Manhattan LSTM | | | Medium | Paper |

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Traditional Method | Bag-of-words (BoW) | | Medium, Github | |
| | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium, Github | |
| Character Level | Character Embedding | New York University | Medium, Github | Paper |
| Word Level | Negative Sampling and Hierarchical Softmax | | Medium | |
| | Word2Vec, GloVe, fastText | | Medium, Github | |
| | Contextualized Word Vectors (CoVe) | Salesforce | Medium, Github | Paper, Code |
| | Misspelling Oblivious (word) Embeddings | | Medium | Paper |
| | Embeddings from Language Models (ELMo) | AI2 | Medium, Github | Paper, Code |
| | Generative Pre-Training (GPT) | OpenAI | Medium | Paper, Code |
| | Contextual String Embeddings | Zalando Research | Medium | Paper, Code |
| | Self-Governing Neural Networks (SGNN) | | Medium | Paper |
| | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | Medium | Paper |
| | Generative Pre-Training-2 (GPT-2) | OpenAI | Medium | Paper, Code |
| | Universal Language Model Fine-tuning (ULMFiT) | fast.ai | Medium | Paper, Code |
| Sentence Level | Skip-thoughts | | Medium, Github | Paper, Code |
| | InferSent | | Medium, Github | Paper, Code |
| | Quick-Thoughts | | Medium | Paper, Code |
| | General Purpose Sentence (GenSen) | | Medium | Paper, Code |
| | Bidirectional Encoder Representations from Transformers (BERT) | | Medium | Paper, Code |
| | BERT in Science Domain | | Medium | SciBERT Paper, BioBERT Paper |
| | BERT in Clinical Domain | | Medium | Clinical BERT Embeddings Paper, ClinicalBert Paper |
| | Unified Language Model for NLP and NLU | | Medium | Paper |
| | Cross-lingual Language Model | | Medium | Paper |
| | Transformer-XL | | Medium | Paper |
| Document Level | lda2vec | | Medium | Paper |
| | doc2vec | | Medium, Github | Paper |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| ELI5, LIME and Skater | | | Medium, Github |
| SHapley Additive exPlanations (SHAP) | | | Medium, Github |
| Anchors | | | Medium, Github |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Using Deep Learning can resolve all problem? | | | Medium, Kaggle |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Spellcheck | | | Github |
| InferSent | | | Github |