Repository showing how NLP can tackle real problems, including source code, datasets, and state-of-the-art NLP techniques.

| Section | Sub-Section | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- |
| Augmentation | Data Augmentation in NLP | | Medium | |
| Augmentation | Data Augmentation library for text | | Medium | |
| Augmentation | Data Augmentation library for Speech Recognition | | Medium | |
| Augmentation | Data Augmentation library for Audio | | Medium | |
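
As a rough illustration of text augmentation, here is a minimal sketch of two token-level operations in the spirit of Easy Data Augmentation (EDA): random swap and random deletion. It is not the code behind the linked stories or library; the function names, the `n_swaps` and `p` parameters, and the seeded `random.Random` generator are assumptions chosen for the example.

```python
import random


def random_swap(words, n_swaps=1, rng=None):
    """Return a copy of `words` with `n_swaps` random pairs of positions swapped."""
    rng = rng or random.Random()
    augmented = list(words)
    if len(augmented) < 2:
        return augmented  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(augmented)), 2)
        augmented[i], augmented[j] = augmented[j], augmented[i]
    return augmented


def random_deletion(words, p=0.1, rng=None):
    """Drop each word independently with probability `p`, keeping at least one word."""
    rng = rng or random.Random()
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    # Never return an empty sentence: fall back to one randomly chosen word.
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the augmentation is reproducible
    sentence = "data augmentation helps small nlp datasets".split()
    print(" ".join(random_swap(sentence, n_swaps=2, rng=rng)))
    print(" ".join(random_deletion(sentence, p=0.3, rng=rng)))
```

Swapping keeps every token and only perturbs word order, while deletion shortens the sentence; both are only useful as label-preserving augmentation if the perturbed sentence still means roughly the same thing, which is worth checking on real data.
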

| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| Tokenization | Subword Tokenization | | Medium |
| Tokenization | Word Tokenization | | Medium Github |
| Tokenization | Sentence Tokenization | | Medium Github |
| Part of Speech | | | Medium Github |
| Lemmatization | | | Medium Github |
| Stemming | | | Medium Github |
| Stop Words | | | Medium Github |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium Github |
| Spell Checking | Lexicon-based | Symspell | Medium Github |
| Machine Translation | Statistical Machine Translation | | Medium |
| Machine Translation | Attention | | Medium |
| String Matching | Fuzzywuzzy | | Medium Github |
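
The Peter Norvig algorithm listed above generates every candidate within one or two edits (deletes, transposes, replaces, inserts) and picks the most frequent known word. Below is a condensed sketch of that idea; the tiny `CORPUS` string is a hypothetical stand-in for the large text corpus the word frequencies would normally come from, and only lowercase ASCII letters are handled.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical toy corpus; a real corrector needs a large, representative text.
CORPUS = "the quick brown fox jumps over the lazy dog the dog sleeps spelling is hard"


def words(text):
    return re.findall(r"[a-z]+", text.lower())


WORD_COUNTS = Counter(words(CORPUS))
TOTAL = sum(WORD_COUNTS.values())


def probability(word):
    """Relative frequency of `word` in the corpus."""
    return WORD_COUNTS[word] / TOTAL


def edits1(word):
    """All strings one delete, transpose, replace or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from `word` (generated lazily)."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(candidates):
    """Keep only candidates that appear in the corpus vocabulary."""
    return {w for w in candidates if w in WORD_COUNTS}


def correction(word):
    """Prefer the word itself, then distance-1, then distance-2 known words."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=probability)


if __name__ == "__main__":
    for typo in ["speling", "teh", "dgo"]:
        print(typo, "->", correction(typo))
```

Enumerating all distance-2 candidates gets slow for long words; SymSpell, the other lexicon-based row, sidesteps most of that work by precomputing delete-only variants of the dictionary.
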

| Section | Sub-Section | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- |
| Pattern-based Recognition | | | Medium | |
| Lexicon-based Recognition | | | Medium | |
| Pre-trained NER | spaCy | | Medium Github | |
| Custom NER | | | | |
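
Pattern-based and lexicon-based recognition can be combined in a few lines of Python. The sketch below is illustrative only: the `PATTERNS` regular expressions and the `LEXICON` gazetteer are hypothetical and far smaller than anything usable in practice, and overlapping matches are not resolved.

```python
import re

# Hypothetical regex rules for pattern-based recognition.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # ISO dates only
}

# Hypothetical gazetteer for lexicon-based recognition (lowercase phrase -> label).
LEXICON = {
    "new york": "LOC",
    "london": "LOC",
    "google": "ORG",
}


def pattern_entities(text):
    """Return (surface text, label, start offset) for every regex match."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.group(), label, m.start()))
    return found


def lexicon_entities(text):
    """Case-insensitive whole-phrase lookup of every LEXICON entry."""
    found = []
    # Assumes lowercasing keeps offsets aligned (true for ASCII text).
    lowered = text.lower()
    for phrase, label in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append((text[m.start():m.end()], label, m.start()))
    return found


def recognize(text):
    """Merge both recognizers and sort entities by position in the text."""
    return sorted(pattern_entities(text) + lexicon_entities(text), key=lambda e: e[2])


if __name__ == "__main__":
    sample = "Email alice@example.com about the Google meeting in New York on 2019-05-01."
    for entity in recognize(sample):
        print(entity)
```

Rules like these are precise but brittle; a pre-trained model such as spaCy's, or a custom NER model trained on domain labels, generalizes to entities no rule or lexicon anticipated.
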

| Section | Sub-Section | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- |
| Printed Text | Google Cloud Vision API | | Medium | Paper |
| Handwriting | LSTM | | Medium | Paper |

| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| Extractive Approach | | | Medium Github |
| Abstractive Approach | | | |
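
One simple extractive approach scores each sentence by how frequent its content words are across the whole document and keeps the top-scoring sentences in their original order. This frequency-based sketch is not the method from the linked story; the regex sentence splitter and the short `STOP_WORDS` set are simplifying assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in", "it", "on", "for"}


def content_words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def summarize(text, n_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(content_words(text))
    top = max(freq.values(), default=1)  # normalize frequencies to [0, 1]

    def score(sentence):
        # Sum of normalized document frequencies of the sentence's content words.
        return sum(freq[w] / top for w in content_words(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = ("NLP models need data. Data augmentation creates more data for NLP models. "
           "The weather was nice. Cats sleep.")
    print(summarize(doc, n_sentences=2))
```

Summing word scores favours longer sentences; dividing by the number of content words is a common alternative when that bias is unwanted.
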

| Section | Sub-Section | Description | Link | Paper |
| --- | --- | --- | --- | --- |
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | Medium Github | |
| Edit Distance | Levenshtein Distance | | Medium Github | |
| Word Mover's Distance (WMD) | | | Medium Github | |
| Supervised Word Mover's Distance (S-WMD) | | | Medium | |
| Manhattan LSTM | | | Medium | Paper |
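
The first two rows reduce to a few formulas. Below is a plain-Python sketch of Euclidean distance, cosine similarity, Jaccard similarity on token sets, and Levenshtein edit distance computed by dynamic programming; the vector functions assume both inputs have the same length.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """|A intersection B| / |A union B| over the distinct items of a and b."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete cs
                            curr[j - 1] + 1,           # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(euclidean([0, 0], [3, 4]))
    print(cosine_similarity([1, 2], [2, 4]))
    print(jaccard_similarity("the cat sat".split(), "the cat ran".split()))
    print(levenshtein("kitten", "sitting"))
```

Python 3.8 and later also provide `math.dist` for Euclidean distance; the explicit version is kept here to show the formula.
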

| Section | Sub-Section | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- |
| Traditional Method | Bag-of-words (BoW) | | Medium Github | |
| | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium Github | |
| Character Level | Character Embedding | New York University | Medium Github | Paper |
| Word Level | Negative Sampling and Hierarchical Softmax | | Medium | |
| | Word2Vec, GloVe, fastText | | Medium Github | |
| | Contextualized Word Vectors (CoVe) | Salesforce | Medium Github | Paper Code |
| | Embeddings from Language Models (ELMo) | AI2 | Medium Github | Paper Code |
| | Generative Pre-Training (GPT) | OpenAI | Medium | Paper Code |
| | Contextual String Embeddings | Zalando Research | Medium | Paper Code |
| | Self-Governing Neural Networks (SGNN) | | Medium | Paper |
| | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | Medium | Paper |
| | Generative Pre-Training-2 (GPT-2) | OpenAI | Medium | Paper Code |
| | Universal Language Model Fine-tuning (ULMFiT) | fast.ai | Medium | Paper Code |
| Sentence Level | Skip-thoughts | | Medium Github | Paper Code |
| | InferSent | | Medium Github | Paper Code |
| | Quick-Thoughts | | Medium | Paper Code |
| | General Purpose Sentence (GenSen) | | Medium | Paper Code |
| | Bidirectional Encoder Representations from Transformers (BERT) | | Medium | Paper Code |
| | BERT in Science Domain | | Medium | SciBERT Paper BioBERT Paper |
| | BERT in Clinical Domain | | Medium | Clinical BERT Embeddings Paper ClinicalBert Paper |
| Document Level | lda2vec | | Medium | Paper |
| | doc2vec | | Medium Github | Paper |
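
Bag-of-words, the traditional method at the top of the table, represents a document as word counts over a fixed vocabulary. A minimal sketch, assuming lowercase regex tokenization and silently ignoring out-of-vocabulary words:

```python
import re


def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs):
    """Map each distinct token in `docs` to a column index (alphabetical order)."""
    return {word: i for i, word in enumerate(sorted({t for d in docs for t in tokenize(d)}))}


def bag_of_words(doc, vocab):
    """Count vector for `doc`; tokens missing from `vocab` are ignored."""
    vector = [0] * len(vocab)
    for token in tokenize(doc):
        if token in vocab:
            vector[vocab[token]] += 1
    return vector


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat sat on the mat"]
    vocab = build_vocabulary(corpus)
    print(vocab)
    for doc in corpus:
        print(bag_of_words(doc, vocab))
```

Word order is lost ("dog bites man" and "man bites dog" get the same vector), which is one motivation for the word-, sentence- and document-level embeddings listed in the table.
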

| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| ELI5, LIME and Skater | | | Medium Github |
| SHapley Additive exPlanations (SHAP) | | | Medium Github |
| Anchors | | | Medium Github |
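
SHAP attributes a prediction to input features using Shapley values from cooperative game theory. For a handful of features these can be computed exactly by enumerating every feature subset, as in the sketch below. Treating an "absent" feature as its value in a `baseline` input is one common convention and an assumption here; the SHAP library itself relies on faster approximations rather than this exponential enumeration, and the toy models are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, with absent features set to `baseline`.

    phi_i = sum over S not containing i of |S|! (n-|S|-1)! / n! * (v(S + {i}) - v(S)).
    Brute force over all 2^n subsets, so only feasible for a few features.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline value.
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis


def toy_linear_model(z):
    """Hypothetical linear scoring model over three features."""
    return 2 * z[0] - z[1] + 3 * z[2]


if __name__ == "__main__":
    print(shapley_values(toy_linear_model, x=[1, 1, 1], baseline=[0, 0, 0]))
```

For a linear model each attribution equals weight * (x - baseline), and for any model the attributions sum to `model(x) - model(baseline)`.
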

| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| Can deep learning solve every problem? | | | Medium Kaggle |

| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| Spellcheck | | | Github |
| InferSent | | | Github |