A list of NLP (Natural Language Processing) tutorials built on PyTorch.
Each tutorial walks step by step through implementing and adapting a model for a simple real-world NLP task.
- News Category Classification: This repo provides a simple PyTorch implementation of text classification, with simple annotations. Here we use the HuffPost news corpus, in which each article is labeled with its category. The classification model trained on this dataset identifies the category of a news article from its headline and description.
Keyword: CBoW, LSTM, fastText, Text categorization
- IMDb Movie Review Classification: This text classification tutorial trains a transformer model on the IMDb movie review dataset for sentiment analysis. It provides a simple PyTorch implementation, with simple annotations.
Keyword: Transformer, Sentiment analysis
- Question-Answer Matching: This repo provides a simple PyTorch implementation of question-answer matching. Here we use a corpus from Stack Exchange to build embeddings for entire questions. Using those embeddings, we find questions similar to a given question and show the answers that correspond to them.
Keyword: CBoW, TF-IDF, LSTM with variable-length sequences
- Movie Review Classification (Korean NLP): This repo provides a simple Keras implementation of TextCNN for text classification. Here we use a movie review corpus written in Korean. The model trained on this dataset identifies the sentiment of a review from its text.
Keyword: TextCNN, Sentiment analysis
- Neural Machine Translation: This repo provides a simple PyTorch implementation of Neural Machine Translation, along with an intrinsic/extrinsic comparison of various sequence-to-sequence (seq2seq) models in translation.
Keyword: Sequence-to-sequence network (seq2seq), Attention, Autoregressive, Teacher-forcing
- Neural Language Model: This repo provides a simple PyTorch implementation of a neural language model for natural language understanding.
Here we implement unidirectional/bidirectional language models, and pre-train language representations from unlabeled text (Wikipedia corpus).
Keyword: Autoregressive Language Model, Perplexity
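
As a quick illustration of the Perplexity keyword above, here is a minimal sketch in plain Python. It is not taken from the tutorial code: the function name `perplexity` and the example probabilities are made up for illustration, and it assumes you already have the probability the language model assigned to each actual next token.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_probs: probabilities (each in (0, 1]) that a model assigned
    to the tokens that actually occurred. Hypothetical input format,
    chosen only for this sketch.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that spreads its mass uniformly over 4 choices at every step
# has a perplexity of about 4: it is "as confused as" a 4-way guess.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that puts high probability on the correct tokens scores lower.
confident = perplexity([0.9, 0.8, 0.95])

print(uniform, confident)
```

Lower perplexity means the model found the text less surprising. In a PyTorch training loop the same quantity is usually obtained by exponentiating the mean cross-entropy loss over tokens.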