This repository contains sample code for natural language processing in Japanese. It is heavily inspired by microsoft/nlp-recipes.
The following is a summary of the commonly used NLP scenarios covered in the repository. Each scenario is demonstrated in one or more scripts or Jupyter notebook examples that make use of the core code base of models and repository utilities.
Category | Methods |
---|---|
Basic | Cleaning, Normalization, Stopwords, Sentence Segmentation, Ruby |
Embeddings | Word2Vec, fastText, Universal Sentence Encoder |
Feature Engineering | Bag-of-Words, TF-IDF, BM25, SWEM, SCDV |
Morphological Analysis | Konoha, nagisa |
Sentence Similarity | Cosine Similarity |
Sentiment Analysis | oseti |
Text Classification | TF-IDF & Logistic Regression, TF-IDF & LightGBM, BERT, T5 |
Visualization | Visualization with Japanese texts |
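
As a rough illustration of the "Sentence Similarity" row, the sketch below computes cosine similarity between two Japanese sentences using only the Python standard library. It is not taken from this repository: the helper names `char_ngrams` and `cosine_similarity` are hypothetical, and character bigrams are assumed as a simple stand-in for the morphological analysis (e.g. Konoha or nagisa) that a real pipeline would likely use.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams.

    Assumption: character bigrams replace proper tokenization here,
    so the example runs without a morphological analyzer.
    """
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Only n-grams present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


s1 = char_ngrams("私は猫が好きです")
s2 = char_ngrams("私は犬が好きです")
s3 = char_ngrams("今日は雨が降っています")

print(cosine_similarity(s1, s2))  # near-identical sentences
print(cosine_similarity(s1, s3))  # unrelated sentences
```

The first pair differs by a single character and scores much higher than the second pair, which shares no bigrams. Swapping `char_ngrams` for a tokenizer-based bag-of-words or an embedding vector would keep the same `cosine_similarity` logic.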

Build and start the container, then open a shell inside it:

    docker-compose up -d --build
    docker exec -it nlp-recipes-ja bash