KoichiYasuoka's Stars
chakki-works/sumeval
Well-tested, multi-language evaluation framework for text summarization.
polm/fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
ikegami-yukino/mecab
Notice: This repository will be archived... This repository is for building Windows 64-bit MeCab binary and improving MeCab Python binding.
explosion/tokenizations
Robust and fast tokenization alignment library for Rust and Python: https://tamuhey.github.io/tokenizations/
taishi-i/toiro
A comparison tool of Japanese tokenizers
shenshen-hungry/Ancient-Chinese-Segmentation
A tool for ancient Chinese segmentation.
retarfi/language-pretraining
Pre-training Language Models for Japanese
akirakubo/bert-japanese-aozora
Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy
clarinsi/classla
CLASSLA fork of the official Stanford NLP Python library for many human languages
amir-zeldes/HebPipe
An NLP pipeline for Hebrew
informatix-inc/bert
megagonlabs/UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
megagonlabs/ginza-transformers
Use custom tokenizers in spacy-transformers
ipipan/combo
Dependency parsing library
gossebouma/lassy2ud
Lassy Small to Universal Dependencies Conversion
UniversalDependencies/UD_Japanese-GSDLUW
Long-unit-word version of UD_Japanese-GSD
UniversalDependencies/UD_Tatar-NMCTT