shigashiyama's Stars
EleutherAI/gpt-neo
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
facebookresearch/AugLy
A data augmentations library for audio, image, text, and video.
mjpost/sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
styfeng/DataAug4NLP
Collection of papers and resources for data augmentation for NLP.
diasks2/pragmatic_segmenter
Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
Martinsos/edlib
Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
google-research/byt5
trungtv/pyvi
Python Vietnamese Core NLP Toolkit
ko-nlp/Open-korean-corpora
Open Korean NLP Dataset Curation for the Users All Around the Globe
yagays/ja-timex
A rule-based analyzer that extracts and normalizes temporal expressions written in natural language
ueno/libkkc
Japanese Kana Kanji conversion input method library
MorinoseiMorizo/jparacrawl-finetune
An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.
masakhane-io/masakhane-ner
facebookresearch/SimulEval
SimulEval: A General Evaluation Toolkit for Simultaneous Translation
octanove/shiba
PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.
aozorahack/aozorabunko_text
Text-only archives of www.aozora.gr.jp
tsuruoka-lab/BSD
The Business Scene Dialogue corpus
yohokuno/neural_ime
Neural IME: Neural Input Method Engine
hyunwoongko/asian-bart
Asian language BART models (En, Ja, Ko, Zh, ECJK)
kampersanda/xcdat
Fast compressed trie dictionary library
matbahasa/TALPCo
TUFS Asian Language Parallel Corpus
warnikchow/kosp2e
Korean Speech to English Translation Corpus
kmiya/naist-thesis-tmpl-en
A modern LaTeX template for your doctoral dissertation or master's thesis of NAIST-IS.
MicrosoftTranslator/MSLT-Corpus
Microsoft Speech Language Translation (MSLT) Corpus
de9uch1/fairseq-tutorial
Fairseq tutorial
ngovinhtn/JaViCorpus
nttcslab-nlp/discourse-mt-test-sets
A Test Set for Discourse Translation from Japanese to English
tsuruoka-lab/AMI-Meeting-Parallel-Corpus
AMI Meeting Parallel Corpus
ku-nlp/jumanpp-jumandic
Scripts for training Jumandic Juman++ model
ksudoh/wmt15-17-humaneval
Classification-based human evaluation on WMT 2015-2017 Metrics dataset