word-segmentation
There are 142 repositories under the word-segmentation topic.
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
baidu/lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance
wolfgarbe/SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
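The Symmetric Delete idea behind SymSpell can be illustrated with a minimal sketch (this is not the actual SymSpell implementation, and all names here are illustrative): instead of generating all insert/replace/transpose edits of a query at lookup time, precompute delete-only variants of every dictionary word once, then intersect the query's deletes with that index.

```python
def deletes(word, max_distance):
    """All strings reachable from `word` by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results

def build_index(dictionary, max_distance=2):
    """Map each delete-variant to the dictionary words that produce it."""
    index = {}
    for word in dictionary:
        for d in deletes(word, max_distance):
            index.setdefault(d, set()).add(word)
    return index

def lookup(query, index, max_distance=2):
    """Candidate corrections: dictionary words sharing a delete-variant with the query.
    A full implementation would verify the true edit distance of each candidate,
    since sharing a delete-variant is necessary but not sufficient."""
    candidates = set()
    for d in deletes(query, max_distance):
        candidates |= index.get(d, set())
    return candidates
```

Because only deletes are precomputed, the index stays compact and lookups avoid the combinatorial explosion of generating full edit neighborhoods per query.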
PyThaiNLP/pythainlp
Thai natural language processing in Python
VKCOM/YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
mammothb/symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
ckiplab/ckip-transformers
CKIP Transformers
cbaziotis/ekphrasis
Ekphrasis is a text-processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter: 330 million English tweets).
vncorenlp/VnCoreNLP
A Vietnamese natural language processing toolkit (NAACL 2018)
bab2min/Kiwi
Kiwi (an intelligent Korean morphological analyzer)
JayYip/m3tl
BERT for Multitask Learning
modelscope/AdaSeq
AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
ku-nlp/jumanpp
Juman++ (a Morphological Analyzer Toolkit)
jacksonllee/pycantonese
Cantonese Linguistics and NLP
yongzhuo/Pytorch-NLU
Chinese text classification and sequence-labeling toolkit (PyTorch). Supports multi-class and multi-label classification for long and short Chinese texts, as well as sequence-labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.
bab2min/kiwipiepy
Python API for Kiwi
ikegami-yukino/mecab
This repository is archived! The maintained MeCab can be found at https://github.com/shogo82148/mecab
jidasheng/bi-lstm-crf
A PyTorch implementation of the BI-LSTM-CRF model.
monpa-team/monpa
MONPA is a multi-task model providing Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition
fastcws/fastcws
A lightweight, high-performance Chinese word segmentation project
taishi-i/toiro
A comparison tool of Japanese tokenizers
ckiplab/ckipnlp
CKIP CoreNLP Toolkits
peterolson/hanzi-tools
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Ailln/nlp-roadmap
🗺️ A learning roadmap for natural language processing
fudannlp16/CWS_Dict
Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
wolfgarbe/WordSegmentationTM
Fast Word Segmentation with Triangular Matrix
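Dictionary-based word segmenters like the ones in this list typically reduce to a dynamic program over split points. The following is a hedged sketch of that general technique under a unigram frequency model (it is not the WordSegmentationTM code; the function names and the word-length cap are illustrative assumptions):

```python
import math

def segment(text, freq, max_word_len=20):
    """Split `text` into the dictionary words maximizing total unigram
    log-probability. Returns [] if no full segmentation exists."""
    total = sum(freq.values())
    # best[i] = (score, segmentation) for the prefix text[:i]
    best = [(-math.inf, [])] * (len(text) + 1)
    best[0] = (0.0, [])
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            if word in freq and best[j][0] > -math.inf:
                score = best[j][0] + math.log(freq[word] / total)
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best[len(text)][1]
```

This runs in O(n · max_word_len) dictionary probes; implementations differ mainly in how the dictionary lookup is organized (hash map here, a triangular matrix or trie in faster variants).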
datquocnguyen/RDRsegmenter
A Fast and Accurate Vietnamese Word Segmenter (LREC 2018)
jcyk/CWS
Source code for an ACL2016 paper of Chinese word segmentation
phongnt570/UETsegmenter
A toolkit for Vietnamese word segmentation
ruanchaves/hashformers
Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).
MighTguY/customized-symspell
Java port of SymSpell: 1 million times faster through the Symmetric Delete spelling correction algorithm
ye-kyaw-thu/sylbreak
Syllable segmentation tool for the Myanmar language (Burmese) by Ye.
dnanhkhoa/python-vncorenlp
A Python wrapper for VnCoreNLP using a bidirectional communication channel.
undertheseanlp/word_tokenize
Vietnamese word tokenizer
giganticode/codeprep
A toolkit for pre-processing large source code corpora