chinese-word-segmentation

There are 94 repositories under the chinese-word-segmentation topic.

Embedding/Chinese-Word-Vectors
100+ pretrained Chinese word vectors
Language: Python 11.8k 285 1682.3k
lancopku/pkuseg-python
The pkuseg toolkit for multi-domain Chinese word segmentation
Language: Python 6.5k 208 167986
baidu/lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance
Language: C++ 3.9k 105 248595
ownthink/Jiagu
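Jiagu: a deep learning natural language processing tool. Knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering
Language: Python 3.3k 87 71613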
wolfgarbe/SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Language: C# 3.2k 71 97298
hankcs/pyhanlp
Chinese word segmentation
Language: Python 3.1k 85 0809
didi/ChineseNLP
Datasets and SOTA results for every field of Chinese NLP
Language: HTML 1.8k 60 11271
lionsoul2014/jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.
Language: Java 914 92 57212
mammothb/symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Language: Python 802 16 91122
messense/jieba-rs
The Jieba Chinese Word Segmentation Implemented in Rust
Language: Rust 755 13 4847
lionsoul2014/friso
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. It is fully modular and can be easily embedded in other programs, such as MySQL, PostgreSQL, and PHP.
Language: C 483 33 2093
monpa-team/monpa
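MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition
Language: Python 245 23 1626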
Kyubyong/g2pC
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Language: Python 239 9 931
hemingkx/WordSeg
A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese Word Segmentation.
Language: Python 203 1 1642
supercoderhawk/DeepLearning_NLP
A natural language processing library based on deep learning
Language: Python 152 15 340
howl-anderson/MicroTokenizer
MicroTokenizer: A lightweight, full-featured Chinese tokenizer that helps students understand how tokenizers work, designed for educational and research purposes. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
Language: Python 147 9 422
llhthinker/MachineLearningLab
Some experiments about Machine Learning
Language: Python 115 3 2191
xtea/chinese_medical_words
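A manually curated corpus of medical-industry vocabulary and terminology. It can be used to train various NLP models, such as speech recognition and dialogue systems.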
109 3 037
fudannlp16/CWS_Dict
Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
Language: Python 91 4 1632
jcyk/greedyCWS
Source code for an ACL 2017 paper on Chinese word segmentation
Language: Python 90 5 421
yizhiru/thulac4j
Chinese word segmentation tool; a Java implementation of THULAC.
Language: Java 85 11 1933
NLPIR-team/nlpir-analysis-cn-ictclas
Lucene/Solr Analyzer Plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, which lets you change the Lucene/Solr version for compatibility with the corresponding release.
Language: Java 73 11 528
supercoderhawk/DNN_CWS
Chinese word segmentation implemented with deep learning
Language: Python 58 8 432
dongrixinyu/jiojio
A convenient Chinese word segmentation tool
Language: Python 46 1 67
voidism/pywordseg
Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816
Language: Python 40 4 57
supercoderhawk/DeepNLP
A natural language processing library based on deep learning
Language: Python 37 5 018
wchan757/Cantonese_Word_Segmentation
Dictionary for Cantonese word segmentation
34 2 05
hankcs/sub-character-cws
Sub-Character Representation Learning
Language: Python 26 4 32
binaryoung/jieba-php
The Jieba Chinese Word Segmentation Implemented in PHP
Language: PHP 20 2 48
bububa/jiagu
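Jiagu: a deep learning natural language processing tool. Knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering
Language: Go 19 1 26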
NLPIR-team/NLPIR-ICTCLAS
The Java Package of NLPIR-ICTCLAS.
Language: Java 19 7 310
oscarsun72/TextForCtext
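For text input to the Chinese Text Project (《中國哲學書電子化計劃》): faster typing and typesetting for a better input experience, plus the 文房一寶勝四寶 ("one treasure of the study beats four") WordVBA tools for literature and history. Programs written by a PhD in Chinese.
Language: C# 19 3 10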
Hoiy/berserker
Berserker - BERt chineSE woRd toKenizER
Language: Python 17 4 31
wangjksjtu/multi-embedding-cws
Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019
Language: Python 17 1 15
GanjinZero/GTS
Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]
Language: Python 15 2 22
messense/cjieba-py
Python cffi binding to CppJieba
Language: Python 15 2 5
