chinese-word-segmentation

There are 94 repositories under the chinese-word-segmentation topic.

  • Embedding/Chinese-Word-Vectors

    100+ Chinese Word Vectors (hundreds of pre-trained Chinese word vectors)

    Language: Python
  • lancopku/pkuseg-python

    The pkuseg toolkit for multi-domain Chinese word segmentation

    Language: Python
  • baidu/lac

    Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

    Language: C++
  • ownthink/Jiagu

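    Jiagu deep-learning natural language processing toolkit: knowledge-graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new-word discovery, keyword extraction, text summarization, text clustering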

    Language: Python
  • wolfgarbe/SymSpell

    SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm

    Language: C#
  • hankcs/pyhanlp

    Chinese word segmentation

    Language: Python
  • didi/ChineseNLP

    Datasets and SOTA results for every field of Chinese NLP

    Language: HTML
  • lionsoul2014/jcseg

    Jcseg is a lightweight NLP framework written in Java. It provides CJK and English segmentation based on the MMSEG algorithm, as well as keyword extraction, key sentence extraction, and summary extraction based on the TEXTRANK algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

    Language: Java
  • mammothb/symspellpy

    Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm

    Language: Python
  • messense/jieba-rs

    The Jieba Chinese Word Segmentation Implemented in Rust

    Language: Rust
  • lionsoul2014/friso

    High-performance Chinese tokenizer written in ANSI C and based on the MMSEG algorithm, supporting both the GBK and UTF-8 charsets. Its fully modular implementation can easily be embedded in other programs such as MySQL, PostgreSQL, and PHP.

    Language: C
  • monpa-team/monpa

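    MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.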

    Language: Python
  • Kyubyong/g2pC

    g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

    Language: Python
  • hemingkx/WordSeg

    A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese word segmentation.

    Language: Python
  • supercoderhawk/DeepLearning_NLP

    A deep-learning-based natural language processing library

    Language: Python
  • howl-anderson/MicroTokenizer

    MicroTokenizer: a lightweight yet full-featured Chinese tokenizer, designed for educational and research purposes, that helps students understand how tokenizers work. It offers a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

    Language: Python
  • llhthinker/MachineLearningLab

    Some machine learning experiments

    Language: Python
  • xtea/chinese_medical_words

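    Hand-compiled corpus of medical-industry vocabulary, terminology, and similar material. It can be used to train all kinds of NLP models, such as speech recognition and dialogue systems.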

  • fudannlp16/CWS_Dict

    Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018

    Language: Python
  • jcyk/greedyCWS

    Source code for an ACL 2017 paper on Chinese word segmentation

    Language: Python
  • yizhiru/thulac4j

    Chinese word segmentation tool; a Java implementation of THULAC.

    Language: Java
  • NLPIR-team/nlpir-analysis-cn-ictclas

    Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, so you can change the Lucene/Solr version to stay compatible with the version you use.

    Language: Java
  • supercoderhawk/DNN_CWS

    Chinese word segmentation implemented with deep learning

    Language: Python
  • dongrixinyu/jiojio

    A convenient Chinese word segmentation tool

    Language: Python
  • voidism/pywordseg

    Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816

    Language: Python
  • supercoderhawk/DeepNLP

    A deep-learning-based natural language processing library

    Language: Python
  • wchan757/Cantonese_Word_Segmentation

    Dictionary for Cantonese word segmentation

  • hankcs/sub-character-cws

    Sub-Character Representation Learning

    Language: Python
  • binaryoung/jieba-php

    The Jieba Chinese Word Segmentation Implemented in PHP

    Language: PHP
  • bububa/jiagu

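    Jiagu deep-learning natural language processing toolkit: knowledge-graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new-word discovery, keyword extraction, text summarization, text clustering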

    Language: Go
  • NLPIR-team/NLPIR-ICTCLAS

    The Java package of NLPIR-ICTCLAS.

    Language: Java
  • oscarsun72/TextForCtext

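    For entering texts into the Chinese Text Project (《中國哲學書電子化計劃》): speeds up typing and typesetting for a better input experience, plus 文房一寶勝四寶, a Word VBA toolkit for literary and historical studies. A PhD in Chinese writes code.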

    Language: C#
  • Hoiy/berserker

    Berserker - BERt chineSE woRd toKenizER

    Language: Python
  • wangjksjtu/multi-embedding-cws

    Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019

    Language: Python
  • GanjinZero/GTS

    Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]

    Language: Python
  • messense/cjieba-py

    Python cffi binding to CppJieba

    Language: Python
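SymSpell and its Python port symspellpy, listed above, are built on the Symmetric Delete idea. Rather than generating every insertion, substitution, and deletion of a misspelled query, they precompute only the deletions of each dictionary word and then look up the deletions of the query. The following is a minimal sketch of that idea. It assumes a small in-memory word list and uses plain Levenshtein distance for the final check. The helper names (`build_index`, `lookup`) are hypothetical and are not the SymSpell or symspellpy API.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_dist):
    """Every string obtained by deleting up to max_dist characters (the word itself included)."""
    result = {word}
    for k in range(1, min(max_dist, len(word)) + 1):
        for positions in combinations(range(len(word)), k):
            result.add("".join(c for i, c in enumerate(word) if i not in positions))
    return result


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words, max_dist=1):
    """Hypothetical helper: map each deletion variant to the dictionary words producing it."""
    index = defaultdict(set)
    for w in words:
        for variant in deletes(w, max_dist):
            index[variant].add(w)
    return index


def lookup(query, index, max_dist=1):
    """Hypothetical helper: (distance, word) pairs within max_dist, closest first."""
    candidates = set()
    for variant in deletes(query, max_dist):
        candidates |= index.get(variant, set())
    # Shared deletions can over-generate, so verify each candidate with the true distance.
    scored = ((levenshtein(query, c), c) for c in candidates)
    return sorted((d, c) for d, c in scored if d <= max_dist)


index = build_index(["hello", "help", "world"])
print(lookup("helo", index))  # [(1, 'hello'), (1, 'help')]
```

Only deletions are stored, so the index avoids enumerating insertions and substitutions over the whole alphabet. That saving matters most for Chinese, whose character set is large. The real libraries also rank candidates by word frequency and add further optimizations, none of which are shown here.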
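jcseg and friso both describe themselves as based on the MMSEG algorithm. MMSEG extends maximum matching: at each position it considers chunks of up to three candidate words and chooses among them with a series of tie-breaking rules. The following is a simplified sketch. It assumes a plain Python set as the dictionary and applies only the first three rules: largest total chunk length, then largest average word length, then smallest variance of word lengths. The fourth rule, which uses single-character frequencies, is omitted. None of the function names come from jcseg or friso.

```python
def candidates(text, start, dictionary, max_len):
    """Dictionary words beginning at `start`; a single character is always a fallback."""
    words = [text[start]]
    for end in range(start + 2, min(len(text), start + max_len) + 1):
        if text[start:end] in dictionary:
            words.append(text[start:end])
    return words


def chunks(text, start, dictionary, max_len):
    """All chunks of one to three consecutive candidate words starting at `start`."""
    result = []
    for w1 in candidates(text, start, dictionary, max_len):
        p1 = start + len(w1)
        if p1 == len(text):
            result.append([w1])
            continue
        for w2 in candidates(text, p1, dictionary, max_len):
            p2 = p1 + len(w2)
            if p2 == len(text):
                result.append([w1, w2])
                continue
            for w3 in candidates(text, p2, dictionary, max_len):
                result.append([w1, w2, w3])
    return result


def score(chunk):
    """Sketch of MMSEG rules 1-3 only; rule 4 (single-character frequency) is omitted."""
    total = sum(len(w) for w in chunk)
    avg = total / len(chunk)
    variance = sum((len(w) - avg) ** 2 for w in chunk) / len(chunk)
    return (total, avg, -variance)  # larger tuple wins


def mmseg(text, dictionary):
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        best = max(chunks(text, i, dictionary, max_len), key=score)
        words.append(best[0])  # commit only the first word of the winning chunk
        i += len(best[0])
    return words


print(mmseg("研究生命起源", {"研究", "研究生", "生命", "起源"}))  # ['研究', '生命', '起源']
```

With this dictionary, plain forward maximum matching would greedily take 研究生 first and produce 研究生 / 命 / 起源. The chunk rules prefer 研究 / 生命 / 起源 because that chunk has the same total length and average word length but a smaller variance of word lengths.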
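jieba-rs, jieba-php, and cjieba-py are ports of, or bindings to, Jieba. The Jieba README describes its core method in two steps. First, it builds a directed acyclic graph (DAG) of every dictionary word that occurs in the sentence. Second, it uses dynamic programming to pick the path with the highest probability based on word frequencies. The following is a simplified sketch of that idea. It assumes a tiny hypothetical frequency table and leaves out Jieba's HMM handling of out-of-vocabulary words. It is not the API of any of the listed projects.

```python
import math


def build_dag(text, freq):
    """For each start index, the end indices of dictionary words beginning there."""
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i + 1, len(text) + 1) if text[i:j] in freq]
        dag[i] = ends or [i + 1]  # no dictionary word here: fall back to one character
    return dag


def segment(text, freq):
    """Pick the DAG path with the highest summed log probability, solved right to left."""
    log_total = math.log(sum(freq.values()))  # assumes a non-empty table of positive counts
    dag = build_dag(text, freq)
    n = len(text)
    route = {n: (0.0, n)}  # route[i] = (best score from i to the end, end of the first word)
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(text[i:j], 1)) - log_total + route[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(text[i:j])
        i = j
    return words


# Hypothetical counts, chosen only for illustration.
freq = {"研究": 100, "研究生": 20, "生命": 50, "起源": 40, "命": 5}
print(segment("研究生命起源", freq))  # ['研究', '生命', '起源']
```

Both complete paths in this example have three words, so the comparison comes down to the products of the counts. 研究 / 生命 / 起源 gives 100 × 50 × 40, and 研究生 / 命 / 起源 gives 20 × 5 × 40. Raising the counts of 研究生 and 命 would flip the result.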
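Several of the neural segmenters above, for example hemingkx/WordSeg with its BiLSTM / BERT + CRF models, treat segmentation as tagging each character. This sketch assumes BMES, a common tagging scheme: each character is the Begin, Middle, or End of a multi-character word, or a Single-character word. The code below decodes per-character tag scores with Viterbi under the BMES legality constraints and then converts the tags back into words. The emission scores are made-up numbers standing in for a model's output. The learned CRF transition weights are also simplified here, replaced by hard allowed/forbidden transitions.

```python
TAGS = ("B", "M", "E", "S")
# Legal tag bigrams: a word that has begun must continue (M) or end (E) before a new one starts.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
NEG_INF = float("-inf")


def viterbi(emissions):
    """emissions: one {tag: score} dict per character. Returns the best legal tag sequence."""
    # A sentence can only start with B or S.
    best = [{t: (emissions[0][t] if t in ("B", "S") else NEG_INF) for t in TAGS}]
    backpointers = []
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for t in TAGS:
            prev_score, prev_tag = max(
                (best[-1][p] if (p, t) in ALLOWED else NEG_INF, p) for p in TAGS
            )
            row[t] = prev_score + scores[t]
            ptr[t] = prev_tag
        best.append(row)
        backpointers.append(ptr)
    # A sentence can only end with E or S.
    path = [max(("E", "S"), key=lambda t: best[-1][t])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def tags_to_words(text, tags):
    """Cut the text after every E or S tag."""
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in ("E", "S"):
            words.append(text[start:i + 1])
            start = i + 1
    if start < len(text):  # tolerate an unfinished final word
        words.append(text[start:])
    return words


# Made-up emission scores standing in for a model's per-character output.
emissions = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 0.5, "E": 2.0, "S": 0.0},
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.0},
    {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.5},
]
tags = viterbi(emissions)
print(tags, tags_to_words("研究生命", tags))  # ['B', 'E', 'B', 'E'] ['研究', '生命']
```

The hard constraints guarantee that every decoded sequence splits cleanly into words. A trained CRF would additionally score each legal transition with a learned weight.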
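jcseg also lists keyword extraction based on TextRank. In its generic form, TextRank builds an undirected co-occurrence graph over already-segmented words and runs a PageRank-style iteration on it. The following is a minimal sketch under those assumptions. The window size, damping factor, and iteration count are illustrative defaults. Any stop-word or part-of-speech filtering that a real implementation would apply is left out, and the function name is hypothetical.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50, top_k=3):
    """Hypothetical helper: rank words by PageRank over a sliding-window co-occurrence graph."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        # Link each word to the following words inside a window of `window` words.
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Each word passes its current score evenly to its neighbours.
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


tokens = ["分词", "算法", "分词", "工具", "分词", "词典"]
print(textrank_keywords(tokens, top_k=1))  # ['分词']
```

Here 分词 co-occurs with every other token, so it ends up at the center of a star-shaped graph and receives the highest score. The input must already be segmented, which is why keyword extraction sits on top of a segmenter in tools like jcseg.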