
Chinese word segmentation comparison test


Chinese Tokenizer

A simple comparison of Chinese word segmenters and part-of-speech (POS) taggers for text mining.

1) Chinese Word Segmentation Comparison

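The most basic feature of a Chinese tokenizer is word segmentation. Written Chinese has no spaces between words, so the tokenizer must decide where each word begins and ends.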
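To make the problem concrete, here is a minimal dictionary-based baseline, forward maximum matching (FMM). FMM is a classic textbook technique and is not how PKUSEG, LAC or hanlp work: those use trained statistical or neural models. The function name `fmm_segment` and the toy vocabulary are hypothetical, chosen only for illustration.

```python
from typing import List, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position, take the longest
    dictionary word that starts there; fall back to a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Toy vocabulary, made up for illustration.
    vocab = {"研究", "研究生", "生命", "起源"}
    print(fmm_segment("研究生命起源", vocab))
    # → ['研究生', '命', '起源']  (intended reading: 研究 / 生命 / 起源)
```

The greedy longest match grabs 研究生 ("graduate student") and strands 命, although the sentence means "study the origin of life". Ambiguities like this are what the learned segmenters in this repository are meant to resolve.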

Segmenters compared:

  • PKUSEG
  • LAC
  • thulac
  • monpa
  • jiagu
  • hanlp
  • other deep-learning-based state-of-the-art (SoTA) models

Of the segmenters tested, hanlp and LAC are good choices.
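One standard way to score segmenters against each other, assumed here rather than taken from the notebook, is word-level precision, recall and F1 against a hand-segmented gold reference: a predicted word counts as correct only if both its start and end character offsets match a gold word. The sketch below implements that metric; the helper names and the two "tool" outputs are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(tokens: List[str]) -> Set[Tuple[int, int]]:
    """Convert a token list into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for tok in tokens:
        end = start + len(tok)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(pred: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 of a predicted segmentation."""
    if "".join(pred) != "".join(gold):
        raise ValueError("pred and gold must segment the same text")
    pred_spans, gold_spans = to_spans(pred), to_spans(gold)
    correct = len(pred_spans & gold_spans)  # words with both boundaries right
    p = correct / len(pred_spans) if pred_spans else 0.0
    r = correct / len(gold_spans) if gold_spans else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    gold = ["研究", "生命", "起源"]
    # Hypothetical outputs from two segmenters, made up for illustration.
    outputs = {
        "tool_a": ["研究", "生命", "起源"],
        "tool_b": ["研究生", "命", "起源"],
    }
    for name, pred in outputs.items():
        p, r, f = segmentation_prf(pred, gold)
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In a real comparison, `pred` would be each library's output on a shared test sentence and `gold` a reference segmentation; averaging over a corpus gives a single score per tool.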

2) Chinese Keyphrase Extraction

  • hanlp + POS tagging
  • LAC + POS tagging
  • BERT-JointKPE
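The "+ POS tagging" approaches filter segmented words by part-of-speech tag. A minimal sketch of one common way to do this, assumed here and not necessarily the notebook's method, is to keep maximal runs matching the pattern (adjective)* (noun)+ and rank them by frequency. It assumes a PKU/LAC-style tag set where noun tags start with `n` and adjective tags with `a`; other tag sets (for example CTB `NN`/`JJ`, or LAC's uppercase entity tags such as `PER`) would need different predicates. The function names and the tagged sentence are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs from a segmenter/tagger


def is_noun(tag: str) -> bool:
    # Assumption: noun tags start with "n" (n, nr, ns, nz, ...).
    return tag.startswith("n")


def is_adj(tag: str) -> bool:
    # Assumption: adjective tags start with "a" (a, ad, an, ...).
    return tag.startswith("a")


def candidate_phrases(tagged: Tagged) -> List[str]:
    """Extract maximal runs matching (ADJ)* (NOUN)+, joined without spaces."""
    phrases: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        while j < n and is_adj(tagged[j][1]):  # optional adjectives
            j += 1
        k = j
        while k < n and is_noun(tagged[k][1]):  # one or more nouns
            k += 1
        if k > j:
            phrases.append("".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i = max(j, i + 1)  # no noun here; move on
    return phrases


def top_keyphrases(tagged: Tagged, k: int = 5) -> List[Tuple[str, int]]:
    """Rank candidate phrases by raw frequency."""
    return Counter(candidate_phrases(tagged)).most_common(k)


if __name__ == "__main__":
    # Hypothetical tagger output, made up for illustration.
    tagged = [("中文", "nz"), ("分词", "n"), ("很", "d"), ("重要", "a"),
              ("，", "w"), ("好", "a"), ("的", "u"), ("中文", "nz"),
              ("分词", "n"), ("需要", "v"), ("词典", "n")]
    print(top_keyphrases(tagged, k=2))
```

Raw frequency is the simplest ranking; TF-IDF or graph-based scores are common replacements when a larger corpus is available.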