
Primary language: Python. License: MIT.

CWSWK

The source code for the paper "Chinese word segmentation with world knowledge".

How to

  1. Download the BERT model to the folder data/bert/ (needed only if you want to train model CWSB or CWSBD).

  2. Preprocess the data. This saves the train, val, and test datasets under the folder data/:

python preprocess.py

  3. Train model CWSD on dataset pku and save it:

python train.py -m CWSD -ds pku -save

  4. Debug:

python train.py -m CWSD -ds pku -d

  5. Predict with the model saved at epoch 2:

python train.py -m pred_model -ms CWSD -ds pku -s 2
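The steps above can be chained in a small Python driver. This is a minimal sketch and is not part of the repository: the helpers bert_ready, build_commands, and run_pipeline are hypothetical names, and the BERT check only assumes that a non-empty data/bert/ folder means the model has been downloaded (which files the training code actually loads is not stated here). Only the scripts and flags listed in the steps above are used.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Models that, per this README, need a BERT model under data/bert/.
BERT_MODELS = {"CWSB", "CWSBD"}


def bert_ready(model: str, bert_dir: str = "data/bert") -> bool:
    """Hypothetical helper: True if `model` needs no BERT, or if bert_dir
    exists and is non-empty. It does NOT verify specific checkpoint files."""
    if model not in BERT_MODELS:
        return True
    path = Path(bert_dir)
    return path.is_dir() and any(path.iterdir())


def build_commands(model: str, dataset: str, epoch: int) -> list[list[str]]:
    """Hypothetical helper: the preprocess, train-and-save, and predict
    commands from the steps above, run with the current interpreter."""
    py = sys.executable
    return [
        [py, "preprocess.py"],
        [py, "train.py", "-m", model, "-ds", dataset, "-save"],
        [py, "train.py", "-m", "pred_model", "-ms", model,
         "-ds", dataset, "-s", str(epoch)],
    ]


def run_pipeline(model: str = "CWSD", dataset: str = "pku", epoch: int = 2) -> None:
    """Hypothetical driver: run each step in order from the repository root."""
    if not bert_ready(model):
        raise FileNotFoundError(f"{model} needs a BERT model in data/bert/")
    for cmd in build_commands(model, dataset, epoch):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError, so a failed step stops the rest.
        subprocess.run(cmd, check=True)
```

Call run_pipeline("CWSD", "pku", 2) from the repository root to reproduce steps 2, 3, and 5. Building the argument lists separately from running them keeps the flags easy to inspect, and stopping on the first failure avoids training on missing preprocessed data.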

Please cite the paper: