Sequence to sequence Chinese word segmentation (CWS) post-editing

This repository is an implement of the post-editing method of the paper: Neural Chinese Word Segmentation as Sequence to Sequence Translation

Require

Environment:Python 2.7x

Encoding Format:UTF8

Usage

python cws_postediting.py --ori=<original inputfile> --seg=<segmented inputfile> --out=<post-editing outputfile>

Example

Files in test_data/ are examples.

test_data/
  ├── original_input.txt
  ├── segmented.txt
  └── out.txt

original_input.txt is an example of original input file.

segmentent.txt is an example of segmented file containing tranlsations errors.

out.txt the output file of post-editing.

To see the testing case, please run

python cws_postediting.py

or

python cws_postediting.py \
--ori=./test_data/original_input.txt \
--seg=./test_data/segmented.txt \
--out=./test_data/out.txt