nlpcc2016-chinese-weibo-segmentation

A Long Short-Term Memory (LSTM) based model for the NLPCC 2016 shared task on Chinese Weibo word segmentation. The model took 1st place in the closed and semi-open tracks. For more details, refer to our paper, Recurrent Neural Word Segmentation with Tag Inference.

Requirements

Theano
Lasagne

Notes

The original dataset for this task must be requested by filling out an Agreement Form, so only a few examples are provided here.
Once the original dataset is obtained, convert it from the space-separated format to the BMES tagging format (B, M, and E mark the beginning, middle, and end characters of a multi-character word; S marks a single-character word).
To get the unsupervised features, use the CistSegment scripts by Wu et al. (2014).
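
The conversion from space-separated text to BMES tags could be sketched as follows. This is a minimal illustration rather than the repository's own preprocessing code: the function names `word_to_tags` and `line_to_bmes` are hypothetical, Python 3 string handling is assumed, and the exact file layout expected by the scripts in this repository may differ.

```python
def word_to_tags(word):
    """Return one BMES tag per character of a single word."""
    if len(word) == 1:
        return ["S"]  # single-character word
    # First character is B, last is E, everything in between is M.
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def line_to_bmes(line):
    """Convert a space-separated sentence into (character, tag) pairs."""
    pairs = []
    for word in line.split():  # split() also drops repeated spaces
        pairs.extend(zip(word, word_to_tags(word)))
    return pairs


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the dataset.
    for char, tag in line_to_bmes("我 喜欢 自然语言处理"):
        print(char, tag)
```

Writing one character and its tag per line is only one possible layout; adjust the output step to whatever format the training scripts read.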

Run

Prepare the data (see the Notes)
Run the script ccl_nlpcc.py
Run the script chunkvec_inference.py

Citation

If you use this software, please cite our paper.

@InProceedings{zhou2016lstmtaginference,
  Title                    = {Recurrent neural word segmentation with tag inference},
  Author                   = {Qianrong Zhou and Long Ma and Zhenyu Zheng and Yue Wang and Xiaojie Wang},
  Booktitle                = {Proceedings of The Fifth Conference on Natural Language Processing and Chinese Computing \& The Twenty Fourth International Conference on Computer Processing of Oriental Languages},
  Year                     = {2016}
}
