A Long Short-Term Memory (LSTM) based model for the NLPCC 2016 shared task on Chinese Weibo word segmentation. The model took 1st place in the closed and semi-open tracks. For more details, refer to our paper, Recurrent Neural Word Segmentation with Tag Inference.
- Theano
- Lasagne
- The original dataset for this task must be requested by filling out an Agreement Form, so we provide only a few examples here.
- Once the original dataset is obtained, convert it from the space-separated format to the BMES tagging format.
- To get the unsupervised features, use the scripts by Wu et al., 2014 (CistSegment).
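The conversion from space-separated words to BMES tags can be sketched as follows. This is a minimal illustration, not the preprocessing code used in this repository: it assumes Python 3, the function names `word_to_tags` and `sentence_to_bmes` are hypothetical, and the exact file layout expected by the training scripts is not shown here. Single-character words are tagged S; longer words are tagged B, then M for each middle character, then E.

```python
def word_to_tags(word):
    """Return the BMES tag sequence for a single word (hypothetical helper)."""
    if len(word) == 1:
        return ["S"]  # Single-character word
    # Begin, zero or more Middle tags, End
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def sentence_to_bmes(line):
    """Convert one space-separated sentence into a list of (char, tag) pairs."""
    pairs = []
    for word in line.split():
        pairs.extend(zip(word, word_to_tags(word)))
    return pairs


if __name__ == "__main__":
    # Print one character and its tag per line for a toy sentence.
    for char, tag in sentence_to_bmes("我 喜欢 自然语言"):
        print(char, tag)
```

For the toy sentence, 我 is tagged S, 喜欢 is tagged B E, and 自然语言 is tagged B M M E.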
- Prepare the data (see the Notes)
- Run the script ccl_nlpcc.py
- Run the script chunkvec_inference.py
If you use this software, please cite our paper.
@InProceedings{zhou2016lstmtaginference,
Title = {Recurrent neural word segmentation with tag inference},
Author = {Qianrong Zhou and Long Ma and Zhenyu Zheng and Yue Wang and Xiaojie Wang},
Booktitle = {Proceedings of The Fifth Conference on Natural Language Processing and Chinese Computing \& The Twenty Fourth
International Conference on Computer Processing of Oriental Languages},
Year = {2016}
}