NDPR
This repository contains the source code of our paper, Recovering Dropped Pronouns in Chinese Conversations via Modeling Their Referents, which is accepted for publication at NAACL 2019.
Usage
python train_v4.py
Datasets
We demonstrate our model on three datasets, which are: Chinese SMS, TC section of OntoNotes Release 5.0 and BaiduZhidao.
- Chinese SMS
This dataset is introduced in [1], which consists of 684 Chinese SMS files. In our work, we take the same train/dev/test data split as [1], which reserves 16.7% of the training set as a development set.
- OntoNotes Release 5.0(TC section)
This dataset is published in CoNLL 2012 Shared Task. We use the Chinese telephone conversation(TC) section in our work, which contains 9,507 sentences. Since the original dataset only has coreference annotations for anaphoric zero pronouns, we annotate them according to dropped pronoun recovery annotation guidelines described in [1].
- BaiduZhidao
This dataset is introduced in [2], which is a question answering dialogue dataset containing 11,160 sentences in the raw data. In our work, we make data preprocessing by splitting the entire corpus into each independent QA segmentation, removing noise data and annotating the participant information for each sentence. Our processed dataset contains 9,376 sentences.
The train/dev/test setting of these three datasets is shown in Table
Train | Dev | Test | ||||
Sentence | DPs | Sentence | DPs | Sentence | DPs | |
SMS | 32,860 | 25,805 | 3,073 | 2,395 | 4,346 | 3,411 |
TC | 6,562 | 4,207 | 1,408 | 890 | 1,406 | 786 |
BaiduZhidao | 5,504 | 4,312 | 1,175 | 732 | 1,178 | 832 |
Citation
If this work is useful in your research, please kindly cite our paper.
@inproceedings{yang2019text,
title={Recovering Dropped Pronouns in Chinese Conversations via Modeling Their Referents},
author={Jingxuan Yang, Jianzhuo Tong, Si Li, Sheng Gao, Jun Guo and Nianwen Xue},
booktitle={NAACL},
year={2019}
}
Reference
[1] Yang, Yaqin & Liu, Yalin & Xue, Nianwen (2015). Recovering dropped pronouns from Chinese text messages. 2. 309-313. 10.3115/v1/P15-2051.
[2] Zhang, Weinan & Liu, Ting & Yin, Qingyu , & Zhang, Yu . (2016). Neural recovery machine for chinese dropped pronoun.