
Neural Sentence Simplification with Semantic Dependency Information

Code for the paper Neural Sentence Simplification with Semantic Dependency Information by Zhe Lin and Xiaojun Wan, accepted at AAAI'21. Please contact me at linzhe@pku.edu.cn if you have any questions.

Structure

(figure: model overview)

System Output

If you are looking for system output and don't want to install dependencies and train a model (or run a pre-trained model), the Result folder is for you.

Dependencies

PyTorch 1.4
NLTK 3.5
stanfordcorenlp
tqdm 4.59.0
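
If these packages are not already installed, something like the following pip command should set them up (versions taken from the list above; stanfordcorenlp is left unpinned since the README gives no version):

pip install torch==1.4.0 nltk==3.5 stanfordcorenlp tqdm==4.59.0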

We provide the SARI metric in our code; you need to set it up according to here and change the path in eval.py.

Datasets

We provide the original and preprocessed datasets on the release page, including Newsela, WikiSmall and WikiLarge. You can also obtain them from Newsela, WikiSmall and WikiLarge.

The 8-reference WikiLarge test set can be downloaded here.

Note that the copyright of Newsela belongs to https://newsela.com; we provide it for research purposes only. For any use of these datasets beyond academic research, please contact newsela.com.

Preprocess

We provide all preprocessed data for Newsela, WikiSmall and WikiLarge on the release page.

In order to reduce the vocabulary size, we tag words with their named entities using the Stanford CoreNLP tool (Manning et al. 2014) and anonymize them with NE@N tokens, where NE is the entity type and N indicates the N-th distinct entity of that type. If you want to use your own datasets, please run NER.py to replace named entities in the sentences. You must provide the SDP graphs for your datasets yourself, in accordance with the format of the SDP graph data in the file.
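
For illustration, here is a minimal sketch of that anonymization step, assuming CoreNLP is unpacked at ./stanford-corenlp (a hypothetical path); the repository's NER.py is the authoritative implementation:

from stanfordcorenlp import StanfordCoreNLP

def anonymize(sentence, nlp):
    tagged = nlp.ner(sentence)  # [(token, tag), ...]; tag is 'O' for non-entities
    seen = {}                   # (tag, surface) -> N, so a repeated entity reuses its index
    counts = {}                 # tag -> number of distinct entities of that type so far
    out, i = [], 0
    while i < len(tagged):
        token, tag = tagged[i]
        if tag == 'O':
            out.append(token)
            i += 1
            continue
        # Merge consecutive tokens sharing a tag into one entity span.
        j = i
        while j < len(tagged) and tagged[j][1] == tag:
            j += 1
        surface = ' '.join(t for t, _ in tagged[i:j])
        if (tag, surface) not in seen:
            counts[tag] = counts.get(tag, 0) + 1
            seen[(tag, surface)] = counts[tag]
        out.append('%s@%d' % (tag, seen[(tag, surface)]))
        i = j
    return ' '.join(out)

nlp = StanfordCoreNLP('./stanford-corenlp')  # hypothetical path to the unpacked CoreNLP
print(anonymize('Barack Obama met Angela Merkel in Berlin .', nlp))
# e.g. 'PERSON@1 met PERSON@2 in LOCATION@1 .' (exact tags depend on the CoreNLP model)
nlp.close()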

Train

All configurations for the training step are listed in Parameter.py. Validation data is generated and the results are evaluated after each training epoch.

Change the mode in Parameter.py to train, then run the following command to start training the model.

python main.py
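
For reference, this switch is presumably a single assignment in Parameter.py along the following lines (the variable name mode is taken from the description above and is not verified against the file):

mode = 'train'  # switch to 'test' for inference (see Inference below)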

Inference

Change the mode in Parameter.py to test to begin inference.

We provide pre-trained models for the three benchmark datasets on the release page.

Evaluation

We provide SARI, BLEU and FKGL evaluation metrics in our code.
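
For a quick sanity check outside eval.py, BLEU can be approximated with NLTK (a listed dependency). This is a minimal sketch, not the repository's exact scoring setup, so numbers may differ slightly due to tokenization and smoothing choices:

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of references per source sentence; WikiLarge evaluation would use 8 per sentence.
refs = [[['the', 'cat', 'sat', 'on', 'the', 'mat', '.']]]
hyps = [['the', 'cat', 'is', 'on', 'the', 'mat', '.']]
smooth = SmoothingFunction().method1  # avoids zero scores on short inputs
print(corpus_bleu(refs, hyps, smoothing_function=smooth))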

Note that, due to encoding differences between Python 2 and Python 3, the FKGL we provide may differ slightly from the previous version and should be treated as a reference only. The final FKGL scores are computed following here on Python 2.
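
For context, FKGL follows the standard Flesch-Kincaid grade-level formula, 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59. The sketch below uses a rough vowel-group syllable counter, which is exactly the kind of detail that makes scores drift between implementations:

import re

def count_syllables(word):
    # Rough heuristic: count vowel groups, with a minimum of one syllable per word.
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def fkgl(sentences):
    # Keep only word-like tokens so punctuation does not inflate the counts.
    words = [w for s in sentences for w in s.split() if any(c.isalpha() for c in w)]
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

print(round(fkgl(['the cat sat on the mat .']), 2))  # -1.45: very simple text scores low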

Our code only provides SARI with a single reference. WikiLarge, which contains 8 references, should be evaluated as here.

Results

(figure: results overview)

Reference

If you use any content of this repo for your work, please cite the following bib entry:

@article{lin2021simplification,
  title={Neural Sentence Simplification with Semantic Dependency Information},
  author={Lin, Zhe and Wan, Xiaojun},
  journal={AAAI Workshop on Deep Learning on Graphs: Methods and Applications},
  year={2021}
}