Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Document-level-text-simplification

This repository contains the dataset and code for the paper Document-Level Text Simplification: Dataset, Criteria and Baseline (https://arxiv.org/pdf/2110.05071.pdf).

The Dataset folder contains the training, validation, and test sets of the D-Wikipedia dataset. For each split, the src file contains the original articles and the tgt file contains the simplified articles, with one article per line. To use the Newsela corpus, you need to obtain permission for the data, as mentioned in our paper.
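Because each line holds one article, the src and tgt files can be read as line-aligned pairs. The following is a minimal sketch, not part of the repository: the function name `read_parallel_articles` and the file paths in the usage comment are hypothetical, and it assumes the files are UTF-8 encoded with line N of src corresponding to line N of tgt.

```python
from itertools import zip_longest


def read_parallel_articles(src_path, tgt_path):
    """Return a list of (original, simplified) article pairs.

    Assumes one article per line and that line N of the src file
    corresponds to line N of the tgt file.
    """
    pairs = []
    with open(src_path, encoding="utf-8") as src_file, \
         open(tgt_path, encoding="utf-8") as tgt_file:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (src_line, tgt_line) in enumerate(
                zip_longest(src_file, tgt_file), start=1):
            if src_line is None or tgt_line is None:
                raise ValueError(
                    f"src and tgt differ in length at line {line_no}")
            pairs.append((src_line.rstrip("\n"), tgt_line.rstrip("\n")))
    return pairs


# Usage (hypothetical paths; adjust to the actual file names in Dataset):
# pairs = read_parallel_articles("Dataset/train.src", "Dataset/train.tgt")
# original, simplified = pairs[0]
```

Raising an error on a length mismatch is a design choice made here so that a truncated or misaligned file does not silently pair the wrong articles.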

D_SARI.py implements the D-SARI metric, an improved version of the SARI metric for document-level text simplification. The original code for the SARI metric can be found at https://github.com/cocoxu/simplification/blob/master/SARI.py.

If you have any questions, please feel free to contact us at sunrenliangpku@gmail.com.