If you use this code for your research, please cite the paper it is based on, "Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization":
@inproceedings{Ma2016superAE,
title = {Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization},
author = {Shuming Ma and Xu Sun and Junyang Lin and Houfeng Wang},
booktitle = {{ACL} 2018},
year = {2018}
}
- Python 3.5
- PyTorch 0.3.1
LCSTS: if you have this dataset in XML format:
- First extract the source content and summaries, saving them as *.src and *.tgt respectively.
- Run preprocess.py in the root directory to preprocess the data; refer to the following script:
python preprocess.py -train_src data/lcsts/PART_I.src -train_tgt data/lcsts/PART_I.tgt -valid_src data/lcsts/PART_II.src -valid_tgt data/lcsts/PART_II.tgt -save_data data/lcsts/lcsts.low.share -share -src_char -tgt_char
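The first extraction step above might be sketched as follows. The `<summary>`/`<short_text>` tag names are assumptions based on the raw LCSTS release (which is not well-formed XML overall, so a regex scan is simpler than an XML parser); adjust them to your copy of the data:

```python
import re

# Assumed tag names from the raw LCSTS dump; check your files.
SRC_RE = re.compile(r"<short_text>\s*(.*?)\s*</short_text>", re.S)
TGT_RE = re.compile(r"<summary>\s*(.*?)\s*</summary>", re.S)

def extract_pairs(raw):
    """Return (source, summary) pairs from one raw LCSTS part."""
    sources = SRC_RE.findall(raw)
    summaries = TGT_RE.findall(raw)
    assert len(sources) == len(summaries), "mismatched records"
    return list(zip(sources, summaries))

def write_src_tgt(raw, src_path, tgt_path):
    # One example per line: *.src gets the article, *.tgt the summary.
    with open(src_path, "w", encoding="utf-8") as fs, \
         open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in extract_pairs(raw):
            fs.write(src + "\n")
            ft.write(tgt + "\n")

sample = ("<doc id=0><summary>A short summary.</summary>"
          "<short_text>The full source text.</short_text></doc>")
print(extract_pairs(sample))
```

Run `write_src_tgt` once per part (PART_I, PART_II, ...) to produce the *.src/*.tgt files consumed by preprocess.py above.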
In lcsts.yaml, remember to modify the data field.
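The data field should point at the prefix passed to -save_data above. A hypothetical fragment (the exact key layout depends on this repo's config schema, so match it to the existing lcsts.yaml):

```yaml
# Assumed key name; keep it consistent with the shipped lcsts.yaml.
data: 'data/lcsts/lcsts.low.share'
```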
CUDA_VISIBLE_DEVICES=0 python train.py -config lcsts.yaml
- Run this script from a command line rather than from an IDE.
- CUDA_VISIBLE_DEVICES is optional; remove it to run on CPU.