sebastianGehrmann/bottom-up-summary

Required Format for step-a inputs


The CNN-DM dataset contains stories, each consisting of the complete article followed by its summary (the highlights). To train on this data with OpenNMT, the preprocessing command is:

python preprocess.py -train_src data/cnndm/train.txt.src \
                     -train_tgt data/cnndm/train.txt.tgt \
                     ...

What format is required for the source and target files, train.txt.src and train.txt.tgt? Should I put each whole article on one line in the source file and its whole summary on the corresponding line in the target file?
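A minimal sketch of the layout the question describes, assuming one example per line: line i of the .src file holds the whole article and line i of the .tgt file holds its summary. It also assumes the raw .story files mark each highlight with a preceding `@highlight` line, and that target sentences are wrapped in `<t> ... </t>` tags; check both against the linked documentation. `split_story` and `write_parallel` are hypothetical helpers, and tokenization and lowercasing are deliberately left out.

```python
import os
import tempfile


def split_story(story_text):
    """Split one CNN-DM .story text into (article_lines, highlights).

    Assumes article paragraphs come first and each highlight sentence
    follows a line containing only '@highlight'.
    """
    article_lines = []
    highlights = []
    next_is_highlight = False
    for raw in story_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line == "@highlight":
            next_is_highlight = True
        elif next_is_highlight:
            highlights.append(line)
            next_is_highlight = False
        else:
            article_lines.append(line)
    return article_lines, highlights


def write_parallel(stories, src_path, tgt_path):
    """Write one article per line to src_path and its summary to the
    same line number in tgt_path, keeping the two files aligned."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(tgt_path, "w", encoding="utf-8") as tgt:
        for story in stories:
            article_lines, highlights = split_story(story)
            # Join paragraphs with spaces so the whole article is one line.
            src.write(" ".join(article_lines) + "\n")
            # Assumed target convention: each summary sentence in <t> ... </t>.
            tgt.write(" ".join(f"<t> {h} </t>" for h in highlights) + "\n")


if __name__ == "__main__":
    story = ("First paragraph.\n\nSecond paragraph.\n\n"
             "@highlight\n\nSummary one\n\n@highlight\n\nSummary two\n")
    with tempfile.TemporaryDirectory() as d:
        src_path = os.path.join(d, "train.txt.src")
        tgt_path = os.path.join(d, "train.txt.tgt")
        write_parallel([story], src_path, tgt_path)
        with open(src_path, encoding="utf-8") as f:
            print(f.read(), end="")  # First paragraph. Second paragraph.
        with open(tgt_path, encoding="utf-8") as f:
            print(f.read(), end="")  # <t> Summary one </t> <t> Summary two </t>
```

The key property is alignment: both files must have exactly the same number of lines, since the preprocessing step pairs them line by line.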

  • Could you please also share an ETA for the complete documentation?

Please read the current documentation; I link to the commands here: http://opennmt.net/OpenNMT-py/Summarization.html