Required Format for step-a inputs
mohsinjuni commented
The CNN-DM dataset contains stories, each consisting of the complete article followed by its summary (highlights). To train on this data in OpenNMT, the preprocessing command is:
python preprocess.py -train_src data/cnndm/train.txt.src \
-train_tgt data/cnndm/train.txt.tgt \
What is the required format for the train.txt.src and train.txt.tgt files? Should I put the whole article on one line and the whole summary on one line in the respective source and target files?
Could you please also share an ETA for the complete documentation?
sebastianGehrmann commented
Please read the current documentation. The commands are linked here: http://opennmt.net/OpenNMT-py/Summarization.html
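For readers landing on this thread, here is a minimal sketch of one way to produce line-aligned source and target files, assuming the usual OpenNMT convention of one example per line, with line N of the source file paired with line N of the target file. It also assumes the raw CNN-DM story layout, where the article text is followed by highlight lines that are each introduced by an `@highlight` marker. The function names `split_story` and `write_parallel`, and the " . " separator used to join highlights, are illustrative choices and not part of OpenNMT. The linked documentation remains the authority on the exact preprocessing used there.

```python
def split_story(text, joiner=" . "):
    """Split one raw story into (article, summary), each a single line.

    Assumes every line after the first '@highlight' marker is a highlight.
    """
    article_lines, highlights = [], []
    in_highlights = False
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse internal whitespace
        if not line:
            continue  # skip blank lines
        if line == "@highlight":
            in_highlights = True  # marker itself is not content
        elif in_highlights:
            highlights.append(line)
        else:
            article_lines.append(line)
    return " ".join(article_lines), joiner.join(highlights)


def write_parallel(stories, src_path, tgt_path):
    """Write articles and summaries to two files, one example per line."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(tgt_path, "w", encoding="utf-8") as tgt:
        for text in stories:
            article, summary = split_story(text)
            if article and summary:  # drop stories missing either part
                src.write(article + "\n")
                tgt.write(summary + "\n")
```

Calling `write_parallel(stories, "train.txt.src", "train.txt.tgt")` on a list of raw story strings would then yield two files with the same number of lines, which is the property the parallel format relies on.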