Where is the init_embd_file './tools/numberbatch-en-19.08.txt'? And how is the file keyword.vocab generated?
Hi, I have some trouble running ./script/inference.sh (using the DailyDialog dataset).
- When running main_for_metric_grade.py, the file ./data/DailyDialog/keyword.vocab does not exist. How is this file generated?
- When running main_for_metric_grade.py, the init_embd_file ./tools/numberbatch-en-19.08.txt is missing.
Sorry for the misleading README; we have updated it now.
About your questions:
- For now, the keyword.vocab file is included in the processed training data, so you need to download the provided data (or generate it from scratch) before running inference. Thank you for pointing this out; we will fix it later.
- numberbatch-en-19.08.txt is included in the tools package we provide; please download and unzip it before using GRADE.
Thanks a lot! That's very helpful.
I have one more question.
If I want to adapt GRADE to another dialog dataset such as ConvAI2, do I only need to generate 4 new files based on the new dataset:
(1) original_dialog_merge.keyword, (2) original_dialog_merge.ctx_keyword, (3) original_dialog_merge.rep_keyword, (4) test_text.pkl,
while the following files you provided in ./data/DailyDialog can be reused:
(1) the provided GRADE checkpoint, (2) keyword.vocab, (3) dialog_keyword_tuples_multiGraph.hop, (4) 1st_hop_nr10.embedding, (5) 2nd_hop_nr10.embedding?
These 4 files are generated by running inference.sh, so you don't need to add extra code to generate them. What you need to do is update the load_dataset function in ./preprocess/extract_keywords.py and provide your own dataset in the specific format described in the README, as sketched below.
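For illustration, here is a minimal sketch of what an adapted load_dataset branch for ConvAI2 might look like. The signature, return type, and file name "convai2.json" are assumptions; mirror the existing DailyDialog branch in ./preprocess/extract_keywords.py and the data format described in the README rather than treating this as the actual implementation:

```python
# Hedged sketch: the real load_dataset in ./preprocess/extract_keywords.py may
# differ; adapt this to match the existing DailyDialog branch and README format.
import json

def load_dataset(dataset_name, data_dir):
    """Load raw dialogs for keyword extraction.

    Assumes each line of the (hypothetical) file convai2.json is a JSON list
    of utterance strings making up one dialog.
    """
    if dataset_name == "ConvAI2":
        dialogs = []
        with open(f"{data_dir}/convai2.json") as f:  # hypothetical filename
            for line in f:
                utterances = json.loads(line)
                dialogs.append(utterances)
        return dialogs
    raise ValueError(f"Unknown dataset: {dataset_name}")
```

Once load_dataset returns your dialogs in the same shape the DailyDialog branch produces, inference.sh should generate the 4 files above for the new dataset.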