Where is the init_embd_file './tools/numberbatch-en-19.08.txt'? And how is the file keyword.vocab generated?
Hi, I have some trouble running ./script/inference.sh (using the DailyDialog dataset).
- When running main_for_metric_grade.py, the file ./data/DailyDialog/keyword.vocab does not exist. How is this file generated?
- When running main_for_metric_grade.py, the init_embd_file ./tools/numberbatch-en-19.08.txt is missing.
Sorry for the misleading README; we have updated it now.
About your questions:
- For now, the keyword.vocab file is included in the processed training data, so you need to download the provided data (or generate it from scratch) before running inference. Thank you for pointing this out; we will fix it later.
- numberbatch-en-19.08.txt is included in the tools package we provide; please download and unzip it before using GRADE.
Thanks a lot! That's very helpful.
I have one more question.
If I want to adapt GRADE to another dialog dataset such as ConvAI2, do I only need to generate 4 new files based on the new dataset:
(1) original_dialog_merge.keyword, (2) original_dialog_merge.ctx_keyword, (3) original_dialog_merge.rep_keyword, (4) test_text.pkl,
while the following files you provided in ./data/DailyDialog can be reused:
(1) the provided GRADE checkpoint, (2) keyword.vocab, (3) dialog_keyword_tuples_multiGraph.hop, (4) 1st_hop_nr10.embedding, (5) 2nd_hop_nr10.embedding?
These 4 files are generated by running inference.sh, so you don't need to add extra code to generate them. What you need to do is update the load_dataset function in ./preprocess/extract_keywords.py and provide your own dataset in the specific format described in the README, as sketched below.
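For illustration, here is a minimal sketch of what an adapted load_dataset branch for ConvAI2 might look like. The signature, return type, and file name "convai2.json" are assumptions; mirror the existing DailyDialog branch in ./preprocess/extract_keywords.py and the data format described in the README rather than treating this as the actual implementation:

```python
# Hedged sketch: the real load_dataset in ./preprocess/extract_keywords.py may
# differ; adapt this to match the existing DailyDialog branch and README format.
import json

def load_dataset(dataset_name, data_dir):
    """Load raw dialogs for keyword extraction.

    Assumes each line of the (hypothetical) file convai2.json is a JSON list
    of utterance strings making up one dialog.
    """
    if dataset_name == "ConvAI2":
        dialogs = []
        with open(f"{data_dir}/convai2.json") as f:  # hypothetical filename
            for line in f:
                utterances = json.loads(line)
                dialogs.append(utterances)
        return dialogs
    raise ValueError(f"Unknown dataset: {dataset_name}")
```

Once load_dataset returns your dialogs in the same shape the DailyDialog branch produces, inference.sh should generate the 4 files above for the new dataset.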