Is there an issue with the provided dataset?

Question

Closed this issue 5 months ago · 2 comments

Dear Author,

Thank you very much for open-sourcing your code and dataset. I ran into a missing cache file while running the GSM experiment. Your code reads the BERT scores between examples from 'misc/gsm--tr0-512-dv0-500--bertscoreapi-question--scores.json', but the provided cache folder does not contain this score file. The files that are provided contain results with the keys dict_keys(['text', 'index', 'finish_reason', 'prompt', 'completion_offset']), where prompt holds the final list of examples given to the model as input. Is something missing from the released dataset? I look forward to your response.

Best regards

Answer 1 · 2024-07-18T04:10:32.000Z

Thanks for the question. Yes, the cache file for the BERT scores is not included. In that case I think the code is set to compute the scores itself using the bertscore package (line 278 in selective.py); this works on my end. Can you try again and let me know if the problem still happens? Thanks.
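For anyone reading along, the fallback described above follows a common "load the cache if present, otherwise compute and save" pattern. Below is a minimal sketch of that pattern, not the actual code in selective.py. The function names, the argument order, and the JSON layout (a dev-by-train list of lists) are all assumptions made for illustration, and the toy scorer is a stand-in for a real BERTScore call.

```python
import json
import os
from typing import Callable, List

# Hypothetical signature: a scorer takes (queries, candidates) and returns
# one row of scores per query. The real repository may use a different shape.
Scorer = Callable[[List[str], List[str]], List[List[float]]]


def load_or_compute_scores(
    cache_path: str,
    train_questions: List[str],
    dev_questions: List[str],
    compute_fn: Scorer,
) -> List[List[float]]:
    """Return a dev-by-train score matrix, reading the cache when it exists."""
    if os.path.exists(cache_path):
        # Cache hit: reuse the stored scores and skip the expensive step.
        with open(cache_path) as f:
            return json.load(f)

    # Cache miss: compute the scores, then store them for the next run.
    scores = compute_fn(dev_questions, train_questions)
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(scores, f)
    return scores


def toy_overlap_scorer(queries: List[str], candidates: List[str]) -> List[List[float]]:
    """Stand-in scorer (Jaccard token overlap). This is NOT BERTScore."""

    def jaccard(a: str, b: str) -> float:
        sa, sb = set(a.lower().split()), set(b.lower().split())
        union = sa | sb
        return len(sa & sb) / len(union) if union else 0.0

    return [[jaccard(q, c) for c in candidates] for q in queries]
```

With this shape, a missing score file only costs time on the first run. The second run reads the file that the first run wrote, so the absence of the file in the released cache folder should not by itself be an error.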

Answer 2 · 2024-08-08T03:09:24.000Z

Thank you for your patience!