ZaloAI-Jaist/VMLU

What do dev, test, and valid mean?


I have a question about the three dataset files (dev, valid, and test):

  • Which file should we run inference on and submit to the server? The sample code runs inference on all of the jsonl files, but only the test.json file has its answers left empty.
for file in jsonl_files:
    with open(file, "r", encoding="utf-8") as f:
...

Thanks in advance.

Further information: the dev set supplies the examples for few-shot inference, and the validation set is used to verify that few-shot prompting works on the dataset before you evaluate on the test set. Both of these sets are provided with answers. If LLMs are being evaluated zero-shot, only the test set is needed.
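To make the roles of the three files concrete, here is a minimal sketch of how they might be wired together. It is not the repository's sample code: the field names `id`, `question`, `choices`, and `answer`, the prompt layout, and the helpers `load_jsonl`, `format_item`, `build_prompt`, and `accuracy` are all assumptions made for illustration, and the three records are toy data rather than real dataset entries.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def format_item(item, with_answer):
    """Render a question; the field names here are assumed, not confirmed."""
    lines = [item["question"], *item["choices"]]
    lines.append("Answer: " + (item["answer"] if with_answer else ""))
    return "\n".join(lines).rstrip()


def build_prompt(dev_items, target, k=5):
    """Prepend up to k answered dev examples; k=0 gives a zero-shot prompt."""
    shots = [format_item(ex, with_answer=True) for ex in dev_items[:k]]
    return "\n\n".join(shots + [format_item(target, with_answer=False)])


def accuracy(predictions, items):
    """Share of items whose predicted letter matches the gold answer."""
    correct = sum(predictions[it["id"]] == it["answer"] for it in items)
    return correct / len(items)


# Toy records standing in for the dev, valid, and test files.
dev = [{"id": "d1", "question": "2 + 2 = ?",
        "choices": ["A. 3", "B. 4"], "answer": "B"}]
valid = [{"id": "v1", "question": "3 + 3 = ?",
          "choices": ["A. 6", "B. 5"], "answer": "A"}]
test_item = {"id": "t1", "question": "1 + 1 = ?",
             "choices": ["A. 2", "B. 3"], "answer": ""}

few_shot_prompt = build_prompt(dev, test_item, k=1)   # dev supplies the shots
zero_shot_prompt = build_prompt([], test_item, k=0)   # only test is needed
valid_score = accuracy({"v1": "A"}, valid)            # valid has gold answers
```

In this sketch the validation set is scored locally because its answers are available, while predictions on the test set would only be collected for submission, since its answers are left empty.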