What does dev, test, valid mean?
tisu19021997 commented
I have a question about the 3 dataset files:
- Which file should we run inference on and submit to the server? In the sample code, I see you run inference on all jsonl files, but only the test.json file has its answers left empty.
for file in jsonl_files:
    with open(file, "r", encoding="utf-8") as f:
        ...
Thanks in advance.
ZaloAI-Jaist commented
Hi Quang,
You only need to run inference on the test folder, because we have already provided the answers for the dev and valid sets.
Our system evaluates only the test set.
Thank you.
From: Pham Minh Quang
Sent: 30 October 2023 14:44
Subject: [ZaloAI-Jaist/VMLU] What does dev, test, valid mean? (Issue #2)
I have a couple of questions about the 3 dataset files:
* What do "dev", "test" and "valid" mean?
* Which file should we run inference on and submit to the server? In the sample code, I see you run inference on all jsonl files, so I should do the same, right?
ZaloAI-Jaist commented
Further information: the dev set provides the examples for few-shot inference, and the validation set is used to check whether few-shot prompting works on the dataset before evaluating on the test set. Both sets come with answers. If LLMs are evaluated zero-shot, only the test set is needed.
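One way to act on this advice, sketched in Python under stated assumptions: dev records serve as solved examples in a k-shot prompt, the labeled validation set is scored for each candidate k, and only the chosen setting is then run on the test set (k = 0 is zero-shot, which needs the test set alone). The record fields "question", "choices" and "answer", the prompt layout, and the functions format_example, build_prompt, accuracy and choose_k are hypothetical illustrations, not VMLU's official evaluation code.

```python
from typing import Callable, Sequence

# Assumption: each record is a dict with "question", "choices" (a list of
# strings such as "A. ...") and, for dev/valid, "answer" (a letter).
# These field names are hypothetical; adapt them to the actual files.


def format_example(record: dict, with_answer: bool) -> str:
    """Render one multiple-choice item, optionally followed by its answer."""
    text = "\n".join([record["question"], *record["choices"], "Answer:"])
    return f"{text} {record['answer']}" if with_answer else text


def build_prompt(dev: Sequence[dict], query: dict, k: int) -> str:
    """k solved dev examples, then the unsolved query; k=0 gives a zero-shot prompt."""
    shots = [format_example(r, with_answer=True) for r in dev[:k]]
    return "\n\n".join(shots + [format_example(query, with_answer=False)])


def accuracy(dev: Sequence[dict], valid: Sequence[dict], k: int,
             predict: Callable[[str], str]) -> float:
    """Fraction of validation items answered correctly (valid answers are provided)."""
    correct = sum(predict(build_prompt(dev, r, k)) == r["answer"] for r in valid)
    return correct / len(valid)


def choose_k(dev: Sequence[dict], valid: Sequence[dict], candidates: Sequence[int],
             predict: Callable[[str], str]) -> int:
    """Pick the shot count that scores best on validation, before touching test."""
    return max(candidates, key=lambda k: accuracy(dev, valid, k, predict))
```

Calling choose_k(dev, valid, [0, 5], predict) and then building test prompts with the returned k keeps the test set out of model selection. Python's max returns the first of several tied candidates, so listing 0 first prefers zero-shot when few-shot examples do not help.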