[question] testset evaluation submission

Question

yananchen1989 opened this issue 3 months ago · 3 comments

yananchen1989 commented 3 months ago

Hi authors,

I see that https://huggingface.co/spaces/osunlp/TravelPlannerLeaderboard gives the following format of submission:

{"idx":0,"query":"Natural Language Query","plan":[{"day": 1, "current_city": "from [City A] to [City B]", "transportation": "Flight Number: XXX, from A to B", "breakfast": "Name, City", "attraction": "Name, City;Name, City;...;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, {"day": 2, "current_city": "City B", "transportation": "-", "breakfast": "Name, City", "attraction": "Name, City;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, ...]}

Here each entry in the plan uses the key "day". However, in https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/evaluation/eval.py#L88 it looks like the key should be "days".
The JSON format in https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/postprocess/openai_request.py also uses "days" instead of "day".

Does this difference affect the evaluation in any way?
Please advise. Thanks.

Answer 1 · 2024-04-14T14:08:56.000Z

Besides, does the leaderboard support test submissions? I mean uploading just a few lines (rather than all 1000) to check the format. Also, does it accept entries in shuffled order?

Answer 2 · 2024-04-14T14:13:48.000Z

Hi Yanan,

It does not affect the evaluation, since 'day' or 'days' is only used as an index.

Sorry, we do not yet support partial-set or shuffled-order submissions. We may support them in the future, but we have too much on our to-do list right now.
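
In the meantime, a format check can be run locally before uploading. The following is only a minimal sketch, not the leaderboard's actual validation: the field names are copied from the submission format quoted in the question, both "day" and "days" are accepted, and the helper names `check_line` and `check_file` are hypothetical. It checks structure only, and the real evaluator may be stricter or more lenient (for example, about empty plans).

```python
import json

# Field names copied from the submission format quoted in the question.
PLAN_FIELDS = {"current_city", "transportation", "breakfast", "attraction",
               "lunch", "dinner", "accommodation"}
DAY_KEYS = ("day", "days")  # either spelling is accepted in this sketch


def check_line(line: str) -> list[str]:
    """Return a list of format problems found in one JSONL submission line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = []
    for key in ("idx", "query", "plan"):
        if key not in record:
            problems.append(f"missing top-level key: {key}")

    plan = record.get("plan")
    if "plan" in record and not isinstance(plan, list):
        # Assumption: a plan must be a list; the real evaluator may differ.
        problems.append("plan is not a list")
        return problems

    for i, entry in enumerate(plan or []):
        if not isinstance(entry, dict):
            problems.append(f"plan[{i}] is not an object")
            continue
        if not any(k in entry for k in DAY_KEYS):
            problems.append(f"plan[{i}] has no 'day' or 'days' key")
        missing = sorted(PLAN_FIELDS - set(entry))
        if missing:
            problems.append(f"plan[{i}] missing fields: {missing}")
    return problems


def check_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems, skipping blank lines."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():
                found = check_line(line)
                if found:
                    report[number] = found
    return report
```

Calling `check_file("submission.jsonl")` (the file name is a placeholder) returns an empty dict when every line passes these structural checks.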

Answer 3 · 2024-04-14T14:19:11.000Z

Thanks a lot.