OSU-NLP-Group/TravelPlanner

[question] testset evaluation submission

yananchen1989 opened this issue · 3 comments

Hi authors,

I see that https://huggingface.co/spaces/osunlp/TravelPlannerLeaderboard gives the following:
Format of Submission:
{"idx":0,"query":"Natural Language Query","plan":[{"day": 1, "current_city": "from [City A] to [City B]", "transportation": "Flight Number: XXX, from A to B", "breakfast": "Name, City", "attraction": "Name, City;Name, City;...;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, {"day": 2, "current_city": "City B", "transportation": "-", "breakfast": "Name, City", "attraction": "Name, City;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, ...]}
Here, each entry in the plan uses the key "day". However, at https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/evaluation/eval.py#L88, the key appears to be "days".
The JSON format in https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/postprocess/openai_request.py also uses "days" instead of "day".

Does this difference affect the evaluation in any way?
Please advise. Thanks.

Also, does the leaderboard support test submissions? I mean uploading just a few lines (rather than all 1000) as a format check. And does it support entries in shuffled order?

Hi Yanan,

It does not affect the evaluation, since 'day' or 'days' is only used as an index.

Sorry, we do not yet support partial-set or shuffled-order submissions. We may support these in the future, but right now we have too much to do.
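
In the meantime, a local format check is one possible workaround. The following is only a minimal sketch, assuming the submission file holds one JSON object per line with the fields shown in the format quoted above. `check_line` and `check_file` are hypothetical helper names and not part of the TravelPlanner repository, and the leaderboard's actual validation may differ. The sketch accepts either "day" or "days" as the index key.

```python
import json

# Field names copied from the submission format quoted in this thread.
PLAN_FIELDS = ("current_city", "transportation", "breakfast", "attraction",
               "lunch", "dinner", "accommodation")
DAY_KEYS = ("day", "days")  # assumption: either spelling is acceptable


def check_line(line):
    """Return a list of problems found in one submission line (hypothetical helper)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = [f"missing top-level key '{key}'"
                for key in ("idx", "query", "plan") if key not in record]
    plan = record.get("plan")
    if not isinstance(plan, list):
        if "plan" in record:
            problems.append("'plan' is not a list")
        return problems

    for pos, entry in enumerate(plan, start=1):
        if not isinstance(entry, dict):
            problems.append(f"plan entry {pos}: not an object")
            continue
        if not any(key in entry for key in DAY_KEYS):
            problems.append(f"plan entry {pos}: missing 'day'/'days'")
        for field in PLAN_FIELDS:
            if field not in entry:
                problems.append(f"plan entry {pos}: missing '{field}'")
    return problems


def check_file(path):
    """Check every non-empty line of a file; return {line_number: problems}."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.strip():
                problems = check_line(line)
                if problems:
                    report[lineno] = problems
    return report
```

Calling `check_file("submission.jsonl")` (the file name is a placeholder) returns an empty dict when every line has the expected keys. It checks structure only, not whether the plan values themselves are valid.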

Thanks a lot.