OSU-NLP-Group/TravelPlanner

Mismatch features in osunlp/TravelPlanner Dataset

Closed this issue · 3 comments

Train dataset features:

  • ['org', 'dest', 'days', 'visiting_city_number', 'date', 'people_number', 'local_constraint', 'budget', 'query', 'level', 'annotated_plan', 'reference_information']

Validation dataset features:

  • ['org', 'dest', 'days', 'visiting_city_number', 'date', 'people_number', 'local_constraint', 'budget', 'query', 'level', 'reference_information']

Test dataset features:

  • ['days', 'level', 'query', 'reference_information']

The features differ across the splits. Several features are missing from the test split, so the greedy search code cannot be run on it directly.
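To see exactly which columns each split lacks, the lists above can be diffed against the train split. This is a minimal sketch that hard-codes the column lists quoted in this issue (the original, pre-update test split) so it runs offline. With the Hugging Face `datasets` library you would normally read each split's `column_names` instead. The names `SPLIT_FEATURES` and `missing_features` are hypothetical.

```python
# Column lists as quoted in this issue (original, pre-update test split).
SPLIT_FEATURES = {
    "train": ["org", "dest", "days", "visiting_city_number", "date",
              "people_number", "local_constraint", "budget", "query",
              "level", "annotated_plan", "reference_information"],
    "validation": ["org", "dest", "days", "visiting_city_number", "date",
                   "people_number", "local_constraint", "budget", "query",
                   "level", "reference_information"],
    "test": ["days", "level", "query", "reference_information"],
}


def missing_features(split_features, reference="train"):
    """Map each non-reference split to the reference columns it lacks,
    preserving the reference split's column order."""
    ref_cols = split_features[reference]
    result = {}
    for split, cols in split_features.items():
        if split == reference:
            continue
        present = set(cols)
        result[split] = [c for c in ref_cols if c not in present]
    return result


if __name__ == "__main__":
    for split, cols in missing_features(SPLIT_FEATURES).items():
        print(f"{split}: missing {cols}")
```

In this sketch, validation lacks only `annotated_plan`, while the original test split lacks eight of the train columns, which is why code written against the train or validation schema cannot run on it unchanged.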

Hi,

Thank you for your interest in our work. We have updated the test set on Hugging Face and the related code. The greedy search code works now. Just pull the latest code and try again!

Each split exposes a different set of features because the splits serve different purposes. Some features are hidden to avoid data contamination.

Feel free to contact us if you have further questions.

Best,
Jian

Hello,

Thank you for the quick update. I noticed that the features of the test dataset are now:

  • ['org', 'dest', 'days', 'date', 'query', 'level', 'reference_information']

It seems that

  • ['visiting_city_number', 'date', 'people_number', 'local_constraint', 'budget']

are missing. Is this intentional? I understand the evaluation code will be uploaded later, but I found that evaluation/eval.py Line 105 uses 'local_constraint' as a key to fetch information for evaluation. Can I therefore assume this information will be included in the final evaluation code?
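The failure mode described here is easy to reproduce: indexing a record by a key that the split does not expose raises `KeyError`. Below is a minimal sketch of a guard that reports every missing field at once, assuming records are plain dicts. The helper `check_eval_fields` and the `REQUIRED_FIELDS` tuple are hypothetical; the tuple simply reuses the column list above and is not taken from `evaluation/eval.py`.

```python
# Hypothetical: fields an offline evaluator might read from each query record.
# Copied from the column list in this issue, not from eval.py itself.
REQUIRED_FIELDS = ("visiting_city_number", "date", "people_number",
                   "local_constraint", "budget")


def check_eval_fields(record, required=REQUIRED_FIELDS):
    """Raise one descriptive KeyError listing every required field the record lacks;
    otherwise return the record unchanged."""
    missing = [field for field in required if field not in record]
    if missing:
        raise KeyError(f"record is missing fields needed for evaluation: {missing}")
    return record


if __name__ == "__main__":
    # A test-style record without the concealed fields (values are placeholders).
    test_record = {"org": "A", "dest": "B", "days": 3, "query": "...", "level": "easy"}
    try:
        check_eval_fields(test_record)
    except KeyError as err:
        print(err)
```

Checking all fields up front gives one clear error per record instead of a bare `KeyError: 'local_constraint'` deep inside the evaluation loop.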

Hi @lzl65825,

We have updated the evaluation code just now. :)

We do conceal these features on purpose, to keep the evaluation process fair and free of data contamination and potential cheating on the test set. We therefore support offline evaluation on the validation set and provide online evaluation of the test set on our leaderboard.
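The split policy described here can be pictured as stripping the constraint fields from each test record before release, while validation records keep them for offline scoring. Below is a minimal sketch of that idea, assuming records are plain dicts. `conceal`, `HIDDEN_FIELDS` and the sample record are hypothetical and do not describe the maintainers' actual release pipeline. The hidden set is just the four train columns absent from the updated test list above.

```python
# Hypothetical hidden set: train columns absent from the updated test list.
HIDDEN_FIELDS = frozenset({"visiting_city_number", "people_number",
                           "local_constraint", "budget"})


def conceal(record, hidden=HIDDEN_FIELDS):
    """Return a copy of the record without the hidden fields; the input is not modified."""
    return {key: value for key, value in record.items() if key not in hidden}


if __name__ == "__main__":
    # Placeholder values, not real TravelPlanner data.
    full = {"org": "A", "dest": "B", "days": 3, "date": ["d1", "d2", "d3"],
            "visiting_city_number": 1, "people_number": 2,
            "local_constraint": {}, "budget": 1500, "query": "...", "level": "easy"}
    released = conceal(full)
    print(sorted(released))
```

In this sketch, returning a copy keeps the full record available for server-side scoring, which mirrors the split between offline evaluation on validation and online evaluation on test.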