OSU-NLP-Group/TravelPlanner

commonsense_constraint error

yananchen1989 opened this issue · 2 comments

Hi team,

As the code shows, the temperature is set to zero, but the LLM's responses are still not always deterministic.
As a result, occasionally (though not often) the LLM does not generate a valid plan.

Take, for instance, example #48 in the validation set, whose query is:

I require a travel itinerary for a seven-day trip beginning on March 2nd and ending on March 8th, 2022. The trip will begin in Philadelphia and involve visiting 3 cities in Virginia. The available budget for the trip is $2,900.

Most of the time, it generates a valid plan like the one below, which can be parsed into JSON for evaluation:

Day 1:
Current City: Philadelphia
Transportation: Self-driving from Philadelphia to Richmond
  • Duration: 4 hours 2 mins
  • Distance: 407 km
  • Cost: $20
Accommodation: Not applicable

Day 2:
Current City: Richmond
Transportation: Self-driving from Richmond to Jamestown
  • Duration: 1 hour 1 min
  • Distance: 92.4 km
  • Cost: $4
Accommodation: Not applicable

Day 3:
Current City: Jamestown
Attraction: Historic Jamestowne
  • Address: 1368 Colonial Nat'l Historical Pkwy, Jamestown, VA 23081, USA
  • Phone: (757) 856-1250
  • Website: http://www.historicjamestowne.org/
Lunch: Beyond Breads
  • Average Cost: $95
Dinner: Mama's Fish House
  • Average Cost: $63
Accommodation: Not applicable

Day 4:
Current City: Jamestown
Transportation: Self-driving from Jamestown to Charlottesville
  • Duration: 2 hours 2 mins
  • Distance: 206 km
  • Cost: $10
Accommodation: Not applicable

Day 5:
Current City: Charlottesville
Attraction: Monticello
  • Address: 1050 Monticello Loop, Charlottesville, VA 22902, USA
  • Phone: (434) 984-9800
  • Website: https://www.monticello.org/
Lunch: Mama's Fish House
  • Average Cost: $63
Dinner: Restaurant Andre
  • Average Cost: $250
Accommodation: Not applicable

Day 6:
Current City: Charlottesville
Transportation: Self-driving from Charlottesville to Philadelphia
  • Duration: 4 hours 24 mins
  • Distance: 411 km
  • Cost: $20
Accommodation: Not applicable

Day 7:
Current City: Philadelphia
Accommodation: Not applicable
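In this pipeline, an LLM converts plans like the one above into JSON. The sketch below is only meant to show the target shape. It is not the repository's parser, and it uses regular expressions rather than an LLM. It turns the text into one dict per day. The names `parse_day_plans`, `DAY_RE`, and `FIELD_RE` are hypothetical, and so is the `days` key. The only key name taken from the source is `current_city`, which appears in the traceback from `commonsense_constraint.py`.

```python
import re
from typing import Dict, List, Optional, Union

# Hypothetical patterns for the plain-text "Day N:" format shown above.
DAY_RE = re.compile(r"^Day\s+(\d+):\s*$")
FIELD_RE = re.compile(
    r"^(Current City|Transportation|Attraction|Breakfast|Lunch|Dinner|Accommodation):\s*(.*)$"
)

DayPlan = Dict[str, Union[int, str]]


def parse_day_plans(text: str) -> List[DayPlan]:
    """Split a 'Day N:' style plan into one dict per day.

    Top-level fields become snake_case keys ('Current City' -> 'current_city').
    Bulleted detail lines such as '• Duration: ...' are skipped. Returns an
    empty list when no 'Day N:' header is found, e.g. for a refusal message.
    """
    days: List[DayPlan] = []
    current: Optional[DayPlan] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        day_match = DAY_RE.match(line)
        if day_match:
            current = {"days": int(day_match.group(1))}  # assumed key name
            days.append(current)
            continue
        field_match = FIELD_RE.match(line)
        if field_match and current is not None:
            key = field_match.group(1).lower().replace(" ", "_")
            current[key] = field_match.group(2)
    return days


sample = """Day 1:
Current City: Philadelphia
Transportation: Self-driving from Philadelphia to Richmond
  • Duration: 4 hours 2 mins
Accommodation: Not applicable

Day 2:
Current City: Richmond
"""
print(parse_day_plans(sample))
```

A deterministic parser like this returns an empty list for a refusal, so a missing plan is easy to detect. It is also brittle. Any deviation from the expected headings silently drops fields, which is presumably one reason an LLM is used for this step instead.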

However, occasionally, with exactly the same prompt, it generates:

Apologies, but it seems that the budget provided is insufficient to cover the travel expenses for a seven-day trip involving multiple cities. If you could provide a higher budget, I would be more than happy to assist you in creating a detailed travel itinerary.

When the LLM parses this response, the resulting JSON is: {"error": "Insufficient budget provided"}

In this case, the evaluation code called from eval.py fails:

  File "C:\Users\ITSupp\Downloads\codes\TravelPlanner\tools\planner\sole_planning.py", line 169, in
    scores, detailed_scores = eval_score(args.set_type, tested_plans)
  File "C:\Users/ITSupp/Downloads/codes/TravelPlanner/evaluation\eval.py", line 80, in eval_score
    commonsense_info_box = commonsense_eval(query_data,tested_plan['plan'])
  File "C:\Users/ITSupp/Downloads/codes/TravelPlanner/evaluation\commonsense_constraint.py", line 523, in evaluation
    return_info['is_reasonalbe_visiting_city'] = is_reasonalbe_visiting_city(query_data, tested_data)
  File "C:\Users/ITSupp/Downloads/codes/TravelPlanner/evaluation\commonsense_constraint.py", line 134, in is_reasonalbe_visiting_city
    city_value = tested_data[i]['current_city']
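The traceback above omits its final line, so the exact exception type is not shown. The sketch below is a guess at the cause. It assumes the loop index `i` starts at 0 and applies the same access pattern, `tested_data[i]['current_city']`, to a few possible inputs. `access_error` is a hypothetical helper written only for this illustration.

```python
from typing import Any, Optional


def access_error(tested_data: Any, day_index: int = 0) -> Optional[str]:
    """Apply the traceback's access pattern, tested_data[i]['current_city'],
    and return the name of the exception it raises, or None on success."""
    try:
        _ = tested_data[day_index]["current_city"]
    except (KeyError, IndexError, TypeError) as exc:
        return type(exc).__name__
    return None


# The LLM-parsed refusal is a dict, so indexing it with an integer fails.
print(access_error({"error": "Insufficient budget provided"}))
# A list of per-day dicts with a 'current_city' key passes.
print(access_error([{"current_city": "Philadelphia"}]))
# A raw string fails differently: "A"["current_city"] is a TypeError.
print(access_error("Apologies, but it seems that the budget provided is insufficient"))
```

Under these assumptions, the `{"error": ...}` dict produces a `KeyError`. Other malformed outputs produce `TypeError` or `IndexError`, which is one concrete example of how varied these exceptions can be.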

Is there a recommended way to avoid triggering this error? For example, could the evaluation system directly count cases like this as not delivered (a score of 0)?
Thanks.
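One way to implement the "count it as zero" idea is to check each plan's shape before evaluation. The sketch below is a minimal version that assumes a well-formed plan is a non-empty list of per-day dicts, each with a `current_city` key. `is_well_formed`, `split_tested_plans`, and `simple_delivery_rate` are hypothetical names. The rate computed here is a simplified stand-in, not the benchmark's own delivery-rate metric.

```python
from typing import Any, Dict, List, Tuple


def is_well_formed(plan: Any) -> bool:
    """Hypothetical shape check: a non-empty list of per-day dicts that all
    carry the 'current_city' key read by the commonsense evaluator."""
    return (
        isinstance(plan, list)
        and len(plan) > 0
        and all(isinstance(day, dict) and "current_city" in day for day in plan)
    )


def split_tested_plans(
    tested_plans: List[Dict[str, Any]],
) -> Tuple[List[int], List[int]]:
    """Return (indices of well-formed plans, indices of malformed plans)."""
    well_formed: List[int] = []
    malformed: List[int] = []
    for idx, item in enumerate(tested_plans):
        target = well_formed if is_well_formed(item.get("plan")) else malformed
        target.append(idx)
    return well_formed, malformed


def simple_delivery_rate(tested_plans: List[Dict[str, Any]]) -> float:
    """Fraction of well-formed plans; malformed plans count as undelivered."""
    if not tested_plans:
        return 0.0
    well_formed, _ = split_tested_plans(tested_plans)
    return len(well_formed) / len(tested_plans)


tested_plans = [
    {"plan": [{"current_city": "Philadelphia"}]},
    {"plan": {"error": "Insufficient budget provided"}},
]
print(split_tested_plans(tested_plans))  # ([0], [1])
print(simple_delivery_rate(tested_plans))  # 0.5
```

You could then score the malformed indices as undelivered or set them aside. This sketch does not check how either choice interacts with `eval_score`'s own indexing into the query data. A shape check like this also cannot distinguish a genuine refusal from a parsing failure on a valid plan.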

Hi,

We also encountered this issue in our experiments. So far, the best solution we've found is to revise these instances manually. Directly counting them as zero is certainly simpler. However, identifying these exceptions is challenging because the errors that arise when LLMs parse the output vary greatly. Given the need for rigor in this research, we cannot simply assign a score of zero every time an exception occurs, especially since many other kinds of exceptions might arise.
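Manual revision is easier to manage if the problematic instances are exported, corrected by hand, and merged back in. The sketch below shows one possible workflow; the repository does not necessarily provide one. `export_for_review`, `apply_revisions`, and the JSONL file layout (one `{"idx", "plan"}` object per line) are all assumptions made for illustration.

```python
import copy
import json
import os
import tempfile
from typing import Any, Dict, List


def export_for_review(
    tested_plans: List[Dict[str, Any]], indices: List[int], path: str
) -> int:
    """Write the selected plans to a JSONL file, one {'idx', 'plan'} object
    per line, so they can be corrected by hand. Returns the number written."""
    with open(path, "w", encoding="utf-8") as f:
        for idx in indices:
            row = {"idx": idx, "plan": tested_plans[idx].get("plan")}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(indices)


def apply_revisions(
    tested_plans: List[Dict[str, Any]], path: str
) -> List[Dict[str, Any]]:
    """Return a deep copy of tested_plans in which each listed 'plan' is
    replaced by the hand-revised version read back from the JSONL file."""
    revised = copy.deepcopy(tested_plans)
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                revised[row["idx"]]["plan"] = row["plan"]
    return revised


# Usage: export the malformed instance, edit the file by hand, then merge.
tested_plans = [
    {"plan": [{"current_city": "Philadelphia"}]},
    {"plan": {"error": "Insufficient budget provided"}},
]
review_path = os.path.join(tempfile.mkdtemp(), "to_review.jsonl")
exported = export_for_review(tested_plans, [1], review_path)
# ... manual editing of to_review.jsonl happens here ...
```

Returning a copy keeps the raw model outputs intact. This way, you can report the revised and unrevised results separately.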

@hsaest Very helpful, that makes sense. Thanks.