Major concern about evaluation

Question

ezhang7423 opened this issue 2 years ago · 3 comments

Hi there!
I've found that when I roll out ground-truth trajectories from the dataset (the ones labelled by the language annotator), Tasks.get_task_info does not always evaluate them as successful. This seems quite concerning. Perhaps I've done something wrong on my end?

Answer 1 · 2022-12-13T10:26:00.000Z

Could you share exactly which code you ran?

Answer 2 · 2022-12-13T10:30:51.000Z

And could I ask you to move (i.e. reopen) this issue to the calvin repo?

Answer 3 · 2022-12-18T14:58:13.000Z

Sorry for the late reply! I've attached the information needed to reproduce this in the following issue: mees/calvin#32