yding25/GPT-Planner

Questions on Your Dataset

Closed this issue · 1 comment

Thank you for publishing such amazing work.

We noticed that the COWP dataset includes only textual descriptions of the tasks and possible situations. Did you use a vision system only in your robot demonstration to detect situations, and not in any of the benchmarks or the result histograms for the 12 tasks reported in the paper? Thank you!

Thank you for appreciating our work.

COWP does not use a vision system; it operates entirely in natural language. We assume a perfect Visual Question Answering (VQA) model that can describe scenes accurately, and these descriptions serve as the situation context.
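For illustration, a minimal sketch of this setup might look like the following. All names here (`perfect_vqa`, `build_planner_prompt`, the scene IDs, and the prompt wording) are hypothetical, not COWP's actual API; a hand-written ground-truth description simply stands in for what an ideal VQA model would report.

```python
# Sketch of the "perfect VQA" assumption: a hand-written scene description
# replaces the output of a real vision model. Names are hypothetical,
# not COWP's actual interface.

def perfect_vqa(scene_id: str) -> str:
    """Stand-in for an ideal VQA model: returns a ground-truth
    textual description of the situation in the scene."""
    descriptions = {
        "kitchen_01": "The cup on the table is broken, and coffee "
                      "has spilled onto the floor.",
    }
    return descriptions[scene_id]

def build_planner_prompt(task: str, situation: str) -> str:
    """Combine the task and the situation description into the
    natural-language context given to the planner."""
    return (
        f"Task: {task}\n"
        f"Situation: {situation}\n"
        "Given the situation, revise the plan so the task can still be completed."
    )

if __name__ == "__main__":
    situation = perfect_vqa("kitchen_01")  # assumed-perfect perception
    print(build_planner_prompt("serve coffee", situation))
```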

We did attempt to develop a vision system. Unfortunately, few robotics simulation platforms can visualize situations such as coffee spills or broken cups, and building such a system also demands significant human effort.