eric-ai-lab/VLMbench

Evaluation issues

huangjy-pku opened this issue · 1 comment

Hello, I checked your evaluation script cliport_test.py, but I was confused about how the beginning phase of evaluation is handled.

TwoStreamClipLingUNetLatTransporterAgent.act takes the observation and instruction at each step but directly outputs the place action (skipping pick). So I wonder how this can work at the beginning of task evaluation, when nothing has been picked yet. More generally, this is a question about how you extend to an arbitrary number of steps (as claimed in the paper) with a two-stage agent (CLIPort). In other words, directly applying place seems reasonable only when pick can be skipped (e.g., an object is already picked).

By the way, I cannot access the waypoints information in the small sample dataset. Maybe the waypoints could provide some clues to the above question, but for now I have no idea.

OK, sorry for my carelessness. I just realized that the agent you implement predicts a single action for only the next waypoint, which answers my question above.
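In case it helps others, my current understanding of the evaluation loop is roughly the following. This is a hedged sketch with hypothetical names (`DummyAgent`, `DummyEnv`, `rollout` are placeholders, not the repo's actual API): the agent emits one action per call, i.e. only the next waypoint, so no explicit pick/place pairing is needed and the episode can run for an arbitrary number of steps.

```python
class DummyEnv:
    """Stand-in environment; placeholder, not VLMbench's API."""
    def __init__(self, n_waypoints):
        self.n_waypoints = n_waypoints
        self.t = 0

    def reset(self):
        self.t = 0
        return {"rgb": None, "step": self.t}

    def step(self, action):
        self.t += 1
        done = self.t >= self.n_waypoints
        return {"rgb": None, "step": self.t}, done


class DummyAgent:
    """Stand-in agent: one action per call, i.e. one per next waypoint."""
    def act(self, obs, instruction):
        return {"waypoint_index": obs["step"]}


def rollout(agent, env, instruction, max_waypoints=10):
    """Per-waypoint evaluation loop: the agent predicts only the NEXT
    waypoint action at each step, so task length is not fixed at two."""
    obs = env.reset()
    actions = []
    for _ in range(max_waypoints):
        action = agent.act(obs, instruction)
        actions.append(action)
        obs, done = env.step(action)
        if done:
            break
    return actions
```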

That said, there are some misleading elements. For example: the query/key stream in the repo vs. the key/value module in the paper; and the same attention/transport pipeline as CLIPort but a different learning target (the semantics of the final output), in particular attention with different semantics from CLIPort's while still using the same attention loss as CLIPort.
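For reference, the attention-loss form shared with CLIPort is a softmax cross-entropy over all pixels of the predicted heatmap, with a one-hot target at the ground-truth pixel. Below is a minimal NumPy sketch of that generic loss form (my own illustration, not this repo's implementation):

```python
import numpy as np

def attention_loss(logits, target_uv):
    """CLIPort-style attention loss (generic sketch, not the repo's code):
    softmax cross-entropy over the flattened (H, W) heatmap `logits`,
    with a one-hot target at pixel `target_uv` = (row, col)."""
    flat = logits.reshape(-1)
    # Numerically stable log-softmax over all pixels.
    m = flat.max()
    log_probs = flat - m - np.log(np.exp(flat - m).sum())
    # Negative log-likelihood of the ground-truth pixel.
    idx = target_uv[0] * logits.shape[1] + target_uv[1]
    return -log_probs[idx]
```

Under this loss, all pixels compete in a single softmax, so what the heatmap *means* (pick affordance in CLIPort vs. a different target here) is determined entirely by the supervision, which is why reusing the loss with different output semantics can be confusing.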