j96w/MimicPlay

Question about goal image selection

tianhaowuhz opened this issue · 0 comments

When running inference with a video prompt, the code finds the index nearest to the current ee pose and uses it to determine the next goal. However, if I understand correctly, that nearest index is computed from "goal_ee_traj", which seems to be the robot ee position. So if I only have human goal images, with no robot ee trajectory to match against, how do I determine the next goal image?
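
To make my reading concrete, here is a minimal sketch of the lookup as I understand it. This is not the repository's code: the function name `nearest_goal_index` and the `lookahead` parameter are hypothetical, and I am assuming "goal_ee_traj" is a (T, 3) array of ee positions aligned frame by frame with the goal images.

```python
import numpy as np

def nearest_goal_index(current_ee_pos, goal_ee_traj, lookahead=0):
    """Return the index of the goal frame to use next.

    Hypothetical helper: picks the trajectory point closest to the
    current ee position (Euclidean distance), then steps `lookahead`
    frames ahead, clamped to the last frame.
    """
    goal_ee_traj = np.asarray(goal_ee_traj, dtype=float)      # shape (T, 3), assumed
    current_ee_pos = np.asarray(current_ee_pos, dtype=float)  # shape (3,)
    dists = np.linalg.norm(goal_ee_traj - current_ee_pos, axis=1)
    nearest = int(np.argmin(dists))
    return min(nearest + lookahead, len(goal_ee_traj) - 1)

# Toy trajectory along the x axis; the goal image would be goal_images[idx].
traj = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
idx = nearest_goal_index([1.1, 0.0, 0.0], traj, lookahead=1)
```

The lookup depends entirely on having ee positions for every goal frame, which is exactly what a human-only video does not provide.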