google-research/pathdreamer

Questions about positions input and equirectangular image

MarSaKi opened this issue · 4 comments

Thanks for your great work! I believe this work could promote the development of model-based methods for VLN!
May I ask some questions?
1. Predicting a future panorama requires the agent's positions and orientations, but I noticed that in your demo only the positions are input to the model. How does this work? Does the agent implicitly calculate the orientations?
2. How do you get the equirectangular image in the Matterport3D Simulator (the simulator is based on a skybox image for each viewpoint)? Do you have any scripts?

Thanks for your interest in our work! To answer your questions:

  1. We assume a default orientation (see https://github.com/google-research/pathdreamer/blob/master/models/pathdreamer_models.py#L207-L211); a sketch of this is shown after this list.
  2. The official Matterport3D dataset (not the simulator) contains 10,800 high-resolution panoramas, which we use for training. Please visit their website for download instructions.
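
For concreteness, here is a minimal sketch of what assuming a default orientation could look like. The names `positions_to_poses`, `DEFAULT_HEADING`, and `DEFAULT_PITCH` are hypothetical, chosen for illustration; the actual convention lives in the linked lines of `pathdreamer_models.py`:

```python
import numpy as np

# Hypothetical sketch: if only (x, y, z) positions are supplied, attach a
# fixed default orientation to every pose. The names and angle convention
# here are illustrative; see models/pathdreamer_models.py#L207-L211 for
# the defaults actually used by Pathdreamer.
DEFAULT_HEADING = 0.0  # assumed default yaw, in radians
DEFAULT_PITCH = 0.0    # assumed default pitch, in radians

def positions_to_poses(positions):
    """Expand (N, 3) positions into (N, 5) poses with default angles."""
    positions = np.asarray(positions, dtype=np.float32)
    n = positions.shape[0]
    headings = np.full((n, 1), DEFAULT_HEADING, dtype=np.float32)
    pitches = np.full((n, 1), DEFAULT_PITCH, dtype=np.float32)
    return np.concatenate([positions, headings, pitches], axis=-1)
```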

Thanks for your reply; it's very useful. Great work!

I still have a question:

In the VLN validation experiment of your paper (Sec. 4.2), how did you encode the paths from the Pathdreamer model? As far as I know, the visual path encoder of "On the Evaluation of Vision-and-Language Navigation Instructions" is based on navigable views selected from 36 discrete views (a discrete panorama), which is slightly different from the output of Pathdreamer (a continuous panorama).

Did you use different visual path encoders for the 'Pathdreamer' and 'Ground Truth' settings, or did you manually discretize the output panorama of the Pathdreamer model? Could you please give more details about the VLN validation experiments? Thanks!

Cheers!

Hi,

Sorry for the late response.

  1. We sample 36 discrete perspective views from our generated equirectangular image, similar to how R2R does (see the sketch after this list).
  2. We use the same path encoder for both the ground-truth and Pathdreamer settings. Our VLN experiments build on models from VALAN (though it is likely you could achieve a similar uplift in success rate by using planning with any VLN agent). Unfortunately, our code for this part is a bit messy and will likely not be released.
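
Regarding item 1, here is a minimal, self-contained NumPy sketch of extracting R2R-style perspective views from an equirectangular panorama. This is not our released code: the function names are hypothetical, and the 60° field of view and -30°/0°/+30° elevations follow the usual R2R camera setup but are assumptions here:

```python
import numpy as np

def equirect_to_perspective(pano, heading, elevation,
                            fov=np.deg2rad(60.0), out_hw=(224, 224)):
    """Extract one perspective view (gnomonic projection) from an
    equirectangular panorama via nearest-neighbour lookup.

    pano: (H, W, 3) equirectangular image.
    heading: yaw in radians (positive rotates the view rightwards).
    elevation: pitch in radians (positive looks up).
    """
    H, W = pano.shape[:2]
    h, w = out_hw
    # Image-plane pixel grid at focal length f (camera: x right, y down, z forward).
    f = 0.5 * w / np.tan(0.5 * fov)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    x = xs - 0.5 * (w - 1)
    y = ys - 0.5 * (h - 1)
    dirs = np.stack([x, y, np.full_like(x, f, dtype=np.float64)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate rays by elevation (about x) then heading (about y).
    ce, se = np.cos(elevation), np.sin(elevation)
    rot_x = np.array([[1, 0, 0], [0, ce, -se], [0, se, ce]])
    ch, sh = np.cos(heading), np.sin(heading)
    rot_y = np.array([[ch, 0, sh], [0, 1, 0], [-sh, 0, ch]])
    dirs = dirs @ rot_x.T @ rot_y.T
    # Spherical coordinates -> equirectangular pixel indices.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(-dirs[..., 1], -1.0, 1.0))  # [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return pano[v, u]

def sample_36_views(pano):
    """R2R-style discretisation: 12 headings x 3 elevations = 36 views."""
    headings = np.arange(12) * (2.0 * np.pi / 12.0)
    elevations = np.deg2rad([-30.0, 0.0, 30.0])
    return [equirect_to_perspective(pano, h, e)
            for e in elevations for h in headings]
```

This uses nearest-neighbour lookup for brevity; bilinear sampling and the exact R2R intrinsics would be needed to match the simulator's views precisely.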

Hope that helps!

Thanks for your reply! It's very useful!

Best Regards!