XL-VLN

Dataset for Bilingual VLN

Cross-lingual Vision-Language Navigation

We introduce a new dataset for Cross-Lingual Vision-Language Navigation.

Cross-lingual Room-to-Room (XL-R2R) Dataset

The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions. XL-R2R preserves the same splits as R2R and thus consists of train, val-seen, and val-unseen splits with both English and Chinese instructions, and a test split with English instructions only.
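
As a minimal loading sketch: the split file names below (e.g. XL-R2R_train.json) are an assumption following the R2R naming convention (R2R_train.json), so check the actual file names shipped with the dataset.

import json

# Split names from the description above; file names are assumed to follow
# the R2R convention (R2R_train.json -> XL-R2R_train.json) and may differ.
SPLITS = ["train", "val_seen", "val_unseen", "test"]

def load_split(split, prefix="XL-R2R"):
    """Load one split as a list of path entries (see the schema below)."""
    with open(f"{prefix}_{split}.json") as f:
        return json.load(f)

for split in SPLITS:
    print(split, len(load_split(split)), "paths")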

Data is formatted as follows:

{
  "distance": float,
  "scan": str,
  "path_id": int,
  "path": [str x num_steps],
  "heading": float,
  "instructions": [str x 3],
}
  • distance: length of the path in meters.
  • scan: Matterport scan id.
  • path_id: Unique id for this path.
  • path: List of viewpoint ids (the first is the start location, the last is the goal location).
  • heading: Agent's initial heading in radians (elevation is always assumed to be zero).
  • instructions: Three unique natural language strings describing how to find the goal given the start pose.
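
As a usage sketch of the fields above (again assuming the file name from the loading sketch), the start and goal viewpoints and the three instructions of an entry can be read like this:

import json

# Load the training split (file name assumed, as in the loading sketch above).
with open("XL-R2R_train.json") as f:
    entries = json.load(f)

entry = entries[0]
start, goal = entry["path"][0], entry["path"][-1]
print(f"Path {entry['path_id']} in scan {entry['scan']}: "
      f"{start} -> {goal}, length {entry['distance']:.1f} m, "
      f"initial heading {entry['heading']:.2f} rad")

# Each path comes with three natural language instructions.
for i, instruction in enumerate(entry["instructions"], start=1):
    print(f"Instruction {i}: {instruction}")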

For the test set, path contains only the first viewpoint (the starting location); a test server hosted by Anderson et al. scores uploaded trajectories.