We introduce a new dataset for Cross-Lingual Vision-Language Navigation.
The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions.
XL-R2R preserves the same splits as R2R and thus consists of train, val-seen, and val-unseen splits with both English and Chinese instructions, and a test split with English instructions only.
Data is formatted as follows:
{
"distance": float,
"scan": str,
"path_id": int,
"path": [str x num_steps],
"heading": float,
"instructions": [str x 3],
}
distance: Length of the path in meters.
scan: Matterport scan id.
path_id: Unique id for this path.
path: List of viewpoint ids (the first is the start location, the last is the goal location).
heading: Agent's initial heading in radians (elevation is always assumed to be zero).
instructions: Three unique natural language strings describing how to find the goal given the start pose.
For the test set, only the first viewpoint id in path (the starting location) is included; a test server is hosted by Anderson et al. for scoring uploaded trajectories.
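
The field list above can be checked mechanically when loading the data. The following is a minimal sketch, not part of the dataset's tooling: `validate_entry` and `EXPECTED_TYPES` are hypothetical names, and the sample record uses placeholder values rather than real scan or viewpoint ids. It assumes each entry is a JSON object with the six fields described above.

```python
import json
from typing import Any, Dict, List

# Field names and types taken from the format description above.
EXPECTED_TYPES = {
    "distance": (int, float),
    "scan": str,
    "path_id": int,
    "path": list,
    "heading": (int, float),
    "instructions": list,
}


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in entry:
            problems.append(f"missing field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(entry[key], bool) or not isinstance(entry[key], expected):
            problems.append(f"wrong type for {key}: {type(entry[key]).__name__}")
    if problems:
        return problems
    # A path holds at least the start viewpoint; all ids are strings.
    if not entry["path"] or not all(isinstance(v, str) for v in entry["path"]):
        problems.append("path must be a non-empty list of viewpoint id strings")
    # Each entry carries exactly three instruction strings.
    instructions = entry["instructions"]
    if len(instructions) != 3 or not all(isinstance(s, str) for s in instructions):
        problems.append("instructions must be a list of three strings")
    return problems


# Placeholder record for illustration only; the ids are made up.
sample = json.loads("""
{
  "distance": 10.5,
  "scan": "SCAN_ID_PLACEHOLDER",
  "path_id": 1,
  "path": ["viewpoint_a", "viewpoint_b", "viewpoint_c"],
  "heading": 3.14,
  "instructions": ["Walk forward.", "Turn left at the door.", "Stop by the table."]
}
""")

print(validate_entry(sample))
```

Requiring only a non-empty `path` means an entry that lists just the starting viewpoint, as described for the test set, still passes this check.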