State-of-the-art methods for online mapping are based on supervised learning and are trained predominantly on two datasets: nuScenes and Argoverse 2. These datasets revisit the same geographic locations across the training, validation, and test sets, which inflates the reported performance numbers.
Specifically, over
In our paper, Localization is All You Evaluate, we propose splitting the nuScenes and Argoverse 2 datasets by the samples' positions, yielding Geographically Disjoint splits. This repository contains the proposed Near Extrapolation and Far Extrapolation splits and the code used to generate them.
We also release examples of how to convert the Original split pickle files of SOTA online mapping methods to Geographically Disjoint split pickle files.
You can use the proposed splits to train and evaluate the performance of online mapping methods directly.
The Geographically Disjoint splits are defined in txt files (pkl files are also provided for convenience) under /near_extrapolation_splits and /far_extrapolation_splits, respectively.
For the nuScenes Near Extrapolation splits there are two versions:
1 - near_extrapolation_splits/nuscenes/samples: all samples are used, and sequences that straddle a set boundary are split into two parts that are assigned to the respective sets (see the paper for details). The split files contain the set assignment of every individual sample.
2 - near_extrapolation_splits/nuscenes/scenes: sequences that straddle a set boundary are removed. The split files contain the scene names assigned to each set.
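The difference between the two versions can be illustrated with a small sketch. It assumes every sample already carries a set label derived from its position. The function names and the toy data are hypothetical and are not taken from the repository code.

```python
from collections import defaultdict


def split_by_samples(sample_sets):
    """'samples' policy: every sample keeps its own set label, so a scene
    that straddles a boundary contributes samples to more than one set."""
    out = defaultdict(list)
    for token, set_name in sample_sets.items():
        out[set_name].append(token)
    return dict(out)


def split_by_scenes(sample_sets, sample_scene):
    """'scenes' policy: keep a scene only if all of its samples share one set."""
    scene_labels = defaultdict(set)
    for token, set_name in sample_sets.items():
        scene_labels[sample_scene[token]].add(set_name)
    out = defaultdict(list)
    for scene, labels in scene_labels.items():
        if len(labels) == 1:  # scenes with mixed labels straddle a boundary
            out[next(iter(labels))].append(scene)
    return dict(out)


# Toy data (hypothetical): scene-B straddles the train/val boundary.
sample_sets = {"s1": "train", "s2": "train", "s3": "val", "s4": "val"}
sample_scene = {"s1": "scene-A", "s2": "scene-B", "s3": "scene-B", "s4": "scene-C"}

per_sample = split_by_samples(sample_sets)
per_scene = split_by_scenes(sample_sets, sample_scene)
```

In this toy example the samples version keeps all four samples, while the scenes version drops scene-B entirely.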
For the Far Extrapolation splits, the name of the file indicates the city and the set. For example, singapore.txt contains the scenes from Singapore, and PIT+MIA.txt contains the log ids for Pittsburgh and Miami.
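A minimal sketch for loading such a file, assuming it stores one scene name or log id per line. That layout is an assumption, so check the actual files first. `read_split_ids` is a hypothetical helper, not part of this repository.

```python
def read_split_ids(path):
    """Return the non-empty, whitespace-stripped lines of a split txt file.

    Assumes one identifier (scene name or log id) per line.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

For example, `read_split_ids("/path/to/singapore.txt")` would return the list of scene names under that assumption.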
If you want to verify the Geographically Disjoint splits, you can install the required packages and run the accompanying code as follows:
conda create --name geosplits python=3.8
conda activate geosplits
pip install -r requirements.txt
Download the datasets according to the instructions in the respective repositories:
- nuScenes (https://www.nuscenes.org/download)
- Argoverse 2 (https://www.argoverse.org/av2.html#download-link)
Create nuScenes splits:
python src/nuscenes/generate_geo_split.py --data_dir /path/to/nuscenes
Create Argoverse 2 splits:
python src/argoverse2/generate_geo_split.py --data_dir /path/to/argoverse2
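As an independent sanity check of a generated split, one could measure how far each validation or test sample lies from the nearest training sample. The sketch below is not part of the repository scripts. It assumes 2D positions in metres in a shared map frame for a single city, and the function names and toy coordinates are hypothetical.

```python
import math


def min_distance_to_set(query_xy, reference_xy):
    """Distance from each query point to its nearest reference point.

    Brute force, O(len(query) * len(reference)); fine for a quick check,
    a spatial index would be preferable for full datasets.
    """
    return [
        min(math.hypot(qx - rx, qy - ry) for rx, ry in reference_xy)
        for qx, qy in query_xy
    ]


def fraction_within(query_xy, reference_xy, radius_m):
    """Fraction of query points closer than radius_m to any reference point."""
    distances = min_distance_to_set(query_xy, reference_xy)
    return sum(d < radius_m for d in distances) / len(distances)


# Toy coordinates (hypothetical), in metres.
train_xy = [(0.0, 0.0), (10.0, 0.0)]
val_xy = [(3.0, 4.0), (100.0, 0.0)]

distances = min_distance_to_set(val_xy, train_xy)
overlap = fraction_within(val_xy, train_xy, radius_m=6.0)
```

A split with little geographic overlap should give a small fraction for any reasonable radius.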
Generate the necessary dataset pkls following the instructions in the respective repositories:
- MapTR: https://github.com/hustvl/MapTR/tree/main
- MapTRv2: https://github.com/hustvl/MapTR/tree/maptrv2
- VectorMapNet & HDMapNet: https://github.com/Tsinghua-MARS-Lab/Online_Map_Construction_Benchmark
- More to be added...
Convert the dataset pkl files you generated in the previous step to geographically disjoint split pkls:
python src/nuscenes/convert_pkls.py --method my-selected-method --pkl_dir /path/to/pkls/folder/of/my/selected/method --output_dir /path/to/output
python src/argoverse2/convert_pkls.py --method my-selected-method --pkl_dir /path/to/pkls/folder/of/my/selected/method --output_dir /path/to/output
The '--og_pkl_name' argument specifies the base name of the original pkl files. For example, the default for nuScenes is 'nuscenes_map_infos_temporal'; the suffixes '_train', '_val', and '_test' are appended to the base name to locate the original pkl files.
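How the base name resolves to file names, and how samples might be regrouped into new sets, can be sketched as follows. This is not the repository's convert_pkls.py. The '.pkl' extension, the per-sample 'token' key (as found in MapTR-style info files), and both helper names are assumptions made for illustration.

```python
from pathlib import Path

SUFFIXES = ("_train", "_val", "_test")


def original_pkl_paths(pkl_dir, og_pkl_name="nuscenes_map_infos_temporal"):
    """Map each set name to base name + suffix + an assumed '.pkl' extension."""
    return {s[1:]: Path(pkl_dir) / f"{og_pkl_name}{s}.pkl" for s in SUFFIXES}


def regroup_infos(infos_by_set, token_to_set):
    """Pool the infos of all original sets and reassign each one by sample token.

    infos_by_set: {"train": [info, ...], ...}, each info a dict with a "token".
    token_to_set: {sample_token: new_set_name}; samples without an entry
    are dropped (assumption).
    """
    regrouped = {"train": [], "val": [], "test": []}
    for infos in infos_by_set.values():
        for info in infos:
            new_set = token_to_set.get(info["token"])
            if new_set is not None:
                regrouped.setdefault(new_set, []).append(info)
    return regrouped


# Toy example (hypothetical tokens).
paths = original_pkl_paths("/path/to/pkls")
original = {"train": [{"token": "a"}, {"token": "b"}], "val": [{"token": "c"}]}
new_sets = regroup_infos(original, {"a": "val", "b": "train", "c": "train"})
```

Pooling before reassignment matters because a sample from the original validation set may end up in the new training set, and vice versa.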
Follow the instructions in the respective repositories for training and evaluation. Simply replace the path to the original pkl files with the geographical split pkls you created above.