/geographical-splits

Geographical splits: splitting the nuScenes and Argoverse 2 datasets by the samples' positions.


Localization Is All You Evaluate

Data Leakage in Online Mapping Datasets and How to Fix It

CVPR 2024

State-of-the-art methods for online mapping are based on supervised learning and are trained predominantly on two datasets: nuScenes and Argoverse 2. These datasets revisit the same geographic locations across the training, validation, and test sets, which inflates the reported performance numbers.

Specifically, over $80$% of nuScenes and $40$% of Argoverse 2 validation and test samples are located less than $5$ m from a training sample. The figure below displays an example of this, where three samples from the nuScenes dataset are highlighted. Despite being from different sets, the samples are situated in the same geographic location.
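The overlap statistic above can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a 2D position in metres in a shared map frame; the function name `fraction_near_training` and the brute-force search are illustrative choices, not the code used in this repository.

```python
import math


def fraction_near_training(train_xy, eval_xy, radius_m=5.0):
    """Fraction of evaluation positions closer than radius_m to any training position.

    train_xy, eval_xy: sequences of (x, y) positions in metres (assumed input format).
    """
    if not eval_xy:
        return 0.0
    near = sum(
        1
        for p in eval_xy
        # "less than 5 m" is read as a strict inequality here.
        if any(math.dist(p, q) < radius_m for q in train_xy)
    )
    return near / len(eval_xy)


train = [(0.0, 0.0), (100.0, 0.0)]
val = [(1.0, 1.0), (50.0, 0.0), (103.0, 0.0), (0.0, 5.0)]
print(fraction_near_training(train, val))
```

The brute-force loop is quadratic in the number of samples; for full datasets a spatial index (for example a k-d tree) would be the more practical choice.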

In our paper, Localization Is All You Evaluate, we propose splitting the nuScenes and Argoverse 2 datasets by the samples' positions, yielding Geographically Disjoint splits. This repository contains the proposed Near Extrapolation and Far Extrapolation splits and the code used to generate them.

We also release examples of how to convert the Original split pickle files of SOTA online mapping methods to Geographically Disjoint split pickle files.

nuScenes Near Extrapolation Splits

Argoverse 2 Near Extrapolation Splits

Usage

You can use the proposed splits to train and evaluate the performance of online mapping methods directly.

The Geographically Disjoint splits are defined in txt files (pkl files are also provided for convenience) under /near_extrapolation_splits and /far_extrapolation_splits, respectively.

For the nuScenes Near Extrapolation splits there are two versions:

1 - near_extrapolation_splits/nuscenes/samples: all samples are used, and sequences that straddle a set boundary are split into two parts that are assigned to their respective sets (see the paper for details). The split files list the set assignment of every individual sample.

2 - near_extrapolation_splits/nuscenes/scenes: sequences that straddle a set boundary are removed. The split files list the scene names assigned to each set.
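The difference between the two versions can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical list of `(scene_name, sample_token, set_name)` tuples in which each sample has already been assigned to a set by its position, and `build_variants` is not a function from this repository. The actual procedure is described in the paper.

```python
from collections import defaultdict


def build_variants(sample_assignments):
    """Derive sample-level and scene-level splits from per-sample set assignments.

    sample_assignments: iterable of (scene_name, sample_token, set_name) tuples
    (hypothetical input format).
    """
    samples_variant = defaultdict(list)  # set_name -> sample tokens
    sets_per_scene = defaultdict(set)    # scene_name -> sets its samples fall into

    for scene, token, set_name in sample_assignments:
        # Version 1: every sample is kept in the set it was assigned to, so a
        # scene that straddles a boundary is effectively cut in two.
        samples_variant[set_name].append(token)
        sets_per_scene[scene].add(set_name)

    # Version 2: keep only scenes whose samples all fall into a single set.
    scenes_variant = defaultdict(list)   # set_name -> scene names
    for scene, sets in sets_per_scene.items():
        if len(sets) == 1:
            scenes_variant[next(iter(sets))].append(scene)

    return dict(samples_variant), dict(scenes_variant)


assignments = [
    ("scene-0001", "a", "train"),
    ("scene-0001", "b", "train"),
    ("scene-0002", "c", "train"),  # scene-0002 straddles the train/val boundary
    ("scene-0002", "d", "val"),
    ("scene-0003", "e", "val"),
]
by_sample, by_scene = build_variants(assignments)
print(by_sample)
print(by_scene)
```

In this toy input, the sample-level version keeps all five samples, while the scene-level version drops scene-0002 entirely.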

For Far Extrapolation splits the name of the file indicates the city and the set. E.g. singapore.txt contains the scenes from Singapore and PIT+MIA.txt contains the log ids for Pittsburgh and Miami.

Create/Verify Geographical Splits and results in paper

If you want to verify the Geographically Disjoint splits, you can install the required packages and run the accompanying code as follows:

Install

conda create --name geosplits python=3.8
conda activate geosplits
pip install -r requirements.txt

Download data

Download according to the instructions in the respective repositories:

Generate Geographical Splits

Create nuScenes splits:

python src/nuscenes/generate_geo_split.py --data_dir /path/to/nuscenes 

Create Argoverse 2 splits:

python src/argoverse2/generate_geo_split.py --data_dir /path/to/argoverse2

Generate original pkl files using the method of your choice

Generate the necessary dataset pkls following the instructions in the respective repositories:

Convert pickle files from a method to geographically disjoint split pkls

Convert the dataset pkl files you generated in the previous step to geographically disjoint split pkls:

python src/nuscenes/convert_pkls.py --method my-selected-method --pkl_dir /path/to/pkls/folder/of/my/selected/method --output_dir /path/to/output 
python src/argoverse2/convert_pkls.py --method my-selected-method --pkl_dir /path/to/pkls/folder/of/my/selected/method --output_dir /path/to/output 

The '--og_pkl_name' argument can be used to specify the base name of the original pkl files. For example, the default for nuScenes is 'nuscenes_map_infos_temporal'; the suffixes '_train', '_val', and '_test' are appended to this base name to locate the original pkl files.
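As a rough illustration of this naming convention, the sketch below builds the file paths that would be looked up for a given base name. The helper `original_pkl_paths` is hypothetical, and the `.pkl` file extension is an assumption; check the conversion scripts for the exact behaviour.

```python
from pathlib import Path


def original_pkl_paths(pkl_dir, og_pkl_name="nuscenes_map_infos_temporal",
                       suffixes=("_train", "_val", "_test"), ext=".pkl"):
    """Return the expected original pkl paths for a base name (ext is an assumption)."""
    return [Path(pkl_dir) / f"{og_pkl_name}{suffix}{ext}" for suffix in suffixes]


for path in original_pkl_paths("/path/to/pkls/folder/of/my/selected/method"):
    print(path.name)
```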

Train & Evaluate

Follow the instructions in the respective repositories for training and evaluation. Simply replace the path to the original pkl files with the geographical split pkls you created above.