These scripts download a number of visual geolocalization datasets and convert them to a standard format, suitable for use with our soon-to-be open-source benchmarking software. Maps of the datasets are created automatically. You can find more information about our project on the dedicated website.
The image files are named according to the following convention:
@ UTM_easting @ UTM_northing @ UTM_zone_number @ UTM_zone_letter @ latitude @ longitude @ pano_id @ tile_num @ heading @ pitch @ roll @ height @ timestamp @ note @ extension
Note that for many datasets some of these values are empty; the only required values are the UTM coordinates (obtained from latitude and longitude).
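As a rough illustration of how the UTM zone fields relate to latitude and longitude, the sketch below derives only UTM_zone_number and UTM_zone_letter using the standard 6-degree zones and 8-degree latitude bands. It is not taken from the scripts in this repository: the function name `utm_zone` is hypothetical, the special zones around Norway and Svalbard are ignored, and computing easting and northing requires a full projection, which is normally left to a dedicated library.

```python
# Latitude bands from 80S to 84N, 8 degrees each (the last one, X, spans 12).
# The letters I and O are skipped by the UTM convention.
UTM_BANDS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone(latitude, longitude):
    """Return (zone_number, zone_letter) for a point given in degrees.

    Hypothetical helper for illustration only; it ignores the
    Norway/Svalbard zone exceptions.
    """
    if not -80.0 <= latitude <= 84.0:
        raise ValueError("UTM is only defined between 80S and 84N")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zones are 6 degrees wide, numbered 1..60 starting at 180W.
    zone_number = min(int((longitude + 180.0) // 6) + 1, 60)
    # Bands are 8 degrees tall, starting at 80S; clamp 84N into band X.
    band_index = min(int((latitude + 80.0) // 8), len(UTM_BANDS) - 1)
    return zone_number, UTM_BANDS[band_index]


if __name__ == "__main__":
    # Approximate coordinates of Pittsburgh.
    print(utm_zone(40.44, -79.99))  # (17, 'T')
```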
The character "@" is used as a separator because commonly used characters such as the dash "-" or the underscore "_" might appear inside the fields, for example in the pano_id field.
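The naming convention can be read back with a plain split on "@". The following is a minimal sketch, not code from this repository: the helper names `parse_image_name` and `build_image_name` are hypothetical, and it assumes that real file names contain no spaces around the separators, that the name starts with "@", and that the extension field includes the dot (for example ".jpg").

```python
# Field order as given by the naming convention above.
FIELDS = [
    "utm_easting", "utm_northing", "utm_zone_number", "utm_zone_letter",
    "latitude", "longitude", "pano_id", "tile_num", "heading", "pitch",
    "roll", "height", "timestamp", "note", "extension",
]


def parse_image_name(name):
    """Split a formatted image name into a dict of string fields.

    Empty fields are returned as empty strings. Hypothetical helper.
    """
    parts = name.split("@")
    # The leading "@" yields one empty element before the first field.
    if len(parts) != len(FIELDS) + 1 or parts[0].strip():
        raise ValueError(f"unexpected image name: {name!r}")
    return dict(zip(FIELDS, (p.strip() for p in parts[1:])))


def build_image_name(**fields):
    """Inverse of parse_image_name; missing fields are left empty."""
    return "@" + "@".join(str(fields.get(key, "")) for key in FIELDS)


if __name__ == "__main__":
    # Made-up values; the pano_id contains both "-" and "_" on purpose.
    name = build_image_name(
        utm_easting="584123.45", utm_northing="4477123.45",
        utm_zone_number=17, utm_zone_letter="T",
        pano_id="some_pano-id", extension=".jpg",
    )
    info = parse_image_name(name)
    print(info["pano_id"], info["utm_easting"], info["utm_northing"])
```

Because "-" and "_" never act as separators, a pano_id such as the one above survives the round trip unchanged.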
The directory tree that is generated is as follows:
.
└── datasets
    └── dataset_name
        └── images
            ├── train
            │   ├── database
            │   └── queries
            ├── val
            │   ├── database
            │   └── queries
            └── test
                ├── database
                └── queries
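After formatting, the layout above can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names (`expected_image_dirs`, `missing_image_dirs`), not part of the download scripts. It assumes every dataset contains all three splits, which may not hold for datasets that are used as test sets only.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
SUBSETS = ("database", "queries")


def expected_image_dirs(datasets_root, dataset_name):
    """List the six image folders implied by the directory tree above."""
    images = Path(datasets_root) / dataset_name / "images"
    return [images / split / subset for split in SPLITS for subset in SUBSETS]


def missing_image_dirs(datasets_root, dataset_name):
    """Return the expected folders that do not exist on disk."""
    return [d for d in expected_image_dirs(datasets_root, dataset_name)
            if not d.is_dir()]


if __name__ == "__main__":
    # "datasets" and "pitts30k" follow the paths used in this README.
    for folder in missing_image_dirs("datasets", "pitts30k"):
        print("missing:", folder)
```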
Throughout our benchmark we used Pitts and MSLS for training, and the other datasets listed below only as test sets, to evaluate the generalization capability of the models. There are several reasons for this, such as the absence of a time machine, i.e. images of the same places taken at different times, which is necessary to train robust models.
The list of datasets that you can download with this code is the following:
- Pitts30k*
- Pitts250k*
- Mapillary SLS**
- Eynsham - as test set only
- San Francisco - as test set only
- Tokyo 24/7* - as test set only
- St Lucia - as test set only
- SVOX - as test set only
To download each dataset, simply run the corresponding Python script, which will download, unpack and format the files according to the structure above.
*: for Pitts30k, Pitts250k and Tokyo 24/7, the images must be obtained by asking permission from the respective authors. They can then be formatted with this codebase.
**: for Mapillary SLS, you first need to log in to their website and download the dataset, then extract the zip files and run
$ python format_mapillary.py
Pitts30k
For Pitts30k, first download the data under datasets/pitts30k/raw_data, then simply run $ python format_pitts30k.py
Pitts250k
For Pitts250k, first download the data under datasets/pitts250k/raw_data, then simply run $ python format_pitts250k.py
Mapillary SLS
For Mapillary SLS, you first need to log in to their website and download the dataset, then extract the zip files and place the result in a folder datasets inside the repository root, naming it mapillary_sls.
Then you can run:
$ python format_mapillary.py
Eynsham
To download Eynsham, simply run $ python download_eynsham.py
San Francisco
To download San Francisco, simply run $ python download_san_francisco.py
St Lucia
To download St Lucia, simply run $ python download_st_lucia.py
SVOX
To download SVOX, simply run $ python download_svox.py
Tokyo 24/7
For Tokyo 24/7, first download the data under datasets/tokyo247/raw_data, then simply run $ python format_tokyo247.py. Queries are automatically downloaded.