mihaidusmanu/d2-net

How to get the image pairs for localization of Aachen dataset?

Closed this issue · 9 comments

Hi @mihaidusmanu, thanks so much for releasing the code and implementation as well as the evaluation metrics, and for the support that makes this repository very complete. I find the performance and the evaluation metrics of D2-Net quite interesting, especially on localization with the Aachen Day-Night dataset (https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/).

D2-Net's performance is outstanding, and I have been interested in the algorithm for a while. Here, I would like to reproduce the results using the visual localization benchmark. I am trying to select a good algorithm for the task (I will cite your paper accordingly).

My questions are about how to get the retrieval_list.txt and database_pairs_list.txt.

  • Did you get retrieval_list.txt using NetVLAD? Specifically, did you extract the query-database image pairs following demoRetrieval.m? Did you retrieve the images using only the NetVLAD features?

  • How did you get database_pairs_list.txt? Did you obtain it by exhaustive matching between day images?

  • I also notice that there is an image_pairs_to_match.txt containing image pairs from the Aachen dataset in the visual localization benchmark's data preparation. Are the database-query pairs from this file used to create retrieval_list.txt? And can the pairs between database images be used for database_pairs_list.txt?

  • Lastly, if it is not too much to ask, it would be great if you could release the text files for retrieval_list.txt and database_pairs_list.txt.

Please find my answers below:

  1. For the query images, we have used NetVLAD to retrieve the top K database images. More precisely, I used https://github.com/Relja/netvlad/blob/master/serialAllFeats.m to compute 4096-dimensional features for both the query and database images and simply picked the K closest database descriptors for each query image (see the sketch after this list).

  2. No, I do not use exhaustive matching. The dataset already provides poses for the database images, so I picked the top 20 spatially nearby images for each database image - thus the number of database image pairs to match is around 20 x the number of database images (assuming no overlap), which is significantly lower than exhaustive matching (also sketched below).

  3. The files released at https://github.com/tsattler/visuallocalizationbenchmark/tree/master/local_feature_evaluation are only for the Local Features challenge (i.e. Aachen Night only). Do not use them for the other tracks, since these pairs were manually retrieved.

  4. Please find below the database-to-database pairs as well as the top 20 NetVLAD-retrieved query-to-database pairs.
    query_to_database_pairs_to_match_20.txt
    database_pairs_to_match.txt
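
For concreteness, below is a minimal sketch of how such pair lists could be generated, assuming NetVLAD descriptors have already been extracted (e.g. with serialAllFeats.m) and saved as NumPy arrays, and that database camera centers are available from the reference poses. The file names, array layouts, and the use of camera-center distance as the "spatially nearby" criterion are illustrative assumptions, not the exact scripts used above.

import numpy as np

def topk_retrieval_pairs(query_names, query_descs, db_names, db_descs, k=20):
    # Pick the k most similar database images (cosine similarity on
    # L2-normalized NetVLAD descriptors) for each query image.
    q = query_descs / np.linalg.norm(query_descs, axis=1, keepdims=True)
    d = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = q @ d.T  # (num_queries, num_db)
    pairs = []
    for i, name in enumerate(query_names):
        for j in np.argsort(-sims[i])[:k]:
            pairs.append((name, db_names[j]))
    return pairs

def spatially_nearby_pairs(db_names, camera_centers, k=20):
    # Pick the k spatially closest database images for each database image
    # (camera-center distance), deduplicating symmetric pairs.
    centers = np.asarray(camera_centers, dtype=np.float64)
    pairs = set()
    for i, name in enumerate(db_names):
        dists = np.linalg.norm(centers - centers[i], axis=1)
        dists[i] = np.inf  # exclude the image itself
        for j in np.argsort(dists)[:k]:
            pairs.add(tuple(sorted((name, db_names[j]))))
    return sorted(pairs)

if __name__ == '__main__':
    # Hypothetical inputs: adapt the paths and formats to your own pipeline.
    db_names = list(np.load('db_names.npy'))
    db_descs = np.load('db_netvlad.npy')            # (num_db, 4096)
    query_names = list(np.load('query_names.npy'))
    query_descs = np.load('query_netvlad.npy')      # (num_queries, 4096)
    db_centers = np.load('db_camera_centers.npy')   # (num_db, 3)

    with open('query_to_database_pairs_to_match_20.txt', 'w') as f:
        for a, b in topk_retrieval_pairs(query_names, query_descs, db_names, db_descs):
            f.write(f'{a} {b}\n')
    with open('database_pairs_to_match.txt', 'w') as f:
        for a, b in spatially_nearby_pairs(db_names, db_centers):
            f.write(f'{a} {b}\n')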

Thanks so much for the information. I have a few questions left:

Here are my answers:

  • You need to provide an input empty reconstruction (containing cameras.txt, images.txt - which does not need keypoints - and points3D.txt, which can be empty) to point_triangulator with the camera intrinsics and extrinsics. Please refer to the implementation at https://github.com/tsattler/visuallocalizationbenchmark/blob/master/local_feature_evaluation/reconstruction_pipeline.py for more details on how to create this reconstruction, notably the function generate_empty_reconstruction.
  • The database that comes with Aachen Day-Night at https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/ contains SIFT features. Unless you plan on using them, there's no need to download it. The database available on the visuallocalizationbenchmark repository is an empty database containing only images and cameras. Please check that the camera parameters in the database are the same as in the reference reconstruction - if this is not the case, let me know.
  • The database db.db is manually built using the intrinsics from the reference reconstruction. The intrinsics for the query images are given at https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/queries/. If you want to rebuild it, you will need to write a script that manually inserts cameras and images into a database using SQL statements (a sketch follows this list).
  • Step 2 does database to database image matching while step 4 does query to database matching. Thus the two steps complement each other.
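
To make the database-building point concrete, here is a minimal sketch of how query cameras and images could be inserted into a COLMAP database from a *_time_queries_with_intrinsics.txt list, assuming COLMAP's scripts/python/database.py (the COLMAPDatabase helper) is importable. The file names and the choice of one camera per query line are illustrative assumptions; for the full pipeline, follow reconstruction_pipeline.py from the benchmark repository.

import numpy as np
from database import COLMAPDatabase  # COLMAP's scripts/python/database.py

# COLMAP camera model ids (SIMPLE_RADIAL has 4 parameters: f, cx, cy, k).
CAMERA_MODEL_IDS = {'SIMPLE_PINHOLE': 0, 'PINHOLE': 1, 'SIMPLE_RADIAL': 2, 'RADIAL': 3}

def add_queries(database_path, intrinsics_list_path):
    # Insert one camera and one image per query listed in a
    # *_time_queries_with_intrinsics.txt file.
    db = COLMAPDatabase.connect(database_path)
    with open(intrinsics_list_path) as f:
        for line in f:
            if not line.strip():
                continue
            name, model, width, height, *params = line.split()
            camera_id = db.add_camera(
                CAMERA_MODEL_IDS[model], int(width), int(height),
                np.array(params, dtype=np.float64),
                prior_focal_length=True)
            db.add_image(name, camera_id)
    db.commit()
    db.close()

if __name__ == '__main__':
    add_queries('db.db', 'night_time_queries_with_intrinsics.txt')
    # The empty reconstruction is then triangulated with fixed poses, e.g.:
    #   colmap point_triangulator --database_path db.db \
    #       --image_path images_upright \
    #       --input_path sparse_empty --output_path sparse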

Thank you so much for the answers. It is much clearer now.

However, I still have a problem related to the database file (db.db) for the localization evaluation, as stated in the prerequisites ("A COLMAP database (e.g., db.db) containing the database and query images as well as their intrinsics."). The problem is that database.db does not contain the intrinsic parameters for the images in query/night/milestone.

  • First, I can confirm that the given database.db seems to contain the same intrinsic parameters as the text files in the /queries folder, as well as the database intrinsics.

However, the problem is that neither night_time_queries_with_intrinsics.txt nor database.db contains the intrinsic parameters for the images in query/night/milestone. Does this mean that the query images in query/night/milestone are not used in the evaluation of your paper? Or could you please advise on how to get the intrinsic parameters for these images?

  • Related to the previous question, while trying to extract those intrinsic parameters, I checked the aachen.db from the Aachen Day-Night dataset against the database.db provided by visuallocalizationbenchmark. It seems that aachen.db shares the same camera id (and thus the same intrinsic parameters) among query images from the same subdirectory/camera setting. For example, the query images from image_upright/queries/night/nexus5x use camera id 4480, and image_upright/queries/night/milestone uses camera id 5286, which is the same as the query images from image_upright/queries/day/milestone. However, from https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/queries/day_time_queries_with_intrinsics.txt, it seems that the camera_id is not shared between images: there is one camera_id per query image. So it would be great if you could help explain how to extract those intrinsic parameters.

Only cameras that have intrinsics at https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/queries/ are used for evaluation.

As for the second question, queries taken with the same camera and orientation (i.e. landscape / portrait) have the same intrinsics. You can check this in the lists. For instance:

query/day/nexus4/IMG_20130210_165452.jpg SIMPLE_RADIAL 1600 1200 1469.2 800 600 -0.0353019
query/day/nexus4/IMG_20140521_134213.jpg SIMPLE_RADIAL 1600 1200 1469.2 800 600 -0.0353019
query/day/nexus4/IMG_20130210_164513.jpg SIMPLE_RADIAL 1200 1600 1458.14 600 800 -0.0302454
query/day/nexus4/IMG_20130210_164534.jpg SIMPLE_RADIAL 1200 1600 1458.14 600 800 -0.0302454

Images at 1600x1200 share one set of intrinsics, while images at 1200x1600 share another.
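
If useful, this grouping can be verified with a short script that buckets a queries-with-intrinsics list by camera model, resolution, and parameters (the file name below is an assumption):

from collections import defaultdict

# Group query images by their full intrinsics string; images taken with the
# same camera and orientation should fall into the same group.
groups = defaultdict(list)
with open('day_time_queries_with_intrinsics.txt') as f:
    for line in f:
        if line.strip():
            name, *intrinsics = line.split()
            groups[' '.join(intrinsics)].append(name)

for intrinsics, names in groups.items():
    print(f'{len(names):4d} images share: {intrinsics}')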

Hi @mihaidusmanu, I see. Thank you very much. It is all clear now.

Actually, I have one last question. From my understanding, the reference poses for the query images are withheld, and I respect this decision (ref. https://www.visuallocalization.net).

However, I wonder if there is a way to quickly test whether the developed local features work well for localization on this dataset (Aachen Day-Night), or whether there is an equivalent or standard dataset that gives developers access to the reference poses?

If you simply want to test your predictions, you can submit directly to the evaluation service available at https://www.visuallocalization.net/ and keep the submission hidden. Otherwise, if you need to have access to poses, you can use a validation split (i.e. randomly sample some database images as "query" and evaluate on these while developing your code).
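
A minimal sketch of such a validation split, assuming the database image names are available one per line (the file names and the 10% split size are illustrative assumptions):

import random

random.seed(0)

# Hypothetical input: one database image name per line.
with open('database_image_list.txt') as f:
    db_images = [line.strip() for line in f if line.strip()]

# Hold out ~10% of the database images as pseudo-queries; their reference
# poses are available from the dataset, so localization accuracy can be
# computed locally while developing the local features.
random.shuffle(db_images)
num_val = max(1, len(db_images) // 10)
val_queries = db_images[:num_val]
remaining_db = db_images[num_val:]

with open('validation_queries.txt', 'w') as f:
    f.write('\n'.join(val_queries) + '\n')
with open('validation_database.txt', 'w') as f:
    f.write('\n'.join(remaining_db) + '\n')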

I see. Thanks so much for your answers, insights, and great support. I will keep following your work and future contributions.

PS. The idea of sampling database images as queries is quite interesting. I will try it :).


Hi, for point 2, how do you get the 20 "spatially" nearby images for each database image? What information do you use?