ubc-vision/COTR

Matching time

Closed this issue · 19 comments

Hello, and thank you for your excellent work. I have a question: how should I understand that, with one point queried at a time, the method reaches 35 correspondences per second?
"Our currently non-optimized prototype implementation queries one point at a time, and achieves 35 correspondences per second on a NVIDIA RTX 3090 GPU. "
I have recently been running your code. On an NVIDIA RTX 3090 GPU, demo_single_pair.py took about 30 s to match. Is that normal?
Thanks!

Hello, demo_single_pair.py takes a while for the following reasons:

  1. At the very beginning, COTR needs to predict the scale between the two images. This step feeds all points (a meshgrid) through the network at 256x256 resolution, which is slow. The exact time depends on whether stretching or tile mode is used (tile mode is roughly 4x slower than stretching).
  2. This demo uses a cycle consistency check, so it computes roughly twice as many correspondences; the exact number depends on the "EXTRACTION_RATE" hyperparameter. (See the sketch right after this list.)
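To make point 2 concrete, here is a minimal sketch of what a cycle consistency check does, in plain NumPy and independent of the COTR code; the function name and the pixel threshold are illustrative assumptions, not COTR constants:

import numpy as np

def cycle_consistency_filter(corrs_ab, back_proj_a, thresh=5.0):
    """Keep only correspondences whose a -> b -> a round trip lands close to the start.

    corrs_ab:    (N, 4) array with columns (x_a, y_a, x_b, y_b) from the forward query.
    back_proj_a: (N, 2) array with the points obtained by querying (x_b, y_b) back into image a.
    thresh:      hypothetical pixel tolerance for accepting a correspondence.
    """
    err = np.linalg.norm(corrs_ab[:, :2] - back_proj_a, axis=1)
    return corrs_ab[err < thresh]

Querying each point in both directions is what roughly doubles the number of forward passes in the demo.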

Thanks!

Thank you for the explanation.

@jiangwei221 Hello, are there any ways to speed it up?
1. Switch from tile mode to stretching? How much does this affect the matching quality?
2. Drop the cycle consistency check, i.e. call engine.cotr_corr_multiscale directly in the demo_single_pair.py script?
3. Resize the images to a 1:1 aspect ratio? (Then to_square_patches produces only a single patch, and the matching time goes down.)
4. Make the zoom_ins argument of cotr_corr_multiscale_with_cycle_consistency/cotr_corr_multiscale sparser, e.g. only np.linspace(0.5, 0.0625, 2)? (See the comparison after this list.)

Besides these, are there any other ways to speed things up?
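Regarding option 4, a quick comparison of zoom-in schedules; the denser 4-level schedule here is only an illustrative baseline, not necessarily the demo's default, and each extra level means extra recursive refinement passes per query point:

import numpy as np

dense_zoom_ins = np.linspace(0.5, 0.0625, 4)   # approx. [0.5, 0.354, 0.208, 0.0625]
sparse_zoom_ins = np.linspace(0.5, 0.0625, 2)  # [0.5, 0.0625]
print(dense_zoom_ins)
print(sparse_zoom_ins)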

@jiangwei221 Thanks for the reply, I will give it a try.

Hello, thank you for such excellent work. During inference, there is roughly a tenfold speed difference between a single inference pass on the validation set and a single inference run with the demo. Could you explain why?

I'm not sure which command you used. Maybe it's caused by --faster_infer=yes?

I am referring to the speed of running validation once on the validation set during training versus the speed of inference on a single pair of images and queries using the demo alone. I have already turned faster_infer on.
By validation-set inference I mean:

def validate(self):

I see. The validate function only works on 256x256 patches with few query points, without recursive zoom-ins, so it is much faster than the demo.

The default value for recursive zoom-ins during training is false. It seems to me there is no interface to control recursive zoom-ins in the demo. I feed in two minimum-size images of the same size myself. Can I make the demo skip the recursive iteration?

You can consider using:

pred = self.model(img, query)['pred_corrs']
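For context, a minimal sketch of how that single-pass call could be driven; the input layout follows the paper's description (the two images concatenated side by side, with queries normalized to [0, 1] x [0, 1] over the concatenated canvas), but treat the exact shapes as assumptions and model as the loaded COTR network:

import torch

# One side-by-side image pair: left half is image a, right half is image b.
img = torch.rand(1, 3, 256, 512)
# Query points in normalized canvas coordinates, shape (batch, num_queries, 2).
query = torch.tensor([[[0.25, 0.50],
                       [0.10, 0.30]]])

model.eval()
with torch.no_grad():
    pred = model(img, query)['pred_corrs']  # predicted correspondences, one per query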

OK, I will try. Thank you for your patient reply.

Hello, I can't download the MegaDepth dataset at the moment. Could you please tell me the folder structure of the dataset, and whether the data of one scene should be kept together or all scenes should be packaged together? Should pictures a and b be stored separately or together?
In addition, I am confused about the role of the following function. What does knn_engine stand for?

knn_engine = knn_search.ReprojRatioKnnSearch(scene)

def get_query_with_knn(self, index):
    scene_index, cap_index = self.get_scene_cap_index_by_index(index)
    query_cap = self.scenes[scene_index].captures[cap_index]
    knn_engine = self.knn_engine_list[scene_index]
    if scene_index in self.scene_index_to_db_caps_mask_dict:
        db_mask = self.scene_index_to_db_caps_mask_dict[scene_index]
    else:
        db_mask = None
    pool = knn_engine.get_knn(query_cap, self.opt.pool_size, db_mask=db_mask)
    nn_caps = random.sample(pool, min(len(pool), self.opt.k_size))
    return query_cap, nn_caps

And what do total_caps_set and query_caps_set do in the following code?
self.total_caps_set = total_caps_set
self.query_caps_set = self._get_common_subset_caps_from_json(dataset_config[self.opt.dataset_name][f'{self.dataset_type}_json'], total_caps_set)
self.db_caps_set = self._get_common_subset_caps_from_json(dataset_config[self.opt.dataset_name]['train_json'], total_caps_set)
self.scene_index_to_db_caps_mask_dict = self._create_scene_index_to_db_caps_mask_dict(self.db_caps_set)

Your code is well written and modular, but I find it a bit difficult to reuse the inference process during training.

  1. What do you mean by "tree distribution format"? Do you mean the folder structure of the dataset?
  2. I don't understand the second question, on "whether the data of one scene should be put together or all scenes should be packaged together".
  3. Pictures a and b are two different files; you can give a and b different file paths.
  4. The KNN engine is used for retrieving images with common visibility. We used the ground-truth depth and camera poses to estimate the co-visibility between two images. (See the sketch right after this list.)
  5. total_caps_set contains all captures in the dataset. query_caps_set is a subset that belongs to the training/validation/test split, and db_caps_set is always a subset that belongs to the training set. This is a visual localization setup, where you have queries and a database of posed images; see HLoc. Here, we always use the training split as the database of posed images and switch the query set for training or validation.
  6. Yeah, I think it requires some extra work to adapt the complete inference pipeline into the training pipeline while preserving differentiability. You probably also want to check ECO-TR https://github.com/dltan7/ECO-TR, where the model is fully differentiable, with every zoom-in embedded into the architecture.
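On item 4, here is a rough, self-contained sketch of the idea behind a depth/pose based co-visibility estimate; it is not the actual ReprojRatioKnnSearch implementation, and all names below are illustrative:

import numpy as np

def covisibility_ratio(depth_a, K_a, K_b, T_ab, hw_b):
    """Fraction of image-a pixels with valid depth that reproject inside image b.

    depth_a: (H, W) ground-truth depth map of image a.
    K_a, K_b: 3x3 camera intrinsics.
    T_ab: 4x4 pose taking camera-a coordinates to camera-b coordinates.
    hw_b: (height, width) of image b.
    """
    H, W = depth_a.shape
    v, u = np.mgrid[0:H, 0:W]
    valid = depth_a > 0
    # Back-project the valid pixels of image a into 3D points in camera a.
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    pts_a = np.linalg.inv(K_a) @ pix * depth_a[valid]
    # Transform into camera b and project with its intrinsics.
    pts_b = T_ab[:3, :3] @ pts_a + T_ab[:3, 3:4]
    proj = K_b @ pts_b
    uv = proj[:2] / np.clip(proj[2], 1e-6, None)
    in_front = pts_b[2] > 0
    in_bounds = (uv[0] >= 0) & (uv[0] < hw_b[1]) & (uv[1] >= 0) & (uv[1] < hw_b[0])
    return float(np.mean(in_front & in_bounds))

Image pairs with a high ratio in both directions are then good candidates for the KNN pool.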

Yes, the folder structure of the dataset.

Since the MegaDepth dataset cannot be downloaded, I do not know its exact format. Could you please tell me whether num_queries refers to the number of queries in each scene pair, or to the total number of scene pairs, i.e., how many pairs of queries there are? Also, do you shuffle the data when reading with the DataLoader for randomness? I'm a little confused about how the index is generated.

The folder structure is like:

(base) jw221@sushi:MegaDepth_v1$ pwd
/scratch/dataset/megadepth/MegaDepth_v1/phoenix/S6/zl548/MegaDepth_v1
(base) jw221@sushi:MegaDepth_v1$ ls
./     0002/  0007/  0012/  0017/  0022/  0026/  0034/	0039/  0044/  0049/  0058/  0063/  0070/  0080/  0090/	0098/  0102/  0107/  0122/  0137/  0148/  0156/  0175/	0181/  0189/  0204/	0212/  0224/  0237/  0252/  0269/  0281/  0294/  0307/	0327/  0348/  0377/  0394/  0411/  0446/  0476/  0494/	0733/  1017/  5000/  5004/  5008/  5012/  5016/
../    0003/  0008/  0013/  0019/  0023/  0027/  0035/	0041/  0046/  0050/  0060/  0064/  0071/  0083/  0092/	0099/  0103/  0115/  0129/  0141/  0149/  0160/  0176/	0183/  0190/  0204.zip	0214/  0229/  0238/  0257/  0271/  0285/  0299/  0312/	0331/  0349/  0380/  0402/  0412/  0455/  0478/  0496/	0768/  1589/  5001/  5005/  5009/  5013/  5017/
0000/  0004/  0010/  0015/  0020/  0024/  0032/  0036/	0042/  0047/  0056/  0061/  0065/  0076/  0086/  0094/	0100/  0104/  0117/  0130/  0143/  0150/  0162/  0177/	0185/  0197/  0205/	0217/  0231/  0240/  0258/  0275/  0286/  0303/  0323/	0335/  0360/  0387/  0406/  0430/  0472/  0482/  0505/	0860/  3346/  5002/  5006/  5010/  5014/  5018/
0001/  0005/  0011/  0016/  0021/  0025/  0033/  0037/	0043/  0048/  0057/  0062/  0067/  0078/  0087/  0095/	0101/  0105/  0121/  0133/  0147/  0151/  0168/  0178/	0186/  0200/  0209/	0223/  0235/  0243/  0265/  0277/  0290/  0306/  0326/	0341/  0366/  0389/  0407/  0443/  0474/  0493/  0559/	1001/  4541/  5003/  5007/  5011/  5015/
(base) jw221@sushi:MegaDepth_v1$ cd 0002/
(base) jw221@sushi:0002$ ls
./  ../  dense0/
(base) jw221@sushi:0002$  cd dense0/
(base) jw221@sushi:dense0$ ls
./  ../  depths/  dist_mat/  imgs/
(base) jw221@sushi:dense0$

num_queries is actually the number of queries to the transformer decoder.
The dataset is quite large; I'd say the number of training pairs is at the 10K~100K level.
Yes, we do random sampling.
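For what it's worth, here is a generic sketch of the two places such randomness usually comes from in a PyTorch pipeline (dummy data, not COTR's exact dataset code):

import random
import torch
from torch.utils.data import DataLoader, TensorDataset

# A dummy dataset standing in for the MegaDepth pair dataset.
dataset = TensorDataset(torch.arange(100))

# 1. Index-level randomness: the DataLoader shuffles which indices reach __getitem__.
loader = DataLoader(dataset, batch_size=8, shuffle=True)

# 2. Sample-level randomness: inside __getitem__ a random neighbour is drawn from
#    the KNN pool, mirroring random.sample in get_query_with_knn above.
pool = list(range(20))  # stand-in for the co-visible capture pool
k_size = 3
nn_caps = random.sample(pool, min(len(pool), k_size))

print(next(iter(loader)), nn_caps)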

Thanks!