shaofengzeng/SuperPoint-Pytorch

Training speed and results

litingsjj opened this issue · 29 comments

Sorry to bother you, I have two questions about this project. For training, I use multiple GPUs; the speed is about 1.79 it/s for the first two epochs, which takes one day. After that, the speed becomes much faster, and I don't know why this happens. Also, about the results: the detector repeatability is better than rpautrat/SuperPoint's, but for the descriptors, hpatches-i is 0.90 and hpatches-v is 0.55. Compared with rpautrat/SuperPoint's, the result is not good.

Thanks for your attention. I have no idea about the speed; it is similar to my training process. I guess the detector loss causes this phenomenon.

The default hyper-parameters may not work well; you have to adjust them several times to get better results. The latest version adds two important variables, i.e., positive_dist and negative_dist, to help you fine-tune the model. The parameters lambda_d, lambda_loss, positive_margin and negative_margin are the most decisive ones. You need to adjust them to make positive_dist and negative_dist as small as possible. (The weights I released in the latest version can achieve 0.66 acc. on hpatches-v.)
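
For reference, here is a minimal sketch of how those four parameters typically interact in a hinge-style descriptor loss (the names and exact form are assumptions for illustration, not the repo's actual loss.py):

```python
import torch

def hinge_descriptor_loss(dot_product_desc, s, lambda_d=250.0,
                          positive_margin=1.0, negative_margin=0.2):
    """Hypothetical sketch: s is 1 for corresponding descriptor pairs, else 0."""
    # positive_dist: how far corresponding pairs fall below the positive margin
    positive_dist = torch.clamp(positive_margin - dot_product_desc, min=0.0)
    # negative_dist: how far non-corresponding pairs exceed the negative margin
    negative_dist = torch.clamp(dot_product_desc - negative_margin, min=0.0)
    # lambda_d re-weights the rare positive pairs against the many negatives;
    # lambda_loss would then balance this whole term against the detector loss
    loss = lambda_d * s * positive_dist + (1.0 - s) * negative_dist
    return loss.mean(), positive_dist.mean(), negative_dist.mean()
```

Driving positive_dist and negative_dist toward zero means correspondences score above the positive margin and non-correspondences below the negative margin, which is why watching those two values helps tuning.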

Thanks a lot for your reply! I will try fine-tuning these parameters. Is 0.66 acc. on hpatches-v your best result? Also, was that result obtained with the pre-trained model (superpoint_bn.pth)?

I trained the model without any pre-trained weights, and it took me several days to achieve this performance. Training this model really requires experience and tricks, and I failed to find hyper-parameters that directly yield a good model. 0.66 may not be the best, but it is the best model I can get right now.

Got it! Thanks again!

My result never reaches 0.60 on hpatches-v with the several hyper-parameter settings I have tried. Can you share the hyper-parameters used for superpoint_bn.pth? When I run inference with that model, hpatches-v is 0.66. Also, if you get a better result, please share the experience. I would really appreciate it!

If you want to run with superpoint_bn.pth, remember to set eps=1e-3 for all BatchNorm2d layers in backbone.py and cnn_head.py. Moreover, another parameter, momentum, may also matter for superpoint_bn.pth (set momentum=0.01); you can try it.
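
If it helps, one way to apply those settings to every BatchNorm2d layer without editing each file by hand (shown on a toy module, not the repo's actual backbone):

```python
import torch.nn as nn

def patch_batchnorm(model, eps=1e-3, momentum=0.01):
    """Match TF's BatchNorm defaults: epsilon=1e-3, and TF momentum 0.99
    corresponds to momentum=0.01 in PyTorch's running-stats update."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eps = eps
            m.momentum = momentum
    return model

# toy stand-in for the real backbone
net = patch_batchnorm(nn.Sequential(nn.Conv2d(1, 8, 3), nn.BatchNorm2d(8)))
```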

Actually, I get the 0.66 result when setting eps=1e-3 and momentum=0.1 for inference. But when training without any pre-trained model, how should I set these hyper-parameters to get that result? I tried eps=1e-3, lambda_d=250 and lambda_loss=10 (and other values), a fixed lr... It can't reach that result.

momentum=0.01 and eps=1e-3 are for BatchNorm2d and only apply to superpoint_bn.pth, which is converted from rpautrat's SuperPoint.
If you want to train your own model with these PyTorch scripts, I suggest removing the eps and momentum arguments from BatchNorm2d.
sp_0.pth is the model I trained without any pre-trained weights.
As far as I know, this PyTorch version is very sensitive to the parameters lambda_d and lambda_loss. A bigger lambda_d makes the model more stable; however, the performance is not as good.

Thanks! I have one last question, about README.md → Steps. Does that mean training needs two stages? In the first stage, comment out the lines in loss.py mentioned in step 2 and set base_lr=0.01; then, starting from the trained model, run a second stage with base_lr=0.001 and the fixed hyper-parameters lambda_d=250 (balancing positive_dist and negative_dist) and lambda_loss=10 (balancing detector and descriptor loss)?
Does step 7, "Start training again (lambda_d and lambda_loss may need to be adjusted several times)", mean adjusting the hyper-parameters several times and retraining (maybe a stage three, stage four...)?

If I use your latest version, how many times should I train?

Also, the model converted from rpautrat's SuperPoint doesn't give the same result. Is it possible there is a bug like these?
rpautrat/SuperPoint#117
eric-yyjau/pytorch-superpoint#24

Setting eps=1e-3 for BN achieves 0.66 acc. on hpatches-v. I'm not sure whether you can achieve similar performance by setting eps=1e-3 and momentum=0.01. Because the default BN parameters differ between TensorFlow and PyTorch, eps and momentum are two key parameters. Moreover, Conv2d in TensorFlow and PyTorch also differs slightly, so we can only get performance similar to rpautrat's SuperPoint, not identical.
I'm not sure how many times you need to train; I haven't found an effective training procedure yet. However, I strongly suggest reading the descriptor_loss function in loss.py and debugging some of the key variables such as dot_product_desc, positive_dist, and negative_dist. I think this will greatly help you adjust the hyper-parameters.

Thanks for your answer!

Hi litingsjj, I found that the photometric parameters are different from rpautrat's SuperPoint. This may affect training performance.
I have updated superpoint_coco_train.yaml according to rpautrat's SuperPoint:

random_brightness: {max_abs_change: 0.2}
random_contrast: {strength_range: [0.5, 1.5]}
additive_gaussian_noise: {stddev_range: [0, 10]}
additive_speckle_noise: {prob_range: [0, 0.0035]}
additive_shade:
    transparency_range: [-0.5, 0.5]
    kernel_size_range: [100, 150]
    nb_ellipses: 20
motion_blur: {max_kernel_size: 3}
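
Roughly, those entries correspond to jitter like the following (a loose sketch, not the repo's photometric_augmentation.py; I'm assuming a float image in [0, 1] and rescaling the 0-255 noise stddev, and the shade/blur steps are omitted):

```python
import numpy as np

def photometric_augment(img, rng=np.random):
    """Apply brightness/contrast/noise jitter with the ranges above."""
    img = img.astype(np.float32)
    # random_brightness: max_abs_change = 0.2
    img = img + rng.uniform(-0.2, 0.2)
    # random_contrast: strength_range = [0.5, 1.5], scaled around the mean
    mean = img.mean()
    img = (img - mean) * rng.uniform(0.5, 1.5) + mean
    # additive_gaussian_noise: stddev_range = [0, 10] on 0-255 intensities
    img = img + rng.normal(0.0, rng.uniform(0.0, 10.0) / 255.0, img.shape)
    return np.clip(img, 0.0, 1.0)
```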

And I'm checking whether there are any other problems.

Great! What are the results with the updated *.yaml? Also, I can't reproduce your result for now:

  1. Maybe my trained MagicPoint model is different from yours, so the COCO dataset I export may affect performance. Can you provide your MagicPoint model?

  2. Last week, I used your project; some lines in the descriptor loss were commented out, like:

    ## better comment this at the beginning of training
    # dot_product_desc = F.relu(dot_product_desc)

    ## Normalization of scores, better comment this at the beginning of training
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]), p=2, dim=3), [batch_size, Hc, Wc, Hc, Wc])
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]), p=2, dim=1), [batch_size, Hc, Wc, Hc, Wc])

I'm confused about your training strategy.

a. The MagicPoint trained by this repo is different from rpautrat's. Our MagicPoint usually generates more keypoints than his. This may be caused by homography_adaptation in homo_export_labels.py. Alternatively, you can set a larger det_thresh in *.yaml when generating the COCO labels.
b. According to this issue, it is better to comment out the following lines:

dot_product_desc = F.relu(dot_product_desc)
dot_product_desc = torch.reshape(F.normalize....)
dot_product_desc = torch.reshape(F.normalize....)

This may be unnecessary.
c. The repository has been updated:

  1. Changed the photometric augmentation strategy in photometric_augmentation.py
  2. Apply homo_aug first, then photo_aug, in coco.py and synthetic_shapes.py

Moreover, I cannot yet achieve similar training performance with these improvements; I'll keep checking...
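
The homo-first-then-photo ordering from that update can be sketched as follows (stand-in function names, not the repo's actual API):

```python
import numpy as np

def augment(img, homo_aug, photo_aug):
    """Warp geometry first, then jitter photometry, so the photometric
    noise is applied to the already-warped image rather than warped with it."""
    warped = homo_aug(img)    # homographic warp (keypoints warped alongside)
    return photo_aug(warped)  # photometric jitter on the warped image

# toy check with stand-in augmentations
out = augment(np.zeros((2, 3)), lambda x: x + 1.0, lambda x: x * 2.0)
```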

Hi, I uncommented the following 3 lines, set lr=0.001, and trained SuperPoint on the COCO dataset generated by rpautrat's MagicPoint model. The performance on hpatches-v is 0.698! Very similar to rpautrat's model. I have updated the repo.

    dot_product_desc = F.relu(dot_product_desc)

    ##l2_normalization
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]),
                                                 p=2,
                                                 dim=3), [batch_size, Hc, Wc, Hc, Wc])
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]),
                                                 p=2,
                                                 dim=1), [batch_size, Hc, Wc, Hc, Wc])

However, there may still be some problems training MagicPoint; it usually produces more keypoints than rpautrat's model, which may affect the final results.
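
As a quick sanity check of those three uncommented lines: on random data, after the second normalization every vector along the first Hc*Wc axis should have unit L2 norm (a standalone sketch using the same reshapes as the snippet above):

```python
import torch
import torch.nn.functional as F

# random stand-in for the descriptor dot-product volume
batch_size, Hc, Wc = 2, 4, 5
dot_product_desc = torch.rand(batch_size, Hc, Wc, Hc, Wc)
dot_product_desc = F.relu(dot_product_desc)

# normalize over the last Hc*Wc axis, then over the first Hc*Wc axis
dot_product_desc = torch.reshape(
    F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]),
                p=2, dim=3), [batch_size, Hc, Wc, Hc, Wc])
dot_product_desc = torch.reshape(
    F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]),
                p=2, dim=1), [batch_size, Hc, Wc, Hc, Wc])

# norms along the first Hc*Wc axis should all be 1 after the second step
norms = torch.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]).norm(p=2, dim=1)
```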

If you are worried about that, maybe we can use rpautrat's model to export the COCO dataset for training SuperPoint. Also, the dataset (COCO 2017) is different from the one rpautrat used (COCO 2014). And can I use your latest version to reproduce your result?

The performance on hpatches-v reaches 0.725!

Great! Would you like to share your training method? This may help a lot of people who follow this repo.

I trained MagicPoint with rpautrat's project to get the COCO ground-truth points, then used your version to train SuperPoint.

OK, thanks.

Hello, sorry to bother you. rpautrat's project generates the labels as .npz while this project generates .npy; how did you solve this?

Hi, it is easy to convert *.npz to *.npy. And remember to resize the COCO images to 240×320 with the function ratio_preserving_resize.
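
A minimal conversion sketch (the 'points' key name is an assumption; check what your .npz files actually contain, e.g. with np.load(path).files):

```python
import glob
import os
import numpy as np

def convert_labels(npz_dir, npy_dir):
    """Convert rpautrat-style *.npz labels (one file per image) to *.npy."""
    os.makedirs(npy_dir, exist_ok=True)
    for path in glob.glob(os.path.join(npz_dir, '*.npz')):
        pts = np.load(path)['points']  # (N, 2) keypoint coordinates
        name = os.path.splitext(os.path.basename(path))[0]
        np.save(os.path.join(npy_dir, name + '.npy'), pts)
```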

Does that mean rpautrat's project's image size is not 240×320, while this project's images have been resized to 240×320?

Though rpautrat's labels are .npz, it is one image per .npz file, so it seems easy to convert.

Hi, could you email me the COCO ground-truth points you got, at leon_wu6@163.com?
I tried to generate them as you said, but the results are not good.