rpautrat/SuperPoint

Question about descriptor loss

Closed this issue · 8 comments

Hi, sorry for disturbing you. I have some questions about the descriptor loss and hope to get your help!

    dot_product_desc = tf.reshape(tf.nn.l2_normalize(
        tf.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]),
        3), [batch_size, Hc, Wc, Hc, Wc])
    dot_product_desc = tf.reshape(tf.nn.l2_normalize(
        tf.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]),
        1), [batch_size, Hc, Wc, Hc, Wc])

At this point, we obtain dot_product_desc after L2 normalization. But when we compute the loss, we want positive_dist to reach at least 1 (the positive margin). As we know, after L2 normalization the values in this map can never reach 1, so during training the loss will not converge. Is my understanding wrong? I have been troubled by this problem for a long time.

Hi,
It is correct that the dot product will be at most 1, and that a positive_dist of 1 will only be attained in the ideal case (which almost never happens). However, it is not a problem for the convergence of the network. It only means that the loss on the positive distance will always be slightly positive and will never reach zero, but it will still be optimized correctly.
In a sense, this margin of 1 thus encourages already good positive matches to get even closer to the perfect distance.
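
To make this concrete, here is a toy NumPy sketch (not the repository code): after L2-normalizing the Hc*Wc correspondence scores of one cell, the score of the true match can only reach 1 if every other score is exactly 0, so the positive hinge term stays slightly above zero even for a good match.

    import numpy as np

    scores = np.array([5.0, 1.0, 0.5, 0.2])    # raw dot products for one cell
    normed = scores / np.linalg.norm(scores)   # L2 normalization over the cell
    positive_dist = normed[0]                  # score of the true match
    positive_margin = 1.0                      # margin used in the hinge loss

    print(normed.max())                              # ~0.975, always < 1 here
    print(max(0.0, positive_margin - positive_dist)) # ~0.025, small but > 0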

Got it. But when I use this loss to train my descriptor head, the loss does not converge. At first I thought the meshgrid might be wrong, but I checked my warped grid and it is correct. Could you share your training log? I want to check whether my loss reproduction is correct against it. Thank you very much.
My warped_grid is as follows:
[image: warp_meshgrid]

My train log for SuperPoint was the following:

[11/02/2019 19:44:04 INFO] Start training
2019-11-02 19:44:12.357553: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x2d2e9fd0
2019-11-02 19:44:14.740628: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0xf1c3020
[11/02/2019 19:45:05 INFO] Iter    0: loss 26.0874, precision 0.0037, recall 0.0077
[11/02/2019 20:09:04 INFO] Iter 5000: loss 5.6488, precision 0.0272, recall 0.0603
[11/02/2019 20:33:28 INFO] Iter 10000: loss 3.6379, precision 0.0486, recall 0.1052
[11/02/2019 20:55:14 INFO] Iter 15000: loss 2.6789, precision 0.0826, recall 0.1772
[11/02/2019 21:15:11 INFO] Iter 20000: loss 2.4092, precision 0.1275, recall 0.2574
[11/02/2019 21:35:08 INFO] Iter 25000: loss 1.6250, precision 0.1916, recall 0.3253
[11/02/2019 21:55:04 INFO] Iter 30000: loss 1.5479, precision 0.2319, recall 0.3723
[11/02/2019 22:15:00 INFO] Iter 35000: loss 1.5123, precision 0.2723, recall 0.4119
[11/02/2019 22:34:56 INFO] Iter 40000: loss 1.4282, precision 0.2930, recall 0.4453
[11/02/2019 22:54:52 INFO] Iter 45000: loss 1.3856, precision 0.2913, recall 0.4706
[11/02/2019 23:14:47 INFO] Iter 50000: loss 1.2886, precision 0.2760, recall 0.4533
[11/02/2019 23:34:43 INFO] Iter 55000: loss 1.4009, precision 0.3084, recall 0.4988
[11/02/2019 23:54:38 INFO] Iter 60000: loss 1.3107, precision 0.3355, recall 0.5177
[11/03/2019 00:14:40 INFO] Iter 65000: loss 1.7048, precision 0.3322, recall 0.5287
[11/03/2019 00:34:35 INFO] Iter 70000: loss 1.0025, precision 0.3433, recall 0.5367
[11/03/2019 00:54:31 INFO] Iter 75000: loss 1.5769, precision 0.3428, recall 0.5448
[11/03/2019 01:14:25 INFO] Iter 80000: loss 1.1283, precision 0.3512, recall 0.5497
[11/03/2019 01:34:21 INFO] Iter 85000: loss 1.3357, precision 0.3420, recall 0.5537
[11/03/2019 01:54:17 INFO] Iter 90000: loss 1.1635, precision 0.3552, recall 0.5550
[11/03/2019 02:14:14 INFO] Iter 95000: loss 0.8720, precision 0.3639, recall 0.5579
[11/03/2019 02:34:11 INFO] Iter 100000: loss 1.0154, precision 0.3488, recall 0.5606
[11/03/2019 02:54:08 INFO] Iter 105000: loss 1.0073, precision 0.3534, recall 0.5673
[11/03/2019 03:14:05 INFO] Iter 110000: loss 1.1623, precision 0.3624, recall 0.5701
[11/03/2019 03:34:08 INFO] Iter 115000: loss 0.9907, precision 0.3594, recall 0.5629
[11/03/2019 03:54:08 INFO] Iter 120000: loss 1.2128, precision 0.3784, recall 0.5711
[11/03/2019 04:14:05 INFO] Iter 125000: loss 1.2846, precision 0.3640, recall 0.5720
[11/03/2019 04:34:02 INFO] Iter 130000: loss 1.1994, precision 0.3616, recall 0.5763
[11/03/2019 04:53:59 INFO] Iter 135000: loss 1.1397, precision 0.3687, recall 0.5797
[11/03/2019 05:13:57 INFO] Iter 140000: loss 1.2604, precision 0.3605, recall 0.5754
[11/03/2019 05:33:54 INFO] Iter 145000: loss 1.4032, precision 0.3711, recall 0.5808
[11/03/2019 05:53:51 INFO] Iter 150000: loss 0.9151, precision 0.3752, recall 0.5823
[11/03/2019 06:13:48 INFO] Iter 155000: loss 1.1427, precision 0.3720, recall 0.5870
[11/03/2019 06:33:46 INFO] Iter 160000: loss 0.9974, precision 0.3686, recall 0.5850
[11/03/2019 06:53:43 INFO] Iter 165000: loss 1.4975, precision 0.3698, recall 0.5883
[11/03/2019 07:13:41 INFO] Iter 170000: loss 1.2266, precision 0.3691, recall 0.5886
[11/03/2019 07:33:39 INFO] Iter 175000: loss 1.3844, precision 0.3749, recall 0.5914
[11/03/2019 07:53:36 INFO] Iter 180000: loss 1.3627, precision 0.3694, recall 0.5789
[11/03/2019 08:13:33 INFO] Iter 185000: loss 1.0469, precision 0.3564, recall 0.5900
[11/03/2019 08:33:31 INFO] Iter 190000: loss 1.3856, precision 0.3791, recall 0.5937
[11/03/2019 08:53:29 INFO] Iter 195000: loss 1.1223, precision 0.3752, recall 0.5937
[11/03/2019 09:13:31 INFO] Iter 200000: loss 1.1352, precision 0.3765, recall 0.5978
[11/03/2019 09:33:33 INFO] Iter 205000: loss 1.0245, precision 0.3596, recall 0.5882
[11/03/2019 09:53:35 INFO] Iter 210000: loss 0.9231, precision 0.3639, recall 0.5945
[11/03/2019 10:13:36 INFO] Iter 215000: loss 0.9667, precision 0.3744, recall 0.5961
[11/03/2019 10:33:38 INFO] Iter 220000: loss 1.0049, precision 0.3757, recall 0.5928
[11/03/2019 10:53:39 INFO] Iter 225000: loss 1.0517, precision 0.3862, recall 0.5946
[11/03/2019 11:13:40 INFO] Iter 230000: loss 0.9133, precision 0.3780, recall 0.5978
[11/03/2019 11:33:41 INFO] Iter 235000: loss 1.2023, precision 0.3734, recall 0.5989
[11/03/2019 11:53:43 INFO] Iter 240000: loss 1.3015, precision 0.3775, recall 0.5998
[11/03/2019 12:13:44 INFO] Iter 245000: loss 0.9483, precision 0.3670, recall 0.5991
[11/03/2019 12:33:46 INFO] Iter 250000: loss 1.0677, precision 0.3916, recall 0.6028
[11/03/2019 12:53:49 INFO] Iter 255000: loss 1.1519, precision 0.3679, recall 0.5959
[11/03/2019 13:13:51 INFO] Iter 260000: loss 1.4319, precision 0.3837, recall 0.6004
[11/03/2019 13:33:52 INFO] Iter 265000: loss 1.1464, precision 0.3803, recall 0.6020
[11/03/2019 13:53:53 INFO] Iter 270000: loss 0.8592, precision 0.3823, recall 0.6057
[11/03/2019 14:13:55 INFO] Iter 275000: loss 1.2274, precision 0.3723, recall 0.6018
[11/03/2019 14:33:57 INFO] Iter 280000: loss 1.2828, precision 0.3765, recall 0.6097
[11/03/2019 14:53:59 INFO] Iter 285000: loss 1.0825, precision 0.3766, recall 0.6072
[11/03/2019 15:14:00 INFO] Iter 290000: loss 1.1391, precision 0.3997, recall 0.6085
[11/03/2019 15:34:02 INFO] Iter 295000: loss 1.2988, precision 0.3893, recall 0.6060
[11/03/2019 15:54:00 INFO] Training finished
[11/03/2019 15:54:03 INFO] Saving checkpoint for iteration #300000

If your model doesn't converge, you can maybe reduce the learning rate (I used 0.0001).
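
In TF1 the learning rate is simply the argument passed to the optimizer; a minimal sketch with a placeholder loss (not the repository's training code):

    import tensorflow as tf

    w = tf.Variable(1.0)       # hypothetical placeholder parameter
    loss = tf.square(w)        # hypothetical placeholder loss
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)  # lowered learning rate
    train_op = optimizer.minimize(loss)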

Thank you very much for your help, but I still have some questions about this part:

    # Rescale to actual size
    shape = tf.to_float(shape[::-1])  # different convention [y, x]
    pts1 *= tf.expand_dims(shape, axis=0)
    pts2 *= tf.expand_dims(shape, axis=0)

Why do we need the [y, x] convention here?
In this case, do I need to convert (x, y) to (y, x) when computing the warped points?

I don't know which file this code snippet is from or what the context was, but here the conversion to [y, x] is needed because we multiply the point coordinates by shape. And shape is an image shape, so it is in the [y, x] convention.

The convention you need always depends on the operations you use. For example, to warp points with a homography you need the [x, y] convention in the matrix multiplication with H, if I remember correctly. But if you use the function warp_points, the expected convention is [y, x].
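
To illustrate the two conventions, here is a small NumPy sketch (the variable names are mine): points stored as [y, x] are flipped to [x, y] before the homogeneous multiplication with H, then flipped back.

    import numpy as np

    H = np.array([[1.0, 0.0, 10.0],
                  [0.0, 1.0,  5.0],
                  [0.0, 0.0,  1.0]])          # translation-only homography
    pts_yx = np.array([[20.0, 40.0]])         # one point in [y, x] convention

    pts_xy = pts_yx[:, ::-1]                              # -> [x, y]
    pts_h = np.concatenate([pts_xy, np.ones((1, 1))], 1)  # homogeneous [x, y, 1]
    warped = (H @ pts_h.T).T
    warped_xy = warped[:, :2] / warped[:, 2:]             # back from homogeneous
    warped_yx = warped_xy[:, ::-1]                        # back to [y, x]
    print(warped_yx)                                      # [[25., 50.]]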

Thank you so much! When the model is not pre-trained, adding the L2 normalization directly makes the loss difficult to converge.
So I first pre-trained the model with the descriptor loss from the paper, then added the L2 normalization to the descriptor loss to fine-tune the model, and finally got the desired result!
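
For anyone else trying this two-stage training, here is a hedged TF1-style sketch of the idea (the function name and the normalize flag are mine, not the repository's API): the hinge loss follows the paper's margins, and the optional block reuses the bi-directional normalization from the snippet above.

    import tensorflow as tf

    def descriptor_hinge_loss(dot_product_desc, s, batch_size, Hc, Wc,
                              positive_margin=1.0, negative_margin=0.2,
                              lambda_d=250., normalize=False):
        # Stage 1: normalize=False gives the raw hinge loss from the paper.
        # Stage 2: normalize=True adds the bi-directional L2 normalization of
        # the correspondence volume before applying the same margins.
        if normalize:
            dot_product_desc = tf.reshape(tf.nn.l2_normalize(
                tf.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]),
                3), [batch_size, Hc, Wc, Hc, Wc])
            dot_product_desc = tf.reshape(tf.nn.l2_normalize(
                tf.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]),
                1), [batch_size, Hc, Wc, Hc, Wc])
        positive = tf.maximum(0., positive_margin - dot_product_desc)
        negative = tf.maximum(0., dot_product_desc - negative_margin)
        # s is the ground-truth correspondence mask, shape [batch, Hc, Wc, Hc, Wc]
        return tf.reduce_mean(s * lambda_d * positive + (1. - s) * negative)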

Thanks for sharing this insight!

@rpautrat thank you so much for your repository. Currently my descriptor loss is also not converging. I decreased the learning rate but still no luck. Homographic adaptation works, but descriptor training seems to be the problem. Since you got convergence after removing the L2 normalization: did you just remove the L2 normalization applied after the dot product, or remove all the normalization occurring up to the loss?