Could you release small/tiny/nano version of detector and descriptor?

Question

zhongqiu1245 opened this issue 9 months ago · 12 comments

Hello, thank you for your amazing work!
I'm really interested in your work and want to deploy DeDoDe on mobile devices (laptops, even CPU-only) for some self-driving work.
However, DeDoDeDescriptorB and DeDoDeDetectorL are too heavy to run on a mobile device.
On my computer (RTX 4060 Mobile, 8 GB), I get only 5.4 fps with 640*480 inputs (tensorrt_fp16).
Could you release a small/tiny/nano version of the detector and descriptor?
Thank you in advance!

Answer 1 · 2024-04-22T04:58:31.000Z

Sure, the easiest approach I guess would be to use VGG11 and reduce the number of layers further. Should be doable. Not sure how much performance will degrade.

Answer 2 · 2024-04-22T05:16:18.000Z

About 30 fps on the RTX 4060 Mobile (8 GB).

Answer 3 · 2024-04-22T13:42:17.000Z

@zhongqiu1245 could you try out the small detector in the branch that references this issue?

Weights can be found here: https://github.com/Parskatt/DeDoDe/releases/tag/v2

Answer 4 · 2024-04-22T13:43:28.000Z

It uses a VGG11 backbone and I reduced the number of layers at each scale from 8 -> 4 and cut the dimensionality in half. I think it should be about 3-4X faster than the _L detector. Could you verify?
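
As a rough sanity check on that estimate, the sketch below counts multiply-accumulates (MACs) for a stack of plain dense convolutions. The feature-map size, kernel size, and channel widths are hypothetical placeholders, not the actual DeDoDe configuration, and the real decoder blocks may be structured differently.

```python
def conv_stack_macs(layers, channels, height, width, kernel=3):
    """MACs for `layers` dense convolutions with `channels` inputs and outputs."""
    per_layer = height * width * kernel * kernel * channels * channels
    return layers * per_layer


# Hypothetical sizes, for illustration only (not taken from the DeDoDe code).
H, W = 60, 80
large = conv_stack_macs(layers=8, channels=256, height=H, width=W)
small = conv_stack_macs(layers=4, channels=128, height=H, width=W)

reduction = large / small
print(f"per-scale cost reduction under these assumptions: {reduction:.1f}x")
```

Under these assumptions the layer count contributes a factor of 2 and the halved channel width a factor of 4. The end-to-end gain would typically be smaller, since the backbone and fixed overheads do not shrink by the same factor.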

Answer 5 · 2024-04-22T13:44:09.000Z

Depending on your application it might also be possible to increase the framerate by batching, is this an option for you?

Answer 6 · 2024-04-27T01:32:39.000Z

@Parskatt
Sorry for the late reply.
I will verify this.
Thank you!

Answer 7 · 2024-04-27T13:38:55.000Z

@Parskatt
Thank you for your DetectorS!
The fps increased a lot, but it is still lower than 30 fps (15.9 fps, DetectorS + DescriptorB, 640*480, tensorrt fp16).

So I reduced the image size to 320*240, which gives 25 fps, almost there.
Could you release a small version of the descriptor, like DescriptorS?
Maybe this can help DeDoDe break through the 30 fps limit.
Thank you!

Answer 8 · 2024-04-28T14:24:35.000Z

Sure, then I think we can also reduce the descriptor size. Does 128 sound better? Is descriptor dimensionality a concern?

Answer 9 · 2024-04-28T15:32:04.000Z

Thank you for your reply!
128 sounds better.
Yes, the dimension is an important factor that can speed up or slow down the network's inference time: the smaller the dimension, the faster the inference. However, if the dimension is too small, performance will suffer. I considered dim=64 before, but I thought it might be too small. 128 may be better :)
Thank you for your generosity!
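
To make the trade-off concrete, here is a small sketch of how the descriptor dimension affects per-image storage and the cost of a dense similarity matrix during matching. The keypoint count and the 256-dimensional baseline are assumptions for illustration, not values taken from this thread.

```python
def descriptor_cost(num_keypoints, dim, bytes_per_value=2):
    """Return (storage in bytes, MACs for an N x N similarity matrix)."""
    storage = num_keypoints * dim * bytes_per_value  # 2 bytes per value for fp16
    sim_macs = num_keypoints * num_keypoints * dim   # one dot product per pair
    return storage, sim_macs


N = 5000  # hypothetical number of keypoints per image
costs = {dim: descriptor_cost(N, dim) for dim in (256, 128, 64)}

for dim, (storage, macs) in costs.items():
    print(f"dim={dim:3d}  storage={storage / 1024:.0f} KiB  "
          f"similarity MACs={macs / 1e9:.2f} G")
```

Both quantities scale linearly with the dimension. The cost of the descriptor network itself depends on its architecture and is not captured by this sketch.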

Answer 10 · 2024-04-28T15:49:42.000Z

some details:
resolution: (480, 640)
preprocess: 19.606828689575195ms
detectorS: 16.09945297241211ms
descriptorB: 29.36267852783203ms
dualsoftmaxmatcher: 0.6873607635498047ms
postprocess: 0.14138221740722656ms
total: 65.89770317077637ms fps: 15.207663468720314
detectorS & descriptorB are trt_fp16
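
For reference, a short sketch that turns the stage timings above into per-stage shares and a frame rate. The numbers are copied from the list above; the variable and function names are only illustrative.

```python
# Per-stage latencies in milliseconds, copied from the measurements above.
stage_ms = {
    "preprocess": 19.606828689575195,
    "detectorS": 16.09945297241211,
    "descriptorB": 29.36267852783203,
    "dualsoftmaxmatcher": 0.6873607635498047,
    "postprocess": 0.14138221740722656,
}


def fps(latency_ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


total_ms = sum(stage_ms.values())

# Print stages from slowest to fastest with their share of the frame time.
for name, ms in sorted(stage_ms.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {ms:7.2f} ms  {100.0 * ms / total_ms:5.1f}%")

print(f"total: {total_ms:.2f} ms -> {fps(total_ms):.1f} fps")

# Frame rate if preprocessing were excluded from the per-frame budget.
without_preprocess_ms = total_ms - stage_ms["preprocess"]
print(f"without preprocess: {without_preprocess_ms:.2f} ms "
      f"-> {fps(without_preprocess_ms):.1f} fps")
```

By this arithmetic the descriptor is the largest single stage, and preprocessing accounts for roughly 30% of the frame time.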

Answer 11 · 2024-04-28T16:16:28.000Z

Okay, so it seems like around 20 fps is at least possible with the current sizes.

Are you able to extract the times for the encoder/decoder parts of the network? Depending on what takes the most time, we might need to change the encoder architecture.
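
One possible way to collect such per-part timings is a small stage timer like the sketch below. It uses only the standard library; `fake_encoder` and `fake_decoder` are hypothetical stand-ins for the real network parts, and the optional `sync` callback is an assumption about how GPU work would be flushed before reading the clock (with PyTorch, `torch.cuda.synchronize` could be passed).

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock timings (in ms) per named stage."""

    def __init__(self, sync=None):
        # `sync` is an optional callable that waits for pending device work,
        # so asynchronous GPU kernels are included in the measurement.
        self.sync = sync
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        if self.sync:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync:
                self.sync()
            self.samples[name].append((time.perf_counter() - start) * 1000.0)

    def mean_ms(self, name, warmup=1):
        # Drop the first `warmup` samples unless that would leave nothing.
        values = self.samples[name][warmup:] or self.samples[name]
        return sum(values) / len(values)


# Hypothetical stand-ins for the encoder and decoder of a network.
def fake_encoder(x):
    return [v * 2 for v in x]


def fake_decoder(features):
    return sum(features)


timer = StageTimer()
data = list(range(10_000))
for _ in range(5):
    with timer.stage("encoder"):
        feats = fake_encoder(data)
    with timer.stage("decoder"):
        out = fake_decoder(feats)

for name in timer.samples:
    print(f"{name}: {timer.mean_ms(name):.3f} ms (mean after warmup)")
```

Discarding the first iteration as warmup keeps one-off initialization costs out of the mean.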

The final thing I guess would be to distill both networks into a single network.

Answer 12 · 2024-04-29T07:46:32.000Z

OK, I will try that later.