vandal-vpr/vg-transformers

Doubt about the formula (1)


Thanks for your work. I have read the paper, but I cannot understand formula (1). M = L*N means that there are M local descriptors, so following NetVLAD, the sum of M residuals should be computed for each cluster. But formula (1) has a double sum, and I don't know why.
For example, with L=4 and N=4, 16 local descriptors are passed to NetVLAD, but according to formula (1) there are L*M = 4*16 = 64 terms to sum?

Hello, we are glad you find the work interesting. You are right, there is a typo in the paper. The formula is correct, but the description of M is not. According to the formula, M should be the number of local descriptors in a single frame (T for transformer backbones, HW for CNNs). Therefore the total number of local descriptors in a sequence of L frames is LM, hence the double sum.
Thanks for spotting this issue. We will make sure to update the paper as well. Let us know if you still have any doubts.
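
To make the counting concrete, here is a minimal pure-Python sketch. Formula (1) is not reproduced in this thread, so the sketch assumes the usual NetVLAD form, a soft-assignment weight times the residual to each cluster centre. The function names, the descriptor dimension D, the cluster count K, and the random inputs are all hypothetical and are not taken from the repository. It shows that a double sum over L frames and M descriptors per frame visits exactly L*M descriptors per cluster, the same as a single sum over the flattened sequence.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_double_sum(frames, centroids, weights):
    """Sum weighted residuals with an outer loop over frames (l)
    and an inner loop over the descriptors of each frame (m)."""
    K, D = len(centroids), len(centroids[0])
    V = [[0.0] * D for _ in range(K)]
    terms_per_cluster = 0
    for l, frame in enumerate(frames):
        for m, x in enumerate(frame):
            terms_per_cluster += 1
            for k in range(K):
                a = weights[l][m][k]
                for d in range(D):
                    V[k][d] += a * (x[d] - centroids[k][d])
    return V, terms_per_cluster


def aggregate_flat(frames, centroids, weights):
    """Same aggregation, written as one sum over all L*M descriptors."""
    flat_x = [x for frame in frames for x in frame]
    flat_w = [w for frame_w in weights for w in frame_w]
    K, D = len(centroids), len(centroids[0])
    V = [[0.0] * D for _ in range(K)]
    for x, w in zip(flat_x, flat_w):
        for k in range(K):
            for d in range(D):
                V[k][d] += w[k] * (x[d] - centroids[k][d])
    return V, len(flat_x)


random.seed(0)
L, M, D, K = 4, 4, 8, 3  # 4 frames, 4 descriptors per frame (illustrative sizes)
frames = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(M)] for _ in range(L)]
centroids = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
weights = [[softmax([random.gauss(0, 1) for _ in range(K)]) for _ in range(M)] for _ in range(L)]

V_double, n_double = aggregate_double_sum(frames, centroids, weights)
V_flat, n_flat = aggregate_flat(frames, centroids, weights)
print(n_double, n_flat)  # both equal L * M
```

With M read as the per-frame count, the L=4 example with 4 descriptors per frame gives 16 terms per cluster, which matches the count the question expected.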