LeapLabTHU/DAT

some questions about the reference points and offset network

LixDemon opened this issue · 1 comment

Really nice work! I have some questions about the code. Looking at your implementation of conv_offset, I see that you use a stride of 1, so the reference points actually cover the whole feature map. But the paper says there is a stride of r. If the stride is not larger than 1, the number of keys equals the number of queries, so the complexity is the same as standard MHSA, or even larger! I think something may be wrong here.
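To make the concern concrete, here is a rough back-of-the-envelope sketch, not taken from the DAT code. It assumes the offset network downsamples the H x W query map by r in each dimension, giving ceil(H/r) x ceil(W/r) reference points (one sampled key each), and it counts only the two attention matmuls (QK^T and AV), ignoring projections, the offset network and bilinear sampling. The 56 x 56 x 96 feature map is a hypothetical example size.

```python
import math


def num_reference_points(h: int, w: int, r: int) -> int:
    """Number of reference points (= sampled keys) on a key grid of stride r.

    Assumption: 'same'-style downsampling, so the key grid is
    ceil(H / r) x ceil(W / r).
    """
    return math.ceil(h / r) * math.ceil(w / r)


def attention_macs(h: int, w: int, c: int, r: int) -> int:
    """Approximate multiply-accumulates of the QK^T and AV products only."""
    n_q = h * w                          # every pixel of the map is a query
    n_k = num_reference_points(h, w, r)  # one key per reference point
    return 2 * n_q * n_k * c             # QK^T plus AV, each n_q * n_k * c


if __name__ == "__main__":
    H, W, C = 56, 56, 96  # hypothetical feature map size
    full = attention_macs(H, W, C, r=1)
    for r in (1, 2, 4):
        keys = num_reference_points(H, W, r)
        ratio = attention_macs(H, W, C, r) / full
        print(f"r={r}: {keys} keys, {ratio:.4f}x the attention cost of r=1")
```

This prints 3136, 784 and 196 keys with relative costs 1.0000x, 0.2500x and 0.0625x. With r = 1 the key count equals the query count, so the attention matmuls cost the same as in standard MHSA, and the offset network and sampling come on top of that.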

Thanks for your question. The factor r controls the number of sampled keys. We set it to 1 for better accuracy on ImageNet at the cost of slightly higher FLOPs, but r = 2 also provides sufficient gains when paired with a somewhat larger range factor. These two parameters still leave a lot of room for optimization, which we will explore in future work.
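For readers wondering what the range factor does, here is a minimal sketch under stated assumptions rather than the repository's actual code. It assumes the raw offset predicted for each reference point is bounded as s * tanh(raw), measured in key-grid cells, and that key cell i sits at query-map pixel i * r. The function names are hypothetical.

```python
import math


def sampled_position(ref_cell: float, raw_offset: float, s: float, r: int) -> float:
    """Shift a reference point by a bounded offset, returning query-map pixels.

    Hypothetical assumptions, for illustration only:
      * the offset is bounded as s * tanh(raw), in key-grid cells;
      * key-grid cell i corresponds to query-map pixel i * r.
    """
    bounded = s * math.tanh(raw_offset)  # magnitude never exceeds s
    return (ref_cell + bounded) * r      # convert key cells to pixels


def max_reach_pixels(s: float, r: int) -> float:
    """Upper bound on how far a sampled key can move from its reference point."""
    return s * r


if __name__ == "__main__":
    # A zero raw offset leaves the key on its reference point.
    print(sampled_position(3, 0.0, s=2.0, r=2))
    # Larger s lets each of the fewer keys at r = 2 reach farther.
    for s in (1.0, 2.0, 4.0):
        print(f"s={s}: reach up to {max_reach_pixels(s, r=2)} pixels")
```

Under these assumptions, r sets how many keys are sampled while s sets how far each one may wander. That is why the two are tuned together: fewer keys (r = 2) can be paired with a wider range.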