Meituan-AutoML/CPVT

Question about experiments in sec5.4


Hi, thanks for the great work! It is pretty interesting to see that convolution can be a good choice to provide position information for transformers.

I have some questions about the experiments in sec 5.4 (Importance of Zero Paddings) in your paper. It is mentioned that

"Specifically, we use CPVT-S and simply remove the zero paddings from CPVT while keeping all other parts unchanged."

However, as I understand it, removing the zero paddings while keeping everything else the same changes the output size, so the output can no longer be added to the original feature map. Did you do anything to handle the mismatch in feature shapes?

Thanks a lot for the work and your time.

Sorry for the late reply.
Removing the zero paddings while keeping everything else the same does indeed change the output size.
In this case, we simply remove the skip connection of the PEG.
Note that the subsequent transformer blocks can adapt to the changed sequence length.
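
For readers following the shape argument, here is a minimal sketch of the arithmetic. It assumes the PEG is a 3x3 convolution with stride 1 applied to a 14x14 token grid (a 224x224 input with 16x16 patches), and it ignores any class token. The helper names are hypothetical and are not taken from the CPVT code base.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Standard output-size formula for a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def peg_token_counts(height, width, kernel=3, padding=1):
    """Return (tokens entering the PEG, tokens leaving it) for a height x width grid.

    Assumption: stride 1 and the same kernel size and padding on both axes.
    """
    out_h = conv_out_size(height, kernel, padding=padding)
    out_w = conv_out_size(width, kernel, padding=padding)
    return height * width, out_h * out_w


def skip_connection_possible(height, width, kernel=3, padding=1):
    """An element-wise residual add needs the same token count on both sides."""
    n_in, n_out = peg_token_counts(height, width, kernel, padding)
    return n_in == n_out


if __name__ == "__main__":
    for padding in (1, 0):
        n_in, n_out = peg_token_counts(14, 14, padding=padding)
        ok = skip_connection_possible(14, 14, padding=padding)
        print(f"padding={padding}: {n_in} tokens in, {n_out} tokens out, skip add possible: {ok}")
```

Under these assumptions, padding 1 keeps all 196 tokens, while padding 0 shrinks the grid to 12x12, i.e. 144 tokens. The residual add is then undefined, which is consistent with dropping the skip connection as described above; the later transformer blocks simply operate on the shorter sequence.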