impiga/Plain-DETR

Is this idea also effective for a ResNet backbone?

Closed this issue · 5 comments

Is this idea also effective for a ResNet backbone? If so, it could greatly improve latency for real-time detection.

@sdreamforchen I enjoy this impressive work, but I think performance would drop sharply with a ResNet backbone. ResNet-style CNNs do not naturally support Masked Image Modeling (MIM) pretraining, and MIM is one of the key factors behind PlainDETR's large performance gains. From the numbers in the paper, we can infer that PlainDETR-R50 would perform significantly worse than DINO-R50 under the 1x training schedule. Perhaps we should first work out how to do MIM with CNN architectures to revitalize ResNet. At that point, your concern may be well addressed.
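For readers unfamiliar with MIM, here is a minimal sketch of the MAE-style recipe under illustrative assumptions (patch size 8, mask ratio 0.6, NumPy arrays, and hypothetical helper names `random_patch_mask`, `mask_image`, `masked_mse`). It hides random patches and scores reconstruction only on the hidden pixels. A vision transformer can simply drop masked tokens, but a dense CNN has to see the zeroed "holes" and slide its kernels across their boundaries. This is one commonly cited reason MIM is harder for CNNs. It is not SparK's or PlainDETR's implementation.

```python
import numpy as np

def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Boolean (height, width) mask; True marks pixels inside masked patches."""
    assert height % patch == 0 and width % patch == 0
    gh, gw = height // patch, width // patch
    num_patches = gh * gw
    num_masked = int(round(mask_ratio * num_patches))
    # Pick which patches to hide, uniformly at random without replacement.
    flat = np.zeros(num_patches, dtype=bool)
    flat[rng.choice(num_patches, size=num_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Upsample the patch-level grid to pixel resolution.
    return grid.repeat(patch, axis=0).repeat(patch, axis=1)

def mask_image(image, mask):
    """Zero out masked pixels of a (C, H, W) image, as a dense CNN would see it."""
    return np.where(mask[None, :, :], 0.0, image)

def masked_mse(pred, target, mask):
    """Reconstruction loss averaged only over masked pixels (MAE-style)."""
    per_pixel = ((pred - target) ** 2).mean(axis=0)  # average over channels -> (H, W)
    return per_pixel[mask].mean()

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))
mask = random_patch_mask(32, 32, patch=8, mask_ratio=0.6, rng=rng)
masked = mask_image(image, mask)
# An "identity model" that returns its (masked) input is penalized only on hidden pixels.
loss = masked_mse(masked, image, mask)
```

With a 32x32 image and 8x8 patches there are 16 patches, so a 0.6 ratio hides 10 of them. Real CNN-oriented MIM methods change how the network processes the masked input rather than only how the loss is computed.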

impiga commented

Hi, @sdreamforchen

I agree with @yjh0410's opinion. The key here would be a good MIM pretraining method for CNNs.


How about SparK, an MIM method designed specifically for CNNs?

Here is the SparK paper:
https://arxiv.org/abs/2301.03580