microsoft/X-Decoder

Question about DaViT-L pretrained checkpoint.

praeclarumjj3 opened this issue · 2 comments

Hi, thanks for your excellent work!

I noticed you used DaViT-L as a backbone in your experiments. However, the original repo does not contain a pretrained checkpoint for DaViT-L. Do you plan to release one soon?

Additionally, do you have any numbers for DaViT-L Mask2Former or Swin-L X-Decoder compared against other methods on the ADE20K dataset? That would be a fairer comparison of the decoder architectures (with the same backbone). In particular, I am interested in the experimental setting behind the 52.4 PQ (SOTA) result on ADE20K.

Thanks!

Thanks for your interest! We are not able to release any DaViT checkpoints due to company policy, but we will try to train other L-size models to compensate for the missing DaViT-B/L checkpoints.

Thanks so much for your suggestion. I think DaViT-L Mask2Former is a good setting; we will try it out for a fair comparison.

Hi,

Although we could not get a DaViT-L checkpoint pretrained on IN-21K, we used Focal-L for a fair comparison:

- Init with Focal-L IN-21K pretrained checkpoint: PQ 47.4, mAP 33.2, mIoU 55.3
- Init with X-Decoder checkpoint (Focal-L backbone, pretrained on IN-21K): PQ 50.1, mAP 36.3, mIoU 56.5

Both models were trained with the same number of iterations and the same batch size.
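
For reference, partial initialization like this (pretrained backbone, randomly initialized decoder) is typically done by filtering the checkpoint's state dict before loading. Below is a minimal plain-PyTorch sketch, not X-Decoder's actual loading code; the checkpoint path, the `"model"` nesting, and the `backbone.` key prefix are assumptions that may differ in the real checkpoints:

```python
import torch

def load_backbone_weights(model, ckpt_path, prefix="backbone."):
    """Initialize only the backbone from a pretrained checkpoint,
    leaving the decoder weights untouched (randomly initialized)."""
    state = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the weights under a "model" key (assumption).
    state = state.get("model", state)
    # Keep only the backbone tensors so decoder keys are never overwritten.
    backbone_state = {k: v for k, v in state.items() if k.startswith(prefix)}
    # strict=False: decoder keys are deliberately absent from the filtered dict.
    missing, unexpected = model.load_state_dict(backbone_state, strict=False)
    print(f"loaded {len(backbone_state)} backbone tensors; "
          f"{len(missing)} missing, {len(unexpected)} unexpected")

# Hypothetical usage; "focal_l_in21k.pth" is a placeholder filename.
# load_backbone_weights(my_model, "focal_l_in21k.pth")
```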