This repo provides a faster implementation of ConvMAE: Masked Convolution Meets Masked Autoencoders.
- 17/June/2022: Released the pre-training code for ImageNet-1K.
The Fast ConvMAE framework is a substantially faster masked-modeling scheme that builds on ConvMAE through complementary masking and a mixture of reconstructors.
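To illustrate the complementary-masking idea, here is a minimal PyTorch sketch (the helper `complementary_masks` and the 4-way split are assumptions for illustration, not the repo's actual API): rather than keeping a single random 25% subset of patches per image as in MAE/ConvMAE, the patches are partitioned into four disjoint 25% subsets that jointly cover the whole image, so every patch is visible to the encoder in exactly one of the four passes.

```python
import torch

def complementary_masks(num_patches: int, num_splits: int = 4) -> torch.Tensor:
    """Partition patch indices into `num_splits` disjoint keep-masks.

    Each mask keeps num_patches / num_splits patches; together the masks
    cover 100% of the patches (hence "complementary"). Returns a bool
    tensor of shape (num_splits, num_patches) where True means "kept".
    Hypothetical helper for illustration only.
    """
    assert num_patches % num_splits == 0, "patches must split evenly"
    keep = num_patches // num_splits
    perm = torch.randperm(num_patches)           # random assignment of patches
    masks = torch.zeros(num_splits, num_patches, dtype=torch.bool)
    for i in range(num_splits):
        masks[i, perm[i * keep:(i + 1) * keep]] = True
    return masks

masks = complementary_masks(196)                 # 14x14 patch grid (224px / 16)
assert int(masks.sum()) == 196                   # every patch is used exactly once
assert bool((masks.sum(dim=0) == 1).all())       # splits are mutually disjoint
```

The "mixture of reconstructors" (the C+T rows in the results table below) would then attach more than one reconstruction target, e.g. raw pixels plus tokenizer features, to the shared encoder; that part is not sketched here.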
The following table provides pretrained checkpoints and logs used in the paper.
| Fast ConvMAE-Base | |
| --- | --- |
| 50-epoch pretrained checkpoint | N/A |
| logs | N/A |
| Models | Masking | Tokenizer | Backbone | PT Epochs | PT Hours | COCO FT Epochs | COCO AP<sup>box</sup> | COCO AP<sup>mask</sup> | ImageNet Finetune Epochs | Finetune acc@1 (%) | ADE20K mIoU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ConvMAE | 25% | RGB | ConvViT-B | 200 | 512 | 25 | 50.8 | 45.4 | 100 | 84.4 | 48.5 |
| ConvMAE | 25% | RGB | ConvViT-B | 1600 | 4000 | 25 | 53.2 | 47.1 | 100 | 85.0 | 51.7 |
| MAE | 25% | RGB | ViT-B | 1600 | 2069 | 100 | 50.3 | 44.9 | 100 | 83.6 | 48.1 |
| SimMIM | 100% | RGB | Swin-B | 800 | 1609 | 36 | 50.4 | 44.4 | 100 | 84.0 | - |
| GreenMIM | 25% | RGB | Swin-B | 800 | 887 | 36 | 50.0 | 44.1 | 100 | 85.1 | - |
| FastConvMAE | 100% | RGB | ConvViT-B | 50 | 266 | 25 | 51.0 | 45.4 | 100 | 84.4 | 48.3 |
| FastConvMAE | 100% | C+T | ConvViT-B | 50 | 333 | 25 | 52.8 | 46.9 | 100 | 85.0 | 52.7 |
| FastConvMAE | 100% | C+T | ConvViT-B | 100 | 666 | 25 | 53.3 | 47.3 | 100 | 85.2 | 52.8 |
| FastConvMAE | 100% | C+T | ConvViT-L | 200 | N/A | 25 | N/A | N/A | 50 | 86.7 | 54.5 |
*(Figure: illustration of the masking scheme. Grey patches are masked and colored ones are kept.)*
Requirements (a quick environment check is sketched after this list):
- Linux
- Python 3.7+
- CUDA 10.2+
- GCC 5+
- See PRETRAIN.md for pretraining.
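As a quick sanity check that your environment meets the requirements above, a minimal sketch (it assumes PyTorch is installed; the version thresholds mirror the list above and are not enforced by the repo itself):

```python
# Minimal environment sanity check (illustrative; not part of the repo).
import sys
import torch

# Python version (expect 3.7+)
print("Python:", sys.version.split()[0])
# CUDA toolkit version this PyTorch build uses (expect 10.2+)
print("CUDA:", torch.version.cuda)
# Whether a CUDA-capable GPU is visible
print("GPU available:", torch.cuda.is_available())
```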
The pretraining and finetuning code in this project builds on DeiT, MAE, and ConvMAE. Thanks for their wonderful work.
FastConvMAE is released under the MIT License.