/M3I-Pretraining

[CVPR 2023] implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.

M3I Pre-training

PWC

PWC

PWC

PWC PWC

This repository is an official implementation of CVPR 2023 paper Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.

By Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai.

Code will be available.

Introduction

Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training), initially described in arxiv, is a simple yet effective one-stage pre-training paradigm. It can integrate existing pre-training methods (supervised pre-training, weakly-supervised pre-training and self-supervised pre-training) under an unified mutual information perspective and maintain all desired properties through a single-stage pre-training. Notably, we successfully pre-train a 1B model (InternImage-H) with M3I Pre-training and achieve new record 65.4 mAP on COCO detection test-dev, 62.5 mAP on LVIS detection minival, and 62.9 mIoU on ADE20k.

m3i pre-training

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@InProceedings{Su_2023_CVPR,
    author    = {Su, Weijie and Zhu, Xizhou and Tao, Chenxin and Lu, Lewei and Li, Bin and Huang, Gao and Qiao, Yu and Wang, Xiaogang and Zhou, Jie and Dai, Jifeng},
    title     = {Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {15888-15899}
}