The official implementation of "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".
Pre-training followed by fine-tuning can improve transfer efficiency and performance on visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning art fails to exceed the upper limit of full fine-tuning on challenging tasks like instance segmentation and semantic segmentation. To find a competitive alternative to full fine-tuning, we propose Multi-cognitive Visual Adapter (Mona) tuning, a novel adapter-based tuning method.
Mona achieves strong performance on COCO object detection (53.4 box AP and 46.0 mask AP on test-dev with Swin-Base) and ADE20K semantic segmentation (51.36 mIoU on val with Swin-Large).
The proposed Mona outperforms full fine-tuning on representative visual tasks, raising the upper limit of previous delta-tuning art. The results demonstrate that the adapter-tuning paradigm can replace full fine-tuning and achieve better performance on most visual tasks. Full fine-tuning may no longer be the only preferred solution for transfer learning in the future.
Note:
- We report the results with the Cascade Mask R-CNN (Swin-Base) and UperNet (Swin-Large) frameworks for COCO and ADE20K, respectively.
- The pre-trained weights are ImageNet-22K supervised pre-trained Swin-Base and Swin-Large.
Moreover, Mona converges faster than the other delta-tuning methods we tested.
Note:
- We obtain the loss on the VOC dataset with RetinaNet equipped with Swin-Large.
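Before the per-task training instructions, here is a minimal PyTorch sketch of the adapter-tuning idea: a small bottleneck module with multi-scale depth-wise convolutions is inserted into a frozen backbone, and only the adapters (and task head) receive gradients. The module name, layout, bottleneck size, kernel sizes, and the name-matching rule below are illustrative assumptions, not the official Mona implementation; please refer to the code in this repository for the exact design.

```python
import torch
import torch.nn as nn

class MultiScaleAdapter(nn.Module):
    """Illustrative adapter block (NOT the official Mona module):
    down-project, multi-scale depth-wise convolutions, up-project,
    plus a residual skip connection."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        # Depth-wise convolutions at several kernel sizes capture
        # visual information at multiple scales.
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in (3, 5, 7)
        )
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) token sequence from a transformer block, N == h * w
        shortcut = x
        z = self.act(self.down(self.norm(x)))
        z2d = z.transpose(1, 2).reshape(z.size(0), -1, h, w)
        z2d = sum(conv(z2d) for conv in self.dwconvs) / len(self.dwconvs)
        z = z2d.flatten(2).transpose(1, 2)
        return shortcut + self.up(self.act(z))

# Adapter-style tuning: freeze the backbone and update only adapter/head
# parameters, so only a few percent of the model receives gradients.
def freeze_backbone(model: nn.Module) -> None:
    for name, p in model.named_parameters():
        p.requires_grad = ('adapter' in name) or ('head' in name)
```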
Please refer to Swin-Transformer-Object-Detection for the environments and dataset preparation.
After organizing the dataset, you have to modify the config file according to your environment:
- `data_root` has to be set to the actual dataset path.
- `load_from` should be set to your pre-trained weight path.
- `norm_cfg` has to be set to `SyncBN` if you train the model with multiple GPUs.
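For instance, the relevant lines in an mmdetection-style Python config could look like the following sketch (all paths below are placeholders, not files shipped with this repository):

```python
# Illustrative config overrides; replace the paths with your own.
data_root = 'data/coco/'                            # actual dataset path
load_from = 'pretrained/swin_base_22k.pth'          # pre-trained weight path
norm_cfg = dict(type='SyncBN', requires_grad=True)  # SyncBN for multi-GPU training
```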
Please execute the following command in the project path.
bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-b_coco/cascade_mask_swin_base_3x_coco_sample_1_bs_16_mona.py `Your GPUs`
bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-l_voc/voc_retinanet_swin_large_1x_mona.py `Your GPUs`
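Here, `Your GPUs` is the number of GPUs used for distributed training. For example, to launch the COCO experiment on 8 GPUs:

```bash
bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-b_coco/cascade_mask_swin_base_3x_coco_sample_1_bs_16_mona.py 8
```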
Please refer to Swin-Transformer-Semantic-Segmentation for the environments and dataset preparation.
Follow the guidance in Object Detection & Instance Segmentation to check your config file.
Please execute the following command in the project path.
bash Swin-Transformer-Semantic-Segmentation/tools/dist_train.sh Swin-Transformer-Semantic-Segmentation/mona_configs/swin-l_ade20k/ade20k_upernet_swin_large_160k_mona.py `Your GPUs`
Please refer to Swin-Transformer-Classification for the environments.
Note:
- We reorganize the dataset format to match the requirements of mmclassification.
- You can organize your dataset in the following format:
mmclassification
└── data
└── my_dataset
├── meta
│ ├── train.txt
│ ├── val.txt
│ └── test.txt
├── train
├── val
└── test
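Each file under `meta/` follows the mmclassification annotation convention: one sample per line, an image path (relative to the split's data prefix) followed by a class index. The file names below are hypothetical examples:

```text
image_0001.jpg 0
image_0002.jpg 0
image_0137.jpg 1
```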
Follow the guidance in Object Detection & Instance Segmentation to check your config file.
Please execute the following command in the project path.
bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_oxford_flower_mona.py `Your GPUs`
bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_oxford_pet_mona.py `Your GPUs`
bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_voc_mona.py `Your GPUs`
If our work is helpful for your research, please cite:
@misc{yin20245100breakingperformanceshackles,
title={5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks},
author={Dongshuo Yin and Leiyi Hu and Bin Li and Youqun Zhang and Xue Yang},
year={2024},
eprint={2408.08345},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.08345},
}
We are grateful to the following wonderful open-source repositories, among others.