Official PyTorch implementation of BiFormer, from the following paper:
BiFormer: Vision Transformer with Bi-Level Routing Attention. CVPR 2023.
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, and Rynson Lau
name | resolution | acc@1 | #params | FLOPs | model | log | tensorboard log* |
---|---|---|---|---|---|---|---|
BiFormer-T | 224x224 | 81.4 | 13.1 M | 2.2 G | model | log | - |
BiFormer-S | 224x224 | 83.8 | 25.5 M | 4.5 G | model | log | tensorboard.dev |
BiFormer-B | 224x224 | 84.3 | 56.8 M | 9.8 G | model | log | - |
* : reproduced after the acceptance of our paper.
All files can be accessed from OneDrive.
Please check INSTALL.md for installation instructions.
We ran evaluation on a Slurm cluster using the command below:
python hydra_main.py \
data_path=./data/in1k input_size=224 batch_size=128 dist_eval=true \
+slurm=${CLUSTER_ID} slurm.nodes=1 slurm.ngpus=8 \
eval=true load_release=true model='biformer_small'
To test on a local machine, you may try:
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--data_path ./data/in1k --input_size 224 --batch_size 128 --dist_eval \
--eval --load_release --model biformer_small
This should give
* Acc@1 83.754 Acc@5 96.638 loss 0.869
Accuracy of the network on the 50000 test images: 83.8%
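For readers unfamiliar with the metrics in this output: Acc@1 and Acc@5 are top-1 and top-5 accuracy, i.e. the percentage of images whose ground-truth label is among the 1 or 5 highest-scoring predictions. The final line appears to be the Acc@1 value rounded to one decimal place. The snippet below is a minimal pure-Python sketch of the metric on toy data, not the evaluation code used in this repository; the function name `topk_accuracy` and the scores are illustrative assumptions.

```python
def topk_accuracy(scores, targets, k=1):
    """Return the percentage of samples whose target is among the k highest scores."""
    if len(scores) != len(targets) or not targets:
        raise ValueError("scores and targets must be non-empty and of equal length")
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score, truncated to k.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in topk
    return 100.0 * hits / len(targets)


# Toy example (hypothetical data): 4 samples, 3 classes.
scores = [
    [0.10, 0.70, 0.20],  # best class: 1
    [0.50, 0.30, 0.20],  # best class: 0
    [0.20, 0.30, 0.50],  # best class: 2
    [0.40, 0.35, 0.25],  # best class: 0
]
targets = [1, 1, 2, 2]

acc1 = topk_accuracy(scores, targets, k=1)
acc2 = topk_accuracy(scores, targets, k=2)
print(f"Acc@1 {acc1:.3f} Acc@2 {acc2:.3f}")
```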
Note: With load_release=true, the released checkpoints are downloaded automatically, so you do not need to download them manually in advance.
To launch training on a Slurm cluster, use the command below:
python hydra_main.py \
data_path=./data/in1k input_size=224 batch_size=128 dist_eval=true \
+slurm=${CLUSTER_ID} slurm.nodes=1 slurm.ngpus=8 \
model='biformer_small' drop_path=0.15 lr=5e-4
Note: Our codebase automatically generates an output directory for experiment logs and checkpoints, named according to the passed arguments. For example, a command like the one above produces an output directory such as:
$ tree -L 3 outputs/
outputs/
└── cls
    └── batch_size.128-drop_path.0.15-input_size.224-lr.5e-4-model.biformer_small-slurm.ngpus.8-slurm.nodes.2
        └── 20230307-21:33:26
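The directory name in this listing looks like the passed arguments written as key.value pairs, sorted by key and joined with hyphens. The sketch below reproduces that pattern under this assumption only; the helper `make_run_dir_name` is hypothetical, the argument values are copied from the listing, and the repository's actual naming logic (including which arguments it includes) may differ.

```python
def make_run_dir_name(overrides):
    """Join arguments as 'key.value' pairs, sorted by key and separated by '-'."""
    return "-".join(f"{key}.{value}" for key, value in sorted(overrides.items()))


# Values are kept as strings so they appear exactly as typed (e.g. "5e-4").
overrides = {
    "model": "biformer_small",
    "lr": "5e-4",
    "drop_path": "0.15",
    "batch_size": "128",
    "input_size": "224",
    "slurm.nodes": "2",
    "slurm.ngpus": "8",
}

name = make_run_dir_name(overrides)
print(name)
```

Under this scheme, runs with different hyperparameters land in different directories, and the timestamp subdirectory separates repeated runs of the same configuration.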
This repository is built using the timm library and the ConvNeXt and UniFormer repositories.
This project is released under the MIT license. Please see the LICENSE file for more information.
If you find this repository helpful, please consider citing:
@Article{zhu2022biformer,
author = {Lei Zhu and Xinjiang Wang and Zhanghan Ke and Wayne Zhang and Rynson Lau},
title = {BiFormer: Vision Transformer with Bi-Level Routing Attention},
journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2023},
}
- Add camera-ready paper link
- IN1k standard training code, log, and pretrained checkpoints
- IN1k token-labeling code
- Semantic segmentation code
- Object detection code
- Swin-Tiny-Layout (STL) models
- Refactor BRA and BiFormer code
- Visualization demo
- More efficient implementation with triton. See triton issue #1279
- More efficient implementation (fusing gather and attention) with CUDA