4 Aalto University 5 Shanghai AI Laboratory 6 University of Trento
This repo is the official implementation of "Bilateral Reference for High-Resolution Dichotomous Image Segmentation" (CAAI AIR 2024).
- Aug 19, 2024: We uploaded the ONNX model files of all weights to the GitHub release and GDrive folder.
- Jul 30, 2024: Thanks to @not-lain for his kind efforts in adding BiRefNet to the official huggingface.js repo.
- Jul 28, 2024: We released the Colab demo for box-guided segmentation.
- Jul 15, 2024: We deployed our BiRefNet on Hugging Face Models so that users can load it in one line of code.
- Jun 21, 2024: We released the Chinese version of our original paper and uploaded it to my GDrive.
- May 28, 2024: We host a model zoo with well-trained weights of our BiRefNet in different sizes and for different tasks, including general use, portrait segmentation, DIS, HRSOD, COD, etc.
- May 7, 2024: We also released the Colab demo for single image inference. Many thanks to @rishabh063 for his support on it.
- Apr 9, 2024: Thanks to Features and Labels Inc. for deploying a cool online BiRefNet inference API and providing me with strong GPU resources for further experiments!
- Mar 7, 2024: We released the BiRefNet code, the well-trained weights for all tasks in the original paper, and all related stuff in my GDrive folder. Meanwhile, we also deployed our BiRefNet on Hugging Face Spaces for easier online use and released the Colab demo for inference and evaluation.
- Jan 7, 2024: We released our paper on arXiv.
from transformers import AutoModelForImageSegmentation
birefnet = AutoModelForImageSegmentation.from_pretrained('zhengpeng7/BiRefNet', trust_remote_code=True)
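A minimal sketch of how the loaded model might be applied to one image. The preprocessing (resize to 1024x1024, ImageNet mean/std normalization) and the use of the last output followed by a sigmoid are assumptions here, so check them against the model card. `predict_mask` and the constants are hypothetical names, and `torch`, `torchvision`, and `Pillow` are assumed to be installed.

```python
# Assumed preprocessing settings; verify against the model card before relying on them.
MODEL_INPUT_SIZE = (1024, 1024)
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def predict_mask(birefnet, image_path, device="cpu"):
    """Return a PIL 'L' image holding the predicted foreground mask.

    Heavy dependencies are imported lazily so that merely defining this
    function does not require them.
    """
    import torch
    from PIL import Image
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(MODEL_INPUT_SIZE),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])

    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0).to(device)

    birefnet.to(device).eval()
    with torch.no_grad():
        # Assumption: the model returns a sequence of maps and the last one is final.
        prob = birefnet(batch)[-1].sigmoid().cpu()[0, 0]

    # Convert the [0, 1] probability map to an 8-bit image, then
    # resize it back to the original image resolution.
    mask = transforms.ToPILImage()(prob)
    return mask.resize(image.size)
```

With the `birefnet` object from the line above, `predict_mask(birefnet, "photo.jpg")` would return a grayscale mask that can be used as an alpha channel, for example via `image.putalpha(mask)`; the file name is a placeholder.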
We are really happy to collaborate with FAL to deploy the inference API of BiRefNet. You can access this service via the link below:
Our BiRefNet has achieved SOTA on many similar HR tasks:
- Inference and evaluation of your given weights:
- Online Inference with GUI with adjustable resolutions:
- Online Single Image Inference on Colab:
For more general use of our BiRefNet, I extended the original academic version to more general ones for better application in real life.
Datasets and backbone weights should preferably be downloaded from their official pages. You can also download the packaged versions: DIS, HRSOD, COD, Backbones.
Find the performance (almost all metrics) of all models in the `exp-TASK_SETTINGS` folders in [stuff].
Models in the original paper, for comparison on benchmarks:
Task | Training Sets | Backbone | Download |
---|---|---|---|
DIS | DIS5K-TR | swin_v1_large | google-drive |
COD | COD10K-TR, CAMO-TR | swin_v1_large | google-drive |
HRSOD | DUTS-TR | swin_v1_large | google-drive |
HRSOD | HRSOD-TR | swin_v1_large | google-drive |
HRSOD | UHRSD-TR | swin_v1_large | google-drive |
HRSOD | DUTS-TR, HRSOD-TR | swin_v1_large | google-drive |
HRSOD | DUTS-TR, UHRSD-TR | swin_v1_large | google-drive |
HRSOD | HRSOD-TR, UHRSD-TR | swin_v1_large | google-drive |
HRSOD | DUTS-TR, HRSOD-TR, UHRSD-TR | swin_v1_large | google-drive |
Models trained with custom data (general, portrait), for general use in practical applications:
Task | Training Sets | Backbone | Test Set | Metric (S, wF[, HCE]) | Download |
---|---|---|---|---|---|
general use | DIS5K-TR,DIS-TEs, DUTS-TR_TE,HRSOD-TR_TE,UHRSD-TR_TE, HRS10K-TR_TE, TR-P3M-10k, TE-P3M-500-NP, TE-P3M-500-P, TR-humans | swin_v1_large | DIS-VD | 0.911, 0.875, 1069 | google-drive |
general use | DIS5K-TR,DIS-TEs, DUTS-TR_TE,HRSOD-TR_TE,UHRSD-TR_TE, HRS10K-TR_TE, TR-P3M-10k, TE-P3M-500-NP, TE-P3M-500-P, TR-humans | swin_v1_tiny | DIS-VD | 0.882, 0.830, 1175 | google-drive |
general use | DIS5K-TR, DIS-TEs | swin_v1_large | DIS-VD | 0.907, 0.865, 1059 | google-drive |
portrait segmentation | P3M-10k, humans | swin_v1_large | P3M-500-P | 0.983, 0.989 | google-drive |
ONNX conversion:
We converted the `.pth` weight files to `.onnx` files.
We referred a lot to Kazuhito00/BiRefNet-ONNX-Sample; many thanks to @Kazuhito00.
- Check our Colab demo for ONNX conversion or the notebook file for local running, where you can do the conversion/inference by yourself and find all relevant info.
- As tested, BiRefNets with SwinL (default backbone) cost `~90%` more time (the inference costs `~165ms` on an A100 GPU) using ONNX files. Meanwhile, BiRefNets with SwinT (lightweight) cost `~75%` more time (the inference costs `~93.8ms` on an A100 GPU) using ONNX files. Input resolution is `1024x1024` as default.
- The results of the original pth files and the converted onnx files are slightly different, which is acceptable.
- Pay attention to the compatibility among `onnxruntime-gpu, CUDA, and CUDNN` (we use `torch==2.0.1, cuda=11.8` here).
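As a quick sanity check on the timings above, the implied PyTorch latency can be backed out from the ONNX numbers. This is only a sketch under the assumption that "X% more time" means ONNX latency = PyTorch latency × (1 + X/100). The helper name `implied_baseline_ms` is hypothetical, and the results are rough estimates, not measurements.

```python
def implied_baseline_ms(onnx_ms, extra_percent):
    """Back out the baseline latency from one that is `extra_percent` slower."""
    return onnx_ms / (1.0 + extra_percent / 100.0)


# Figures quoted above (A100 GPU, 1024x1024 input).
swin_l = implied_baseline_ms(165.0, 90.0)  # SwinL: ONNX ~165 ms, ~90% slower
swin_t = implied_baseline_ms(93.8, 75.0)   # SwinT: ONNX ~93.8 ms, ~75% slower

print(f"SwinL PyTorch estimate: {swin_l:.1f} ms")
print(f"SwinT PyTorch estimate: {swin_t:.1f} ms")
```

Under that reading, the PyTorch weights would take roughly 87 ms (SwinL) and 54 ms (SwinT) per image.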
Concerning edge devices with less computing power, we provide a lightweight version with `swin_v1_tiny` as the backbone, which is x4+ faster and x5+ smaller. The details can be found in this issue and links there.
We found there have been some third-party applications based on our BiRefNet. Many thanks for their contributions to the community!
Choose the one you like to try with clicks instead of code:
- Applications:
  - Thanks fal.ai/birefnet: this project on `fal.ai` encapsulates BiRefNet online with more useful options in the UI and an API to call the model.
  - Thanks ZHO-ZHO-ZHO/ComfyUI-BiRefNet-ZHO: this project further improves the UI for BiRefNet in ComfyUI, especially for video data.
  - Thanks viperyl/ComfyUI-BiRefNet: this project packs BiRefNet as ComfyUI nodes and makes this SOTA model easier to use for everyone.
  - Thanks Rishabh for offering a demo for easier single image inference on Colab.
- More Visual Comparisons:
  - Thanks twitter.com/ZHOZHO672070 for the comparison with more background-removal methods in images.
  - Thanks twitter.com/toyxyz3 for the comparison with more background-removal methods in videos.
# PyTorch==2.0.1 is used for faster training with compilation.
conda create -n birefnet python=3.9 -y && conda activate birefnet
pip install -r requirements.txt
Download the combined training / test sets that I have organized from: DIS--COD--HRSOD, or the single official ones in the `single_ones` folder, or from their official pages. You can also find the same ones on my BaiduDisk: DIS--COD--HRSOD.
Download backbone weights from my google-drive folder or their official pages.
# Train & Test & Evaluation
./train_test.sh RUN_NAME GPU_NUMBERS_FOR_TRAINING GPU_NUMBERS_FOR_TEST
# Example: ./train_test.sh tmp-proj 0,1,2,3,4,5,6,7 0
# See train.sh / test.sh for only training / test-evaluation.
# After the evaluation, run `gen_best_ep.py` to select the best ckpt from a specific metric (you choose it from Sm, wFm, HCE (DIS only)).
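To illustrate what selecting the best checkpoint by one metric involves, here is a small sketch. It is not the repo's `gen_best_ep.py`: the function `select_best_epoch`, the shape of the `results` dictionary, and the numbers in it are hypothetical. The one assumption that matters is the direction of each metric: higher is better for Sm and wFm, lower is better for HCE.

```python
# Whether a larger value is better for each metric; HCE (DIS only) counts errors.
HIGHER_IS_BETTER = {"Sm": True, "wFm": True, "HCE": False}


def select_best_epoch(results, metric):
    """Return the epoch whose `metric` value is best.

    `results` maps epoch number -> {metric name -> value}.
    """
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric}")
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(results, key=lambda epoch: results[epoch][metric])


# Made-up evaluation results, for illustration only.
results = {
    580: {"Sm": 0.905, "wFm": 0.861, "HCE": 1080},
    590: {"Sm": 0.907, "wFm": 0.865, "HCE": 1059},
    600: {"Sm": 0.906, "wFm": 0.866, "HCE": 1065},
}

for name in ("Sm", "wFm", "HCE"):
    print(name, "->", select_best_epoch(results, name))
```

Different metrics can favor different epochs, as the made-up numbers show, which is why the metric has to be chosen explicitly.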
Download the `BiRefNet-{TASK}-{EPOCH}.pth` from [stuff]. Info on the corresponding weights (predicted_maps/performance/training_log) can also be found in folders like `exp-BiRefNet-{TASK_SETTINGS}` in the same directory.
You can also download the weights from the release of this repo.
The results might be a bit different from those in the original paper. You can see them in the `eval_results-BiRefNet-{TASK_SETTINGS}` folder in each `exp-xx`; we will update them in the following days. Because the original setup (A100-80G x 8) is very costly and many people cannot afford it (including myself....), I re-trained BiRefNet on a single A100-40G only and achieved performance on the same level (even better). This means you can directly train the model on a single GPU with 36.5G+ memory. BTW, 5.5G of GPU memory is needed for inference at 1024x1024. (I personally paid a lot for renting an A100-40G to re-train BiRefNet on the three tasks... T_T. Hope it can help you.)
But if you have more, or more powerful, GPUs, you can set the GPU IDs and increase the batch size in `config.py` to accelerate training. We have made all of these things adaptive in the scripts so that they switch seamlessly between single-card and multi-card training. Enjoy it :)
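As an illustration of what such adaptive switching can look like, the sketch below derives a launch plan from a comma-separated GPU ID string like the one passed to the scripts. It is not the repo's actual logic: `plan_training`, its return fields, and the per-GPU batch size of 2 are hypothetical.

```python
def plan_training(gpu_ids, batch_size_per_gpu=2):
    """Derive a simple training plan from a comma-separated GPU ID string.

    Example input: "0" for one card, "0,1,2,3" for four cards.
    """
    ids = [int(part) for part in gpu_ids.split(",") if part.strip()]
    if not ids:
        raise ValueError("at least one GPU ID is required")
    return {
        "gpu_ids": ids,
        # More than one card means distributed (multi-process) training.
        "distributed": len(ids) > 1,
        # The effective batch grows with the number of cards.
        "total_batch_size": batch_size_per_gpu * len(ids),
    }


print(plan_training("0"))
print(plan_training("0,1,2,3,4,5,6,7"))
```

The point of the design is that one input (the GPU list) decides both the launch mode and the effective batch size, so nothing else has to change when moving between one card and many.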
This project was originally built for DIS only. But after the updates one by one, I made it larger and larger with many functions embedded together. Now you can use it for any binary image segmentation task, such as DIS/COD/SOD, medical image segmentation, anomaly segmentation, etc. You can easily turn the following features on or off (usually in `config.py`):
- Multi-GPU training: open/close with one variable.
- Backbone choices: Swin_v1, PVT_v2, ConvNets, ...
- Weighted losses: BCE, IoU, SSIM, MAE, Reg, ...
- Adversarial loss for binary segmentation (proposed in my previous work MCCL).
- Training tricks: multi-scale supervision, freezing backbone, multi-scale input...
- Data collator: loading all in memory, smooth combination of different datasets for combined training and test.
- ...

I really hope you enjoy this project and use it in more works to achieve new SOTAs.
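To make the "weighted losses" item in the list above concrete, here is a dependency-free sketch of combining a BCE term and a soft IoU term with per-loss weights. The weights, the function names, and the use of plain Python lists instead of tensors are assumptions for illustration; the repo's own loss code and default weights may differ.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_iou_loss(probs, targets, eps=1e-7):
    """1 - soft IoU, with intersection and union computed from probabilities."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (inter + eps) / (union + eps)


def weighted_loss(probs, targets, weights):
    """Sum the enabled loss terms; a weight of 0 switches a term off."""
    terms = {"bce": bce_loss, "iou": soft_iou_loss}
    return sum(w * terms[name](probs, targets) for name, w in weights.items() if w)


probs, targets = [0.9, 0.1], [1, 0]
print(weighted_loss(probs, targets, {"bce": 1.0, "iou": 0.5}))
```

Setting a weight to 0 drops that term entirely, which mirrors the idea of switching individual losses on or off from a single configuration entry.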
@article{zheng2024birefnet,
title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
journal={CAAI Artificial Intelligence Research},
year={2024}
}
For any questions, discussions, or even complaints, feel free to open issues here or send me e-mails (zhengpeng0108@gmail.com). You can also join the Discord Group (https://discord.gg/d9NN5sgFrq) or QQ Group (https://qm.qq.com/q/y6WPy7WOIK) if you want to talk a lot publicly.