ZhengPeng7/BiRefNet

Any information on compared with MVANet?

Closed this issue · 14 comments

What's your superiority compared with the work [Multi-view Aggregation Network for Dichotomous Image Segmentation]?

MVANet is interesting and good work; the results in their paper are even better than ours on DIS5K.
Compared with it, BiRefNet:

  1. has a simpler architecture: MVANet needs to crop a whole image into patches for parallel feature forwarding, which may not be easy to adapt when batch size > 1.
  2. has more comprehensive experiments on many HR tasks (DIS, HRSOD, and COD), with the same architecture achieving SOTA on all of them.
  3. has better community maintenance: enthusiastic contributors from the community and I keep publishing more applications (human portrait segmentation, massive training for general object extraction, ...), plus many 3rd-party applications, some of which are listed in the README.
  4. has a better code framework (in my personal view), containing various plug-and-play modules, training acceleration, a better evaluation process, backbone options, and so on.

BTW, have you run the code of MVANet?

Many thanks for your detailed reply.
I am currently reimplementing the training process for MVANet, and I expect training to finish the day after tomorrow. If you're interested, I would be more than happy to discuss it with you and share my progress once it's completed.
I will also delve deeper into your work and try it out. Thanks again.

You are welcome :) Looking forward to your MVANet results. I also re-trained it, but I want to see the results reproduced by you.

Hi @wang21jun , got any results?

Trained with their given settings, that is:
epoch: 80
lr_gen: 1e-5
batch size: 1
train size: 1024
backbone (Swin-B), pretrained model: swin_base_patch4_window12_384_22kto1k.pth
training set: DIS5K-TR; evaluation set: DIS5K-VD
I got the following results:
Smeasure: 0.877
meanEm: 0.888
wFmeasure: 0.803
maximal Fmeasure: 0.872
MAE: 0.046
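Of the metrics above, MAE is the most straightforward to interpret: it is simply the mean absolute difference between the predicted map and the ground truth. A minimal sketch, assuming both are NumPy float arrays normalized to [0, 1]:

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a predicted map and the ground truth,
    both assumed to be float arrays normalized to [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))

# Toy example: a prediction that is off by 0.1 everywhere has MAE 0.1.
pred = np.full((4, 4), 0.9)
gt = np.ones((4, 4))
print(round(mae(pred, gt), 3))  # 0.1
```

Lower is better for MAE, unlike the S-/E-/F-measures, where higher is better.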

The results are somewhat unsatisfactory and require further examination.
I am also retraining your BiRefNet model with the Swin-L backbone, which should finish by tomorrow.

Thx, that's a long process. Looking forward to hearing the results of retrained BiRefNet from you, too!

Trained on 2× A100-80G with this script: `./train_test.sh DIS 0,1 0` (keeping self.batch_size=4, with Swin-L as the backbone):
Smeasure: 0.885
meanEm: 0.92
wFmeasure: 0.838
maximal Fmeasure: 0.877
MAE: 0.041
Although I was unable to reproduce the exact results of the paper, this is the best outcome I achieved among the works I tried: IS-Net, SegRefiner, MVANet, BiRefNet, and so on.
Taking the costs of both training and inference into account, I will explore several strategies to optimize the process: for instance, whether training for 600 epochs is truly necessary or a reduced number could achieve satisfactory results, and whether switching the backbone to Swin-B (among other modifications) can further improve efficiency and performance.

Glad to see your results!
Did you run `python gen_best_ep.py` to select the best checkpoint? The default settings are for training on 8× A100-80G, especially the learning rate (halve it for 2× A100-80G). In the following days, I'll also try some tricks, like half-precision training.
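The idea behind a "best epoch" script like the one mentioned above is just to scan per-epoch validation scores and keep the checkpoint with the best value of the chosen metric. A hedged sketch with hypothetical names and made-up numbers (not the actual logic of `gen_best_ep.py`):

```python
def best_epoch(scores: dict) -> int:
    """Return the epoch whose validation score is highest
    (assumes a higher-is-better metric such as S-measure)."""
    return max(scores, key=scores.get)

# Made-up per-epoch S-measure values on a validation set.
val_smeasure = {200: 0.881, 400: 0.903, 600: 0.899}
print(best_epoch(val_smeasure))  # 400
```

For a lower-is-better metric such as MAE, `min` would replace `max`.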

BTW, if you want to save some time in evaluation, you can turn off the calculation of some metrics, as shown in this line. Looking forward to hearing good results from you on the experiments you want to do.
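One common way to make metrics switchable, sketched below with hypothetical names (not BiRefNet's actual API): keep the metric functions in a registry and compute only the enabled ones, so slow metrics (e.g. HCE) can be skipped during quick validation runs.

```python
import numpy as np

# Registry of metric functions (illustrative, not BiRefNet's real set).
METRICS = {
    "MAE": lambda p, g: float(np.mean(np.abs(p - g))),
    "maxErr": lambda p, g: float(np.max(np.abs(p - g))),
}

def evaluate(pred, gt, enabled=("MAE",)):
    """Compute only the enabled metrics; disabling expensive ones
    shortens evaluation considerably."""
    return {name: fn(pred, gt) for name, fn in METRICS.items() if name in enabled}

pred = np.array([[0.5, 0.5]])
gt = np.array([[0.0, 1.0]])
print(evaluate(pred, gt))  # {'MAE': 0.5}
```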

Sure.

Trained on 2× A100-80G with lr=3e-5, validated on DIS-VD; new results:
maxFm=0.902; wFmeasure=0.861; MAE=0.035; Smeasure=0.906; meanEm=0.935; HCE=1057

Wow, that's great! Even slightly better than my training on 8× A100-80G. There still seems to be some room for improvement by tuning the hyper-parameters. Thanks!

Also, here is a result trained with Swin-B, lr=3e-5, bs=6:
maxFm=0.897; wFmeasure=0.857; MAE=0.037; Smeasure=0.903; meanEm=0.944; HCE=1060

Thanks for your updates! I've also spared time and GPUs to train BiRefNet with backbones of almost every size. The results and weights have been uploaded to the Google Drive folder. Your results are similar to mine.