Failed to reach the reported scores
Fivethousand5k opened this issue · 2 comments
Hi, I trained pointscatter models with the provided configs on the DRIVE and STARE datasets, but the test scores fall short of the ones reported in your paper. I also tested the pretrained models you provided, and their performance is consistent with the paper, which suggests that I followed the required data preparation steps correctly. Could you give some guidance on what might be causing the discrepancy and on how to train pointscatter models from scratch to reach the reported scores? Thanks!
Here are the scripts I used to train models from scratch and their corresponding results:
On DRIVE dataset
- training script
python train.py configs/segmentors/3_pointscatter.py --dataset drive --backbone unet --work-dir ./pointscatter_logs/drive/unet_3_pointscatter/train
- test script
python test.py configs/segmentors/3_pointscatter.py ./pointscatter_logs/drive/unet_3_pointscatter/train/iter_3000.pth --dataset drive --backbone unet --work-dir ./pointscatter_logs/drive/unet_3_pointscatter/eval --eval mDice
- results
"test-metric": {
"aAcc": 0.9628,
"mDice": 0.8761,
"mAcc": 0.8534999999999999,
"Dice.background": 0.979800033569336,
"Dice.vessel": 0.7725,
"Acc.background": 0.985999984741211,
"Acc.vessel": 0.7208999633789063
}
reported scores: Dice: 81.63 Acc: 95.23
On STARE dataset
- training script
python train.py configs/segmentors/3_pointscatter.py --dataset stare --backbone unet --work-dir ./pointscatter_logs/stare/unet_3_pointscatter/train
- test script
python test.py configs/segmentors/3_pointscatter.py ./pointscatter_logs/stare/unet_3_pointscatter/train/iter_3000.pth --dataset stare --backbone unet --work-dir ./pointscatter_logs/stare/unet_3_pointscatter/eval --eval mDice
- results
"test-metric": {
"aAcc": 0.9689,
"mDice": 0.8802,
"mAcc": 0.8484999999999999,
"Dice.background": 0.9833000183105469,
"Dice.vessel": 0.7770999908447266,
"Acc.background": 0.9906999969482422,
"Acc.vessel": 0.7062000274658203
}
reported scores: Dice: 82.73 Acc: 97.45
My guess is that the batch size was not adjusted for training on a single GPU. You may need to change this value to 4:
https://github.com/zhangzhao2022/pointscatter/blob/main/configs/_base_/datasets/drive.py#L34
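To make the reasoning behind this suggestion concrete, here is a minimal sketch of the arithmetic, assuming the linked config value is a per-GPU batch size and that the intended total batch size is 4 (inferred only from the reply above, not confirmed by the repository). The helper name `per_gpu_batch_size` and the 4-GPU case are hypothetical and used purely for illustration.

```python
def per_gpu_batch_size(target_total: int, num_gpus: int) -> int:
    """Return the per-GPU batch size that keeps the total batch at target_total."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if target_total % num_gpus != 0:
        raise ValueError("target_total must be divisible by num_gpus")
    return target_total // num_gpus


# Assumption: total batch size of 4, inferred from the reply in this thread.
single_gpu = per_gpu_batch_size(4, 1)  # one GPU must carry the whole batch
multi_gpu = per_gpu_batch_size(4, 4)   # hypothetical 4-GPU run, for contrast
print(single_gpu, multi_gpu)
```

If the per-GPU value is left unchanged when dropping to one GPU, the total batch shrinks accordingly, which may explain the lower scores from the first training run.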
Thanks, that seems to work! I set the batch size to 4 and retrained the pointscatter models. Here are the updated results:
On DRIVE dataset
"metric": {
"aAcc": 0.9672,
"mDice": 0.898,
"mAcc": 0.9001,
"Dice.background": 0.9819999694824219,
"Dice.vessel": 0.8140000152587891,
"Acc.background": 0.981500015258789,
"Acc.vessel": 0.8186000061035156
}
reported scores: Dice: 81.63 Acc: 95.23
On STARE dataset
"metric": {
"aAcc": 0.9745,
"mDice": 0.9066,
"mAcc": 0.8923000000000001,
"Dice.background": 0.9862000274658204,
"Dice.vessel": 0.8269000244140625,
"Acc.background": 0.9894000244140625,
"Acc.vessel": 0.795199966430664
}
reported scores: Dice: 82.73 Acc: 97.45
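For a quick side-by-side check, the sketch below converts the retrained metrics to percentages and subtracts the reported scores. It assumes that the reported "Dice" corresponds to `Dice.vessel` and the reported "Acc" to `aAcc`; that mapping is a guess based on how closely the numbers line up, not something stated in the thread. The function name `gaps` is hypothetical.

```python
# Retrained results copied from this thread (fractions in [0, 1]).
retrained = {
    "drive": {"aAcc": 0.9672, "Dice.vessel": 0.8140000152587891},
    "stare": {"aAcc": 0.9745, "Dice.vessel": 0.8269000244140625},
}

# Reported scores copied from this thread (percentages).
reported = {
    "drive": {"Dice": 81.63, "Acc": 95.23},
    "stare": {"Dice": 82.73, "Acc": 97.45},
}


def gaps(dataset: str) -> tuple:
    """Return (Dice gap, Acc gap) in percentage points, retrained minus reported."""
    r, p = retrained[dataset], reported[dataset]
    # Assumed mapping: reported Dice <-> Dice.vessel, reported Acc <-> aAcc.
    dice_gap = round(r["Dice.vessel"] * 100 - p["Dice"], 2)
    acc_gap = round(r["aAcc"] * 100 - p["Acc"], 2)
    return dice_gap, acc_gap


for name in ("drive", "stare"):
    dice_gap, acc_gap = gaps(name)
    print(f"{name}: Dice gap {dice_gap:+.2f}, Acc gap {acc_gap:+.2f}")
```

Under that assumed mapping, the retrained vessel Dice lands within about a quarter of a percentage point of the reported value on both datasets.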