zhangzhao2022/pointscatter

Failed to reach the reported scores

Fivethousand5k opened this issue · 2 comments

Hi, I trained pointscatter models with the provided configs on the DRIVE and STARE datasets, but the test scores fall short of the ones reported in your paper. I also tested the pretrained models you provided, and their performance is consistent with the paper, which suggests that my data preparation is correct. Could you give some guidance on what might be causing the discrepancy and how I can train pointscatter models from scratch to reach the reported scores? Thanks!

Here are the scripts I used to train models from scratch and their corresponding results:

On DRIVE dataset

  • training script
python train.py configs/segmentors/3_pointscatter.py --dataset drive --backbone unet --work-dir ./pointscatter_logs/drive/unet_3_pointscatter/train
  • test script
python test.py configs/segmentors/3_pointscatter.py ./pointscatter_logs/drive/unet_3_pointscatter/train/iter_3000.pth --dataset drive --backbone unet --work-dir ./pointscatter_logs/drive/unet_3_pointscatter/eval --eval mDice
  • results
    "test-metric": {
        "aAcc": 0.9628,
        "mDice": 0.8761,
        "mAcc": 0.8534999999999999,
        "Dice.background": 0.979800033569336,
        "Dice.vessel": 0.7725,
        "Acc.background": 0.985999984741211,
        "Acc.vessel": 0.7208999633789063
    }
    reported scores: Dice: 81.63 Acc: 95.23

On STARE dataset

  • training script
python train.py configs/segmentors/3_pointscatter.py --dataset stare --backbone unet --work-dir ./pointscatter_logs/stare/unet_3_pointscatter/train
  • test script
python test.py configs/segmentors/3_pointscatter.py ./pointscatter_logs/stare/unet_3_pointscatter/train/iter_3000.pth --dataset stare --backbone unet --work-dir ./pointscatter_logs/stare/unet_3_pointscatter/eval --eval mDice
  • results
    "test-metric": {
        "aAcc": 0.9689,
        "mDice": 0.8802,
        "mAcc": 0.8484999999999999,
        "Dice.background": 0.9833000183105469,
        "Dice.vessel": 0.7770999908447266,
        "Acc.background": 0.9906999969482422,
        "Acc.vessel": 0.7062000274658203
    }
    reported scores: Dice: 82.73 Acc: 97.45

My guess is that the batch size was not adjusted for single-GPU training. Try changing the value at the line linked below to 4.
https://github.com/zhangzhao2022/pointscatter/blob/main/configs/_base_/datasets/drive.py#L34
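
The suggestion above amounts to keeping the total batch size fixed when the number of GPUs changes. A minimal sketch of that arithmetic follows. It assumes the linked config line sets a per-GPU batch size in the MMSegmentation style (`samples_per_gpu`) and that the intended total batch size is 4, the value suggested above. The thread does not state how many GPUs the authors trained on, and the helper `per_gpu_batch_size` is hypothetical.

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Return the per-GPU batch size that keeps the total batch size fixed.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total_batch_size must divide evenly across GPUs")
    return total_batch_size // num_gpus


# Assumed MMSegmentation-style config fragment. The key name `samples_per_gpu`
# is an assumption about what the linked line contains.
data = dict(
    samples_per_gpu=per_gpu_batch_size(total_batch_size=4, num_gpus=1),
)
```

With a single GPU the per-GPU value equals the total, which is why the value would need to be 4 under these assumptions.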

Thanks, that seems to work! I adjusted the batch size to 4 and retrained the pointscatter models. Here are the updated results:

DRIVE

"metric": {
    "aAcc": 0.9672,
    "mDice": 0.898,
    "mAcc": 0.9001,
    "Dice.background": 0.9819999694824219,
    "Dice.vessel": 0.8140000152587891,
    "Acc.background": 0.981500015258789,
    "Acc.vessel": 0.8186000061035156
}

reported scores: Dice: 81.63 Acc: 95.23

STARE

"metric": {
    "aAcc": 0.9745,
    "mDice": 0.9066,
    "mAcc": 0.8923000000000001,
    "Dice.background": 0.9862000274658204,
    "Dice.vessel": 0.8269000244140625,
    "Acc.background": 0.9894000244140625,
    "Acc.vessel": 0.795199966430664
}

reported scores: Dice: 82.73 Acc: 97.45
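
To make the comparison with the reported scores explicit, the retrained metrics can be converted to percentages and subtracted from the reported values. This is only a sketch. It assumes that the reported "Dice" corresponds to `Dice.vessel` and the reported "Acc" to `aAcc`, which the thread does not confirm. The helper `gap_to_reported` is hypothetical.

```python
# Reported scores as quoted in this thread (percent).
REPORTED = {
    "drive": {"dice": 81.63, "acc": 95.23},
    "stare": {"dice": 82.73, "acc": 97.45},
}

# Retrained results with batch size 4, copied from the metrics above (fractions).
RETRAINED = {
    "drive": {"Dice.vessel": 0.8140000152587891, "aAcc": 0.9672},
    "stare": {"Dice.vessel": 0.8269000244140625, "aAcc": 0.9745},
}


def gap_to_reported(metrics: dict, reported: dict) -> dict:
    """Return retrained minus reported, in percentage points.

    Assumption: reported Dice maps to Dice.vessel and reported Acc to aAcc.
    """
    return {
        "dice": round(metrics["Dice.vessel"] * 100 - reported["dice"], 2),
        "acc": round(metrics["aAcc"] * 100 - reported["acc"], 2),
    }


gaps = {name: gap_to_reported(RETRAINED[name], REPORTED[name]) for name in REPORTED}
for name, gap in gaps.items():
    print(f"{name}: dice {gap['dice']:+.2f}, acc {gap['acc']:+.2f}")
```

Under that mapping, the retrained vessel Dice lands about 0.23 points below the reported value on DRIVE and about 0.04 points below on STARE.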