Question about reproducing the results on Camelyon16
Thanks for your great work!
I followed the instructions in the paper and its appendix to reproduce the results on Camelyon16.
However, I observed that training was slow to converge (the loss only starts to decrease quickly at around the 100th epoch). Possibly as a result, I also found that the bag-level AUC of the teacher was only about 0.6, whereas the instance-level AUC of the student was very high (0.94).
I am not sure whether I used the same hyper-parameters as the paper, so could you provide the full hyper-parameter settings for training on Camelyon16?
Through further experiments, I found that the issues above were largely due to:
- batch size = 1
- a plain SGD optimizer

(both as provided in the source code of this repo)
I changed them as follows:
- batch size = 4, realized by gradient accumulation (see the sketch below)
- the Adam optimizer

With these changes, everything became much more reasonable.
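For anyone else hitting this, here is a minimal sketch of the gradient-accumulation loop I used. It is PyTorch, and `train_one_epoch`, `bag_loader`, `criterion`, and the learning rate are my own placeholders for illustration, not names or values from this repo or the paper:

```python
import torch

def train_one_epoch(model, bag_loader, criterion, optimizer, accum_steps=4):
    """One epoch with gradient accumulation: an effective batch size of
    `accum_steps` while still feeding one bag through the model at a time."""
    model.train()
    optimizer.zero_grad()
    for step, (bag, label) in enumerate(bag_loader):
        loss = criterion(model(bag), label)
        # Scale the loss so the accumulated gradient is the mean over the
        # effective batch rather than the sum.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

# Adam instead of plain SGD (the lr here is illustrative only):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Accumulating over 4 bags before each optimizer step keeps memory usage the same as batch size 1 but gives a less noisy gradient, which seems to be what stabilized training in my runs.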
Thanks for your attention and your contribution!