hellochick/PSPNet-tensorflow

Having trouble training

AmeetR opened this issue · 2 comments

Hi,

I'm trying to train on Cityscapes, first to replicate the 70% mIoU and then to move on to other driving datasets to see what happens. However, I'm having trouble replicating the result: I can't get the loss to converge below 1.7. I'm training from scratch on purpose, to get a clean baseline for the other datasets.

Hey AmeetR

If you look through the past issues, you will find that we have discussed this problem several times.
This repository only converts the pre-trained weights from the original Caffe code to a TensorFlow version; the training code is just an experimental attempt.
If you want to replicate the performance, you first need to implement a synchronized BN layer so that you can train with a large batch size (as described in the paper).
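To illustrate what "synchronized" means here, the following is a minimal pure-Python sketch, not code from this repository. It assumes each device holds a small slice of the batch and that the per-device counts, sums, and sums of squares can be combined (in a real multi-GPU setup this would be an all-reduce). The names `local_stats`, `synced_mean_var`, and `normalize` are hypothetical.

```python
def local_stats(values):
    """Return (count, sum, sum of squares) for one device's activations."""
    return len(values), sum(values), sum(v * v for v in values)


def synced_mean_var(per_device_values):
    """Combine per-device statistics into one global mean and biased variance.

    In a real multi-GPU setup the three sums below would be all-reduced
    across devices; here they are simply added in a loop (assumption).
    """
    stats = [local_stats(v) for v in per_device_values]
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    var = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    """Normalize one device's activations with the shared statistics."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Two simulated devices, each holding a batch of two activations.
device_a = [1.0, 2.0]
device_b = [3.0, 6.0]
mean, var = synced_mean_var([device_a, device_b])
normalized_a = normalize(device_a, mean, var)
```

With plain per-device BN, each device would normalize with statistics from only two samples; with the shared statistics, every device normalizes as if it had seen the full batch of four.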

Hi @hellochick, thanks for responding. I'll try to implement that layer tomorrow, but every time I try to increase the batch size above two, my GPU runs out of memory. Also, I did look through all of the past issues and couldn't find much of use, which is why I opened a new one. That said, I'm now getting a loss of ~0.25, but the evaluation is still 0.03. Any idea why this may be?
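For reference, one way to sanity-check an evaluation number independently of the repository's evaluation script is to compute mIoU by hand on a handful of flattened label/prediction pairs. This is only a sketch under assumptions: the function name `mean_iou` is hypothetical, labels and predictions are assumed to use the same class ID mapping, and pixels with the ignore label (assumed to be 255) are skipped.

```python
def mean_iou(labels, preds, num_classes, ignore_label=255):
    """Compute mean intersection-over-union from flat label/prediction lists."""
    # conf[t][p] counts pixels with true class t that were predicted as p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        if t == ignore_label:
            continue  # ignored pixels do not contribute to the metric
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp
        fn = sum(conf[c]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Tiny example: two classes, one ignored pixel.
labels = [0, 0, 1, 1, 255]
preds = [0, 1, 1, 1, 0]
score = mean_iou(labels, preds, num_classes=2)
```

If a hand check like this disagrees strongly with the reported score, a mismatch between the label IDs used in training and those used in evaluation is one thing worth ruling out.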