Having trouble training

Question

AmeetR opened this issue 5 years ago · 2 comments

Hi,

I'm trying to train on Cityscapes, first to replicate the 70% mIoU and then to move on to other driving datasets and see what happens. However, I'm having trouble replicating this result: I can't get the loss to converge below 1.7. I'm training from scratch on purpose, to get a clean baseline for the other datasets.

Answer 1 · 2019-06-03T08:59:13.000Z

Hey AmeetR

If you look through the issue history, you will find that we have discussed this problem several times.
This repository only converts the pre-trained weights from the original Caffe code to a TensorFlow version, and the training code is just an experimental attempt.
If you want to replicate the performance, you first need to implement a synchronized BN layer so that you can train with a large batch size (as described in the paper).
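
To illustrate what a synchronized BN layer does differently from ordinary per-GPU batch norm, here is a minimal pure-Python sketch. It is not code from this repository: the function names and the sample activations are hypothetical, and the combination step that a real multi-GPU implementation would do with an all-reduce is replaced by a plain loop. The idea is that each device contributes its count, sum, and sum of squares, and every device then normalizes with the same global mean and variance.

```python
from math import sqrt


def local_sums(values):
    """Per-device partial statistics: count, sum, and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def synced_mean_var(per_device_values):
    """Combine partial sums from all devices into one global mean/variance.

    In a real multi-GPU setup this combination would be an all-reduce;
    here it is an ordinary loop (assumption for illustration only).
    """
    count = total = total_sq = 0.0
    for values in per_device_values:
        c, s, sq = local_sums(values)
        count += c
        total += s
        total_sq += sq
    mean = total / count
    var = total_sq / count - mean * mean  # biased variance, used for normalization
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    """Normalize one device's activations with the shared statistics."""
    return [(v - mean) / sqrt(var + eps) for v in values]


# Hypothetical activations of one channel, split across two GPUs.
gpu0 = [1.0, 2.0]
gpu1 = [3.0, 6.0]

mean, var = synced_mean_var([gpu0, gpu1])
gpu0_normalized = normalize(gpu0, mean, var)
gpu1_normalized = normalize(gpu1, mean, var)
```

With a per-GPU batch of two, unsynchronized BN would estimate the statistics from only two samples on each device; the synchronized version uses all four.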

Answer 2 · 2019-06-03T14:42:51.000Z

Hi @hellochick, thanks for responding. I'll try to implement that layer tomorrow, but every time I try to increase the batch size beyond two, my GPU runs out of memory. Also, I did look through the issue history and couldn't find much of use, which is why I opened a new issue. That said, I'm now getting a loss of ~0.25, but the evaluation result is still 0.03. Any idea why this may be?
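
For reference when sanity-checking the evaluation number, here is a minimal sketch of how mean IoU is commonly computed from a confusion matrix. It is not this repository's evaluation code. It assumes the evaluation metric is mIoU, that labels are flat lists of class IDs, and that 255 marks ignored pixels (the usual Cityscapes train-ID convention); the sample labels and predictions are made up.

```python
def confusion_matrix(labels, preds, num_classes, ignore_label=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(labels, preds):
        if truth == ignore_label:  # assumed ignore value, skipped entirely
            continue
        cm[truth][pred] += 1
    return cm


def mean_iou(cm):
    """Average IoU over classes that appear in labels or predictions."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(row[c] for row in cm) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up example: the model predicts class 0 for every pixel.
labels = [0, 0, 0, 0, 1, 2, 255]
preds = [0, 0, 0, 0, 0, 0, 0]

cm = confusion_matrix(labels, preds, num_classes=3)
miou = mean_iou(cm)
valid = sum(sum(row) for row in cm)
pixel_accuracy = sum(cm[c][c] for c in range(3)) / valid
```

In this toy case pixel accuracy is 4/6 while mIoU is only 2/9, because the two classes that are never predicted each contribute an IoU of zero.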