Find best Trainer
Closed this issue · 0 comments
yuvalpinter commented
Since the upgrade to DyNet 2.0, training loss doesn't seem to converge for the Mimick algorithm (the tagger code is fine, and the resulting models also make sense).
This seems to be due to the change in learning-rate behavior in DyNet's trainers. The current implementation here uses AdamTrainer, but SGDTrainer and AdaGradTrainer exhibit the same issue.
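One workaround for the changed learning-rate behavior is to decay the rate manually once per epoch, roughly what the pre-2.0 `update_epoch()` call handled. The sketch below uses a stand-in `Trainer` class rather than DyNet itself, and the attribute name `learning_rate` (matching DyNet 2.x's trainer property) plus the decay factor are assumptions for illustration:

```python
class Trainer:
    """Stand-in for a DyNet-style trainer; only tracks a learning rate."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

def decay_each_epoch(trainer, decay, epochs):
    """Multiply the learning rate by `decay` once per epoch, recording the
    rate in effect at the start of each epoch."""
    rates = []
    for _ in range(epochs):
        rates.append(trainer.learning_rate)
        trainer.learning_rate *= decay
    return rates

trainer = Trainer(learning_rate=0.001)
rates = decay_each_epoch(trainer, decay=0.5, epochs=3)
# rates → [0.001, 0.0005, 0.00025]
```

With a real DyNet trainer the same pattern would be `trainer.learning_rate *= decay` at the end of each epoch loop; picking the decay schedule that matches the pre-2.0 defaults is the open question this issue is about.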