Find best Trainer
Closed this issue · 0 comments
yuvalpinter commented
Since the upgrade to DyNet 2.0, training loss doesn't seem to converge for the Mimick algorithm (the tagger code is fine, and the resulting models also make sense).
This seems to be due to the change in learning-rate behavior in DyNet's trainers. The current implementation here uses AdamTrainer, but SGDTrainer and AdaGradTrainer exhibit the same issue.
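One workaround for the changed learning-rate behavior is to decay the rate manually once per epoch, roughly what the pre-2.0 `update_epoch()` call handled. The sketch below uses a stand-in `Trainer` class rather than DyNet itself, and the attribute name `learning_rate` (matching DyNet 2.x's trainer property) plus the decay factor are assumptions for illustration:

```python
class Trainer:
    """Stand-in for a DyNet-style trainer; only tracks a learning rate."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

def decay_each_epoch(trainer, decay, epochs):
    """Multiply the learning rate by `decay` once per epoch, recording the
    rate in effect at the start of each epoch."""
    rates = []
    for _ in range(epochs):
        rates.append(trainer.learning_rate)
        trainer.learning_rate *= decay
    return rates

trainer = Trainer(learning_rate=0.001)
rates = decay_each_epoch(trainer, decay=0.5, epochs=3)
# rates → [0.001, 0.0005, 0.00025]
```

With a real DyNet trainer the same pattern would be `trainer.learning_rate *= decay` at the end of each epoch loop; picking the decay schedule that matches the pre-2.0 defaults is the open question this issue is about.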