yuvalpinter/Mimick

Find best Trainer


Since the upgrade to DyNet 2.0, training loss doesn't seem to converge for the Mimick algorithm (it's fine in the tagger code, and the resulting models also make sense).

This seems to be due to the change in learning-rate behavior in DyNet 2.0's trainers. The current implementation here uses AdamTrainer, but SGDTrainer and AdaGradTrainer exhibit the same issue.
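
For reference, a minimal sketch of how the trainers in question can be instantiated under DyNet 2.0 with explicit learning rates and decayed by hand. This is not the repo's actual training loop; the toy objective, learning rates, and decay factor are illustrative assumptions:

```python
import dynet as dy

# Sketch only: the toy objective, learning rates, and decay factor
# below are assumed values, not the ones used in this repo.
pc = dy.ParameterCollection()
w = pc.add_parameters((1,))  # single toy parameter

trainer = dy.AdamTrainer(pc, alpha=0.001)              # current choice here
# trainer = dy.SimpleSGDTrainer(pc, learning_rate=0.1)
# trainer = dy.AdagradTrainer(pc, learning_rate=0.1)

for epoch in range(20):
    dy.renew_cg()
    # Toy objective: drive w toward 3.0.
    loss = dy.squared_norm(dy.parameter(w) - dy.scalarInput(3.0))
    loss_value = loss.value()  # forward pass
    loss.backward()            # backward pass
    trainer.update()
    # In DyNet 2.0 the learning rate is a settable trainer attribute;
    # decay it manually rather than via the old update_epoch() schedule.
    trainer.learning_rate *= 0.95  # assumed decay factor
    print("epoch %d loss %.4f" % (epoch, loss_value))
```

Trying the candidate trainers with a loop like this (one at a time, on the real Mimick objective) should make it easier to see which ones still converge after the 2.0 learning-rate changes.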