Loading the optimizer before Length-Adaptive training
shira-g opened this issue · 0 comments
Hello,
I am trying to apply your method to my own model (trained using different code) instead of bert-base, and I got a good F1 result when using your code to train it with Length-Adaptive training.
However, I noticed that your code loads the saved optimizer and scheduler states of the provided model. Since I couldn't think of a reason to load the optimizer state of an already fine-tuned model before training it with Length-Adaptive, I removed the saved optimizer, and actually got a lower result.
Is there an explanation for this behaviour that you are aware of? I also noticed that the lines in your code that save the optimizer are commented out (while the code still loads a pre-saved optimizer), and I wonder whether that was intentional.
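To make the question concrete, here is roughly the pattern I mean (a minimal sketch assuming the usual HuggingFace-style checkpoint layout with `optimizer.pt` and `scheduler.pt`; the names and helper are illustrative, not your exact code):

```python
import os
import torch


def maybe_load_optimizer_state(model_dir, optimizer, scheduler, device):
    """Restore optimizer/scheduler state saved alongside a checkpoint.

    Illustrative only: if the fine-tuned model directory contains
    optimizer.pt / scheduler.pt, they are loaded before training
    continues. File names follow the common HuggingFace Trainer
    convention and are assumptions here, not this repo's exact paths.
    """
    opt_path = os.path.join(model_dir, "optimizer.pt")
    sched_path = os.path.join(model_dir, "scheduler.pt")
    if os.path.isfile(opt_path) and os.path.isfile(sched_path):
        optimizer.load_state_dict(torch.load(opt_path, map_location=device))
        scheduler.load_state_dict(torch.load(sched_path))
    # The corresponding save side (the part I saw commented out)
    # would look something like:
    # torch.save(optimizer.state_dict(), opt_path)
    # torch.save(scheduler.state_dict(), sched_path)
```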
Regards and thank you,
Shira