tensorflow/text

text/docs/tutorials/fine_tune_bert.ipynb : BERT not learning

sungengyi opened this issue · 5 comments

warmup_schedule = tfm.optimization.lr_schedule.LinearWarmup(
    warmup_learning_rate=0,
    after_warmup_lr_sched=linear_decay,
    warmup_steps=warmup_steps)

...

optimizer = tf.keras.optimizers.experimental.AdamW(
    lr=warmup_schedule,
    weight_decay=0.01,
)

When I train the BERT classifier with this optimizer, the model doesn't learn anything. I changed the optimizer back to the one used in the previous version of the code:

optimizer = nlp.optimization.create_optimizer(2e-5, num_train_steps=num_train_steps, num_warmup_steps=warmup_steps)
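
For context, this is roughly how I wire either optimizer into the classifier; the model and dataset names below are placeholders standing in for the notebook's objects, not its exact code:

# Hypothetical names: `bert_classifier`, `train_ds`, and `val_ds` stand in
# for the tutorial's model and tf.data pipelines.
bert_classifier.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

bert_classifier.fit(train_ds, validation_data=val_ds, epochs=3)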

I suspect the problem is a typo in warmup_learning_rate=0 in the first snippet above, but I haven't tried rerunning the notebook with that changed.
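
For reference, this is roughly how I expected the schedule and optimizer to fit together. Treat it as a sketch, not the notebook's exact code: the 2e-5 peak rate comes from the old tutorial, and num_train_steps / warmup_steps are the step counts computed earlier in the notebook.

# Linear decay from the 2e-5 peak down to 0 over the full run,
# preceded by a linear warmup from 0 up to that peak.
linear_decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-5,
    decay_steps=num_train_steps,
    end_learning_rate=0)

warmup_schedule = tfm.optimization.lr_schedule.LinearWarmup(
    warmup_learning_rate=0,
    after_warmup_lr_sched=linear_decay,
    warmup_steps=warmup_steps)

# Using the full `learning_rate` keyword here rather than `lr`.
optimizer = tf.keras.optimizers.experimental.AdamW(
    learning_rate=warmup_schedule,
    weight_decay=0.01)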

Thanks for pointing this out. It looks like nobody has commented yet, but it is being looked at.

Hi,

I'm fixing this right now.

I'm pretty sure the culprit is this line, which is broken in 2.9 but is fixed in master:

https://github.com/keras-team/keras/blob/r2.9/keras/optimizers/optimizer_experimental/adamw.py#L173

I'm just going to switch to standard Adam, and drop the weight decay.
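
Concretely, something along these lines (a sketch of the replacement, keeping the same warmup schedule as before):

# Same warmup/decay schedule, fed to plain Adam with no decoupled weight decay.
optimizer = tf.keras.optimizers.experimental.Adam(
    learning_rate=warmup_schedule)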

@MarkDaoust
Hi!

Thank you! I used code from the previous tutorial (tf-models-official==2.4.0) and trained a classifier to 90+% accuracy. Due to some Keras updates, I can't use the old version of the code anymore, and with the same hyperparameters I couldn't reproduce the same performance. I'm not an expert in TensorFlow, so I was wondering what the problem might be... Do you have any ideas? :)

I've made some progress on the fix. Working on getting it submitted this week.

I moved this to tensorflow/models, so it can be with its friends, and fixed it at the same time.

I dropped the AdamW optimizer in this version, since the Keras implementation is buggy in TF 2.9.