Warning: Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly.
deepbluefantasy opened this issue · 4 comments
Hello! I translated the Persona-Chat dataset into Chinese so that I can work on a Chinese persona dialogue generation task.
I have translated all the datasets, but when I train the transmitter module, something unexpected happened. The logs show: "Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly."
I guess this is related to the args `n_epoches` and `num_train_epochs`. Can you explain the roles of these two parameters?
After debugging, I found that `t_total` actually comes from the following code in agents/transmitter/transmitter.py:

```python
if 'gpt' in ARCH_CHOICE:
    num_optim_steps = opt['train_size'] * opt['num_train_epochs'] // opt['batchsize']
    # override optimizer_step
    opt['optimizer_step'] = num_optim_steps
```
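For illustration, the step count computed above can be checked with made-up numbers (the `opt` values below are assumptions, not the repo's actual config):

```python
# Hypothetical config values, only for illustrating the arithmetic;
# the real values come from the training options in the repo.
opt = {
    "train_size": 131438,    # assumed number of training examples
    "num_train_epochs": 4,
    "batchsize": 16,
}

# Mirrors the computation in agents/transmitter/transmitter.py:
# total optimizer steps = examples * epochs // batch size
num_optim_steps = opt["train_size"] * opt["num_train_epochs"] // opt["batchsize"]
print(num_optim_steps)  # 32859
```

If the schedule's `t_total` is set smaller than the number of steps actually taken, the warning in the title appears once training passes `t_total`.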
I also notice that in train_transmitter.py you write `num_epochs=num_train_epochs,`. I don't know if there is something wrong here.
@deepbluefantasy Hi, thanks for your interest in our work! Since the code was released two years ago, I cannot recall the details of these two parameters. You may paste the code snippet and ask ChatGPT to explain it.
As for your question, I think it is reasonable. If I remember correctly, `train_size` means the total number of examples in the train set, and the number of optimization steps should be train_size × num_train_epochs / batch_size.
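A minimal sketch of the schedule's behavior (a simplification, not pytorch-pretrained-bert's actual implementation) shows why a too-small `t_total` produces the warning: the learning rate multiplier warms up linearly, then decays linearly to zero at `t_total`, and any step beyond that point is clamped to 0.0.

```python
def warmup_linear_multiplier(step, t_total, warmup=0.1):
    """Linear warmup to 1.0, then linear decay to 0.0 at t_total.

    Simplified sketch of a WarmupLinearSchedule-style learning rate
    multiplier; the warmup fraction of 0.1 is an assumed default.
    """
    progress = step / t_total
    if progress < warmup:
        # Warmup phase: multiplier rises linearly from 0 to 1.
        return progress / warmup
    # Decay phase: multiplier falls linearly to 0 at t_total,
    # and is clamped at 0 for any step beyond t_total.
    return max((1.0 - progress) / (1.0 - warmup), 0.0)

print(warmup_linear_multiplier(50, 1000))    # mid-warmup: 0.5
print(warmup_linear_multiplier(1200, 1000))  # beyond t_total: 0.0
```

So if `t_total` is computed from the wrong epoch count, training quietly continues with a zero learning rate after step `t_total`, which is exactly the situation the warning flags.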
@SivilTaram Thanks for your reply!
Can you recall the time or number of epochs you used to train the model? I find `n_epoches=100` while `num_train_epochs=4`, which do not match. If fewer than 10 training epochs is enough, I think the setting is alright.
I am also training the receiver. The default is 50 epochs, but it stops at 24 epochs because of `max_train_time elapsed:60000.52600765228s`. I wonder whether 24 training epochs is enough?
BTW, I have sent you an email. You can check it :)
@deepbluefantasy I think `n_epoches` is used by the optimizer (if it is not set, the optimizer may never increase the learning rate because of the default learning rate schedule), while `num_train_epochs` in fact determines the number of steps for the whole training. For the receiver, I think 24 training epochs is enough.
@SivilTaram Thank you :)