SivilTaram/Persona-Dialogue-Generation

Warning: Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly.

deepbluefantasy opened this issue · 4 comments

Hello! I tried to translate the Persona-Chat dataset into Chinese so that I can build a Chinese persona dialogue generation task.

I have translated all of the datasets. But while training the transmitter module, something unexpected happened. The logs show: "Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly."

I guess it is related to the arguments 'n_epoches' and 'num_train_epochs'. Can you explain the role of these two parameters?

After debugging, I found that 't_total' actually comes from the following code in agents/transmitter/transmitter.py:

if 'gpt' in ARCH_CHOICE:
    num_optim_steps = opt['train_size'] * opt['num_train_epochs'] // opt['batchsize']
    # override optimizer_step
    opt['optimizer_step'] = num_optim_steps
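
As far as I understand, the warning is emitted by the learning-rate schedule in pytorch_pretrained_bert once the global step passes the t_total it was given. Roughly what the WarmupLinearSchedule multiplier does (a sketch of my understanding, not the library's exact source):

def warmup_linear_multiplier(step, t_total, warmup=0.002):
    # Linear warmup, then linear decay to 0 at t_total; clamped to 0 afterwards.
    progress = step / t_total
    if progress < warmup:
        return progress / warmup
    return max((progress - 1.0) / (warmup - 1.0), 0.0)

# Once step > t_total the multiplier is 0.0, which is exactly when the
# "Training beyond specified 't_total'" warning appears.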

I also notice that in train_transmitter.py you write this:

num_epochs=num_train_epochs,

I am not sure whether something is wrong here.

@deepbluefantasy Hi, thanks for your interest in our work! Since the code was released two years ago, I cannot recall the details of these two parameters. You could paste the code snippet and ask ChatGPT to help explain it.

As for your question, I think your reading is reasonable. If I remember correctly, train_size is the total number of examples in the training set, and the number of optimization steps should be train_size × train_epochs / batch_size.
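
For illustration, plugging hypothetical numbers into that formula (these values are made up, not the settings from the paper):

train_size = 65719          # hypothetical number of training examples
num_train_epochs = 4
batchsize = 16

# This is the t_total handed to the schedule via opt['optimizer_step'].
num_optim_steps = train_size * num_train_epochs // batchsize
# If the outer loop actually runs n_epoches = 100 epochs, the global step
# overshoots this value after roughly 4 epochs, which triggers the warning.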

@SivilTaram Thanks for your reply!

Can you recall how long (or for how many epochs) you trained the model? I find n_epoches=100 while num_train_epochs=4, which do not match. If fewer than 10 training epochs is sufficient, I think the setting is alright.

I am also training the receiver. The default number of epochs is 50, but training stops at 24 epochs because of max_train_time (elapsed: 60000.52600765228s). I wonder whether 24 training epochs is enough?

By the way, I have sent you an email; please check it :)

@deepbluefantasy I think n_epoches is used for the optimizer (if it is not set, the optimizer may never start to increase the learning rate because of the default learning rate schedule), while num_train_epochs in fact determines the number of steps for the whole training. For the receiver, I think 24 training epochs is enough.
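
If the mismatch between the two options is the cause, one possible workaround (untested, just a guess based on the snippet above, and the exact option name may differ) is to compute t_total from the number of epochs the loop will actually run:

# Hypothetical adjustment in agents/transmitter/transmitter.py: use the epochs
# the training loop really runs (e.g. the value behind n_epoches) for t_total.
actual_epochs = 100
opt['optimizer_step'] = opt['train_size'] * actual_epochs // opt['batchsize']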