Question about the meaning of the training strategy
Hi:
I am interested in your work and have read the code carefully. In the training stage, you set two config parameters: max_epoch and training_size.
https://github.com/p0werHu/articulated-objects-motion-prediction/blob/f6198bdc4041e1dd54f6367a8c54ddd016137fe1/src/config.py#L15
https://github.com/p0werHu/articulated-objects-motion-prediction/blob/f6198bdc4041e1dd54f6367a8c54ddd016137fe1/src/config.py#L16
Within each of the training_size iterations, you iterate over the entire train_loader, so each iteration is effectively a full pass over the data, which is what is usually called an epoch (see the code below, taken from your train.py). That means the model makes training_size * max_epoch full passes over the training set, which seems like far too many. Also, the checkpoint file is saved only at the end of each outer epoch, i.e. after training_size full passes. If the best model occurs at a point that is not an integer multiple of training_size passes, it will be missed.
for epoch in range(config.max_epoch):
    print("At epoch:{}".format(str(epoch + 1)))
    prog = Progbar(target=config.training_size)
    prog_valid = Progbar(target=config.validation_size)
    # Train
    #with torch.autograd.set_detect_anomaly(True):
    for it in range(config.training_size):
        for i, data in enumerate(train_loader, 0):
            encoder_inputs = data['encoder_inputs'].float().to(device)
            decoder_inputs = data['decoder_inputs'].float().to(device)
            decoder_outputs = data['decoder_outputs'].float().to(device)
            prediction = net(encoder_inputs, decoder_inputs, train=True)
            loss = Loss(prediction, decoder_outputs, bone_length, config)
            net.zero_grad()
            loss.backward()
            _ = torch.nn.utils.clip_grad_norm_(net.parameters(), 5)
            optimizer.step()
So I would like to know the purpose of the inner loop over training_size. Maybe I am missing the advantage of this setup.
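To make the count concrete, here is a minimal sketch that mirrors the three nested loops and counts optimizer steps. The function name count_updates and the numbers used are hypothetical and are not taken from config.py; batches_per_pass stands in for len(train_loader).

```python
def count_updates(max_epoch, training_size, batches_per_pass):
    """Count optimizer steps for the nested loop structure in the excerpt."""
    steps = 0
    for epoch in range(max_epoch):
        for it in range(training_size):
            # The inner loop walks the whole loader, i.e. one full pass.
            for _batch in range(batches_per_pass):
                steps += 1  # stands in for optimizer.step()
    return steps


# Hypothetical values, chosen only for illustration.
max_epoch, training_size, batches_per_pass = 2, 3, 4

total_steps = count_updates(max_epoch, training_size, batches_per_pass)
full_passes = max_epoch * training_size

# Every full pass contributes batches_per_pass optimizer steps.
assert total_steps == full_passes * batches_per_pass
```

With this structure, a checkpoint written once per outer epoch is separated from the previous one by training_size full passes over the data.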
Hi:
Thanks for your question, and sorry for the late reply; I have been busy with another project and did not notice this issue. Actually, there is nothing special about this setting. As I recall, it was adopted from another implementation of this work. That implementation used TensorFlow, and in each config.training_size iteration it trained on only one batch. PyTorch, however, has a class for loading data, so at the time I did not think much about it and loaded all the data within one iteration.
Of course, this is an imprecise way of writing it. I apologize for the confusion.
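For reference, the one-batch-per-iteration behavior described above could be sketched roughly as follows. This is only an illustration, not the code of this repository: endless_batches, train, and step_fn are hypothetical names, and the loader is assumed to be any non-empty re-iterable object (a plain list stands in for train_loader here).

```python
def endless_batches(loader):
    """Yield batches forever, restarting the loader when it is exhausted.

    Assumes the loader is non-empty; an empty loader would loop forever.
    """
    while True:
        for batch in loader:
            yield batch


def train(loader, max_epoch, training_size, step_fn):
    """Run training_size single-batch steps per epoch."""
    batches = endless_batches(loader)
    for epoch in range(max_epoch):
        for it in range(training_size):
            # Exactly one batch per iteration; step_fn stands in for the
            # forward pass, loss, backward pass, and optimizer step.
            step_fn(next(batches))
        # A checkpoint could be saved here, after training_size batches.


# Usage with a stand-in loader of three "batches".
seen = []
train([0, 1, 2], max_epoch=2, training_size=4, step_fn=seen.append)
assert len(seen) == 2 * 4  # max_epoch * training_size steps in total
```

With this layout, training_size controls how many batches are seen between checkpoints rather than how many full passes are made.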
Sincerely,
junfeng