hjf1997/articulated-objects-motion-prediction

Question about the meaning of the training strategy

Opened this issue · 1 comment

Hi,
I am interested in your work and have read the code carefully. In the training stage, you set two config parameters, max_epoch and training_size:
https://github.com/p0werHu/articulated-objects-motion-prediction/blob/f6198bdc4041e1dd54f6367a8c54ddd016137fe1/src/config.py#L15
https://github.com/p0werHu/articulated-objects-motion-prediction/blob/f6198bdc4041e1dd54f6367a8c54ddd016137fe1/src/config.py#L16
Actually, in each iteration of the training_size loop you load all the data with train_loader, so each iteration is effectively an epoch (as the code below, from your train.py, shows). That means the model is trained for training_size * max_epoch full passes over the dataset, which is far too many. Moreover, you save a checkpoint only when an epoch ends, i.e. after every training_size passes, so if the best model appears at a pass that is not an integral multiple of training_size, you may miss it.

    for epoch in range(config.max_epoch):
        print("At epoch:{}".format(str(epoch + 1)))
        prog = Progbar(target=config.training_size)
        prog_valid = Progbar(target=config.validation_size)

        # Train
        #with torch.autograd.set_detect_anomaly(True):
        for it in range(config.training_size):
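            # NOTE: each `it` below iterates the *entire* train_loader,
            # i.e. one full pass over the training set per `it`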
            for i, data in enumerate(train_loader, 0):
                encoder_inputs = data['encoder_inputs'].float().to(device)
                decoder_inputs = data['decoder_inputs'].float().to(device)
                decoder_outputs = data['decoder_outputs'].float().to(device)
                prediction = net(encoder_inputs, decoder_inputs, train=True)
                loss = Loss(prediction, decoder_outputs, bone_length, config)
                net.zero_grad()
                loss.backward()
                _ = torch.nn.utils.clip_grad_norm_(net.parameters(), 5)
                optimizer.step()

So I would like to know your purpose in adding the training_size loop; perhaps I am missing the advantage of this setting.
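
(To illustrate the checkpointing concern, here is a minimal sketch of saving the best model after every pass rather than only at epoch boundaries; validate() and best_model.pth are hypothetical names, not from the repository:)

    best_loss = float('inf')
    for epoch in range(config.max_epoch):
        for it in range(config.training_size):
            # ... one full pass over train_loader, as in the snippet above ...
            valid_loss = validate(net, valid_loader)  # hypothetical helper
            if valid_loss < best_loss:                # keep the best pass,
                best_loss = valid_loss                # not only epoch ends
                torch.save(net.state_dict(), 'best_model.pth')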

Hi,

Thanks for your question, and sorry for the late reply: I have been busy with another project and did not notice this issue. Actually, there is nothing special about this setting. As I recall, it was adapted from another implementation of this work. In his code, he used TensorFlow, and in each config.training_size iteration he trained on only one batch. However, when I moved to PyTorch, which provides a DataLoader class for loading data, I did not think much about it at the time and loaded all the data within a single `it`.
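
For reference, a minimal sketch of the behaviour the TensorFlow version intended, i.e. training on one batch per iteration; names follow the snippet above, and the iter/next handling is my assumption rather than code from the repository:

    for epoch in range(config.max_epoch):
        data_iter = iter(train_loader)
        for it in range(config.training_size):
            try:
                data = next(data_iter)         # one batch per `it`
            except StopIteration:              # restart when the loader is exhausted
                data_iter = iter(train_loader)
                data = next(data_iter)
            encoder_inputs = data['encoder_inputs'].float().to(device)
            decoder_inputs = data['decoder_inputs'].float().to(device)
            decoder_outputs = data['decoder_outputs'].float().to(device)
            prediction = net(encoder_inputs, decoder_inputs, train=True)
            loss = Loss(prediction, decoder_outputs, bone_length, config)
            net.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(net.parameters(), 5)
            optimizer.step()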

Of course, this is an imprecise way of writing the training loop, and I apologize for the confusion it caused.

Sincerely,
junfeng