
Question about the meaning of the training strategy


I am interested in your work and have read the code carefully. In the training stage, you set two config parameters: max_epoch and training_size.
In each of the training_size iterations, you load all the data from train_loader, so each iteration is effectively what is usually called an epoch (see the code below, from your file train.py). That means the model makes training_size * max_epoch full passes over the data, which seems like too much. You also save a checkpoint only at the end of each outer epoch, that is, after every training_size passes over the data. If the best model occurs at a pass count that is not an integer multiple of training_size, it will be missed.

    for epoch in range(config.max_epoch):
        print("At epoch:{}".format(str(epoch + 1)))
        prog = Progbar(target=config.training_size)
        prog_valid = Progbar(target=config.validation_size)

        # Train
        #with torch.autograd.set_detect_anomaly(True):
        for it in range(config.training_size):
            for i, data in enumerate(train_loader, 0):
                encoder_inputs = data['encoder_inputs'].float().to(device)
                decoder_inputs = data['decoder_inputs'].float().to(device)
                decoder_outputs = data['decoder_outputs'].float().to(device)
                prediction = net(encoder_inputs, decoder_inputs, train=True)
                loss = Loss(prediction, decoder_outputs, bone_length, config)
                _ = torch.nn.utils.clip_grad_norm_(net.parameters(), 5)

So I would like to know the purpose of the extra loop over training_size. Maybe I am missing the advantage of this setup.
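
To make the count concrete, here is a minimal sketch of the arithmetic implied by the nested loops. The values passed in (50 epochs, training_size of 200, 30 batches per pass) are hypothetical and chosen only for illustration; they are not taken from the repository's config.

```python
def count_training_work(max_epoch, training_size, batches_per_pass):
    """Count the work implied by the nested loops in the excerpt above.

    batches_per_pass stands for len(train_loader); it is an assumed input.
    """
    # Full passes over train_loader between two checkpoints
    # (the inner `for it in range(config.training_size)` loop).
    passes_per_checkpoint = training_size
    # Full passes over train_loader across the whole run.
    total_passes = max_epoch * training_size
    # Individual batches processed across the whole run.
    total_batches = total_passes * batches_per_pass
    return passes_per_checkpoint, total_passes, total_batches


# Hypothetical values, for illustration only.
per_ckpt, passes, batches = count_training_work(
    max_epoch=50, training_size=200, batches_per_pass=30
)
print(per_ckpt, passes, batches)
```

With these assumed numbers, a checkpoint would be written only once every 200 passes over the data.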


Thanks for your question, and sorry for the late reply; I have been busy with other work and did not notice this issue. There isn't anything special about this setting. As I recall, it was adopted from another implementation of this work. That implementation used TensorFlow, and in each config.training_size iteration it trained on only one batch. PyTorch, however, provides a class for loading data, so at the time I didn't think much about it and loaded all the data within one iteration of "it".

Of course, this is an imprecise way of writing it. I apologize for the confusion.
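
For anyone who wants the one-batch-per-iteration behaviour described in the reply, the following is a rough sketch, not the code in this repository. The helpers infinite_batches, run_epochs and train_step are hypothetical names introduced here; the loader can be any re-iterable object, and a plain list stands in for train_loader so the example runs without PyTorch.

```python
def infinite_batches(loader):
    """Yield batches forever, restarting the loader each time it runs out."""
    while True:
        empty = True
        for batch in loader:
            empty = False
            yield batch
        if empty:
            # Guard against spinning forever on a loader with no data.
            raise ValueError("loader produced no batches")


def run_epochs(loader, max_epoch, training_size, train_step):
    """Run max_epoch epochs of training_size single-batch iterations each."""
    batches = infinite_batches(loader)
    for epoch in range(max_epoch):
        for it in range(training_size):
            # Exactly one batch per iteration, instead of a full pass.
            train_step(next(batches))
        # A checkpoint could be saved here, once per epoch.


# A list of integers stands in for train_loader (assumption for the demo).
seen = []
run_epochs([0, 1, 2], max_epoch=2, training_size=4, train_step=seen.append)
print(seen)
```

Under this scheme the total number of batches processed is max_epoch * training_size, independent of the dataset size.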
