Tensorboard log is only updated when the model is evaluated

Question

adamtupper opened this issue 10 months ago · 0 comments

Bug

Tensorboard logging is only performed every num_eval_iter iterations (i.e. when the model is periodically evaluated on the validation set). This gives a very coarse view of training behaviour, and it means we cannot keep a closer eye on training metrics without also running expensive validation-set evaluations more frequently.

I can submit a pull request to modify the LoggingHook (see below) so that the Tensorboard logs are also updated every num_log_iter if there's agreement on this issue.

Reproduce the Bug

Perform any training run with num_eval_iter != num_log_iter and you'll see that even the training metrics (losses, etc.) are only logged to Tensorboard every num_eval_iter iterations.
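To make the gap concrete, here is a rough sketch that counts how many iterations reach the console log versus Tensorboard under the current behaviour. It assumes every_n_iters fires when (it + 1) % n == 0, which is my reading of the helper rather than something confirmed here. The function names and the iteration counts are made up for illustration.

```python
def every_n_iters(it, n):
    """Assumed gating rule: fire on every n-th iteration (counting from 1)."""
    return n > 0 and (it + 1) % n == 0


def tb_update_iters(total_iters, num_eval_iter):
    """Iterations at which Tensorboard is updated under the current behaviour."""
    return [it for it in range(total_iters) if every_n_iters(it, num_eval_iter)]


def logging_iters(total_iters, num_eval_iter, num_log_iter):
    """Iterations at which either branch of the logging hook is entered."""
    return [
        it
        for it in range(total_iters)
        if every_n_iters(it, num_eval_iter) or every_n_iters(it, num_log_iter)
    ]


# Hypothetical settings, chosen only to show the mismatch.
tb_points = tb_update_iters(1000, num_eval_iter=500)
log_points = logging_iters(1000, num_eval_iter=500, num_log_iter=50)
print(len(tb_points), "Tensorboard updates vs", len(log_points), "logging steps")
```

With these illustrative values, training metrics are available at every logging step but only a small fraction of them end up in Tensorboard.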

Error Messages and Logs

None

Proposed Fix

class LoggingHook(Hook):
    """
    Logging hook that prints information and logs to Tensorboard
    """
    def after_train_step(self, algorithm):
        """must be called after evaluation"""
        if self.every_n_iters(algorithm, algorithm.num_eval_iter):
            ...

            # Existing TB log update:
            if algorithm.tb_log is not None:
                algorithm.tb_log.update(algorithm.log_dict, algorithm.it)

        elif self.every_n_iters(algorithm, algorithm.num_log_iter):
            ...

            # FIX: Also update the logs here
            if algorithm.tb_log is not None:
                algorithm.tb_log.update(algorithm.log_dict, algorithm.it)
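For anyone who wants to check the intended behaviour in isolation, here is a minimal, self-contained sketch of the same idea. It is not the library's code: the Hook base class, RecordingTBLog, DummyAlgorithm, the run helper, and the "train/loss" key are all hypothetical stand-ins, and the every_n_iters rule is assumed to be (it + 1) % n == 0. The sketch also folds the two branches into a single update so that Tensorboard is written at most once per step.

```python
class Hook:
    """Stand-in base class; the gating rule below is an assumption."""

    def every_n_iters(self, algorithm, n):
        return n > 0 and (algorithm.it + 1) % n == 0


class RecordingTBLog:
    """Hypothetical stand-in for algorithm.tb_log that records update calls."""

    def __init__(self):
        self.updates = []

    def update(self, log_dict, it):
        self.updates.append((it, dict(log_dict)))


class DummyAlgorithm:
    """Hypothetical stand-in exposing only the attributes the hook reads."""

    def __init__(self, num_eval_iter, num_log_iter):
        self.num_eval_iter = num_eval_iter
        self.num_log_iter = num_log_iter
        self.it = 0
        self.log_dict = {}
        self.tb_log = RecordingTBLog()


class LoggingHook(Hook):
    """Sketch of the proposed behaviour: update TB on eval OR log iterations."""

    def after_train_step(self, algorithm):
        at_eval = self.every_n_iters(algorithm, algorithm.num_eval_iter)
        at_log = self.every_n_iters(algorithm, algorithm.num_log_iter)
        if (at_eval or at_log) and algorithm.tb_log is not None:
            algorithm.tb_log.update(algorithm.log_dict, algorithm.it)


def run(total_iters, num_eval_iter, num_log_iter):
    """Drive the hook over a fake training loop; return iterations written to TB."""
    algorithm = DummyAlgorithm(num_eval_iter, num_log_iter)
    hook = LoggingHook()
    for it in range(total_iters):
        algorithm.it = it
        algorithm.log_dict = {"train/loss": 1.0 / (it + 1)}  # made-up metric
        hook.after_train_step(algorithm)
    return [it for it, _ in algorithm.tb_log.updates]


print(run(100, num_eval_iter=50, num_log_iter=10))
```

In the real hook the elided parts of each branch would stay where they are; only the Tensorboard update needs to run in both cases.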