Modularize and clean up StoppableTrainer
mdraw opened this issue · 0 comments
mdraw commented
- Split `StoppableTrainer.train()` into multiple functions.
- Progress reporting to the terminal and logging to TensorBoard should be less entangled with the actual training code.
- The current training loop is memory-inefficient: some tensors are kept alive longer than necessary only so that they can be logged later, occupying GPU memory that is needed elsewhere. To prevent out-of-memory (OOM) crashes during validation and preview predictions, the loop should be rewritten to free such resources as soon as they are no longer needed.
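A minimal sketch of the direction these points describe. It is written in plain Python so it runs without PyTorch. All names here (`Trainer`, `_train_step`, `_validate`, `_report`, `PrintReporter`, `ListReporter`, `toy_step`) are hypothetical and not part of the existing `StoppableTrainer` API. `train()` only orchestrates smaller methods. Reporting goes through interchangeable reporter objects, and one of them could wrap a TensorBoard writer. Each step also reduces its outputs to plain floats before anything is logged, so large results are not kept alive just for logging.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Protocol, Tuple

Metrics = Dict[str, float]


class Reporter(Protocol):
    """Anything that consumes scalar metrics (terminal printer, TensorBoard writer, ...)."""

    def report(self, step: int, metrics: Metrics) -> None: ...


class PrintReporter:
    """Terminal progress output, kept out of the training loop itself."""

    def report(self, step: int, metrics: Metrics) -> None:
        print(f"step {step}: " + ", ".join(f"{k}={v:.4f}" for k, v in metrics.items()))


@dataclass
class ListReporter:
    """Collects metrics in memory; stands in for e.g. a TensorBoard writer."""

    records: List[Tuple[int, Metrics]] = field(default_factory=list)

    def report(self, step: int, metrics: Metrics) -> None:
        self.records.append((step, dict(metrics)))


class Trainer:
    """Hypothetical trainer whose train() only orchestrates smaller steps."""

    def __init__(self, step_fn: Callable[[Any], Dict[str, Any]], reporters: Iterable[Reporter]):
        self.step_fn = step_fn  # runs one forward/backward pass, returns a dict with "loss"
        self.reporters = list(reporters)
        self.step = 0

    def train(self, batches: Iterable[Any], val_batches: Iterable[Any], max_steps: int) -> None:
        for batch in itertools.islice(batches, max_steps):
            self._report(self._train_step(batch))
            self.step += 1
        self._report(self._validate(val_batches))

    def _train_step(self, batch: Any) -> Metrics:
        out = self.step_fn(batch)  # may hold large intermediate results
        # Reduce to a plain float immediately (with PyTorch: out["loss"].item())
        loss = float(out["loss"])
        del out  # drop this reference as soon as the scalar has been extracted
        return {"train_loss": loss}

    def _validate(self, val_batches: Iterable[Any]) -> Metrics:
        total, n = 0.0, 0
        for batch in val_batches:
            out = self.step_fn(batch)
            total += float(out["loss"])  # keep a running sum, not a list of outputs
            n += 1
            del out
        return {"val_loss": total / n if n else float("nan")}

    def _report(self, metrics: Metrics) -> None:
        # Reporters only ever see plain floats, never the step's large outputs
        for reporter in self.reporters:
            reporter.report(self.step, metrics)


def toy_step(batch):
    # Stand-in for a real training step: "loss" is the batch mean, "output" mimics a big tensor
    return {"loss": sum(batch) / len(batch), "output": list(batch)}


rep = ListReporter()
trainer = Trainer(toy_step, [PrintReporter(), rep])
trainer.train([[1.0, 3.0], [4.0, 6.0], [9.0, 9.0]], val_batches=[[1.0, 1.0], [3.0, 3.0]], max_steps=2)
```

With real PyTorch tensors, the same pattern would call `.item()` on the loss and avoid collecting whole output tensors across validation batches. Whether that alone prevents the OOM crashes depends on what else still holds references to those tensors, such as autograd graphs.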