Modularize and clean up StoppableTrainer

Question

mdraw opened this issue 7 years ago · 0 comments

Split StoppableTrainer.train() into multiple functions.
Progress reporting to the terminal and logging to TensorBoard should be less entangled with the actual training code.
The current training loop implementation is memory-inefficient: some tensors are kept alive longer than necessary purely for logging purposes, where they occupy valuable GPU memory. To prevent out-of-memory (OOM) crashes during validation and preview predictions, the loop should be rewritten to free such resources as soon as they are no longer needed.
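A minimal sketch of how the three points above could fit together. It is framework-free for illustration, and every name in it (`SketchTrainer`, `_train_step`, `_validate`, `_log`, `step_fn`, `eval_fn`) is hypothetical and not part of the existing `StoppableTrainer`. `train()` is reduced to a short loop over separate step, validation and logging methods. Reporting goes through pluggable logger callables, so terminal output or a TensorBoard writer can be attached without touching the loop. Only plain Python floats are passed to the loggers, so no per-batch tensors need to stay alive for logging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical logger signature: (phase, scalar stats, global step).
# A terminal printer and a TensorBoard writer would both be plain callables.
Logger = Callable[[str, Dict[str, float], int], None]


@dataclass
class SketchTrainer:
    # Runs one optimization step and returns the loss as a Python float.
    # Assumption: in a PyTorch trainer this would end with `loss.item()`,
    # so the graph and batch tensors can be freed right after the step.
    step_fn: Callable[[object], float]
    # Computes the validation loss of one batch as a Python float.
    # Assumption: in PyTorch this would run under `torch.no_grad()`.
    eval_fn: Callable[[object], float]
    loggers: List[Logger] = field(default_factory=list)
    step: int = 0

    def train(self, train_batches: Iterable, valid_batches: Iterable,
              valid_every: int) -> None:
        """Top-level loop: delegates all real work to smaller methods."""
        for batch in train_batches:
            self._log("train", self._train_step(batch))
            if self.step % valid_every == 0:
                self._log("valid", self._validate(valid_batches))

    def _train_step(self, batch) -> Dict[str, float]:
        loss = self.step_fn(batch)
        self.step += 1
        return {"loss": loss}

    def _validate(self, batches: Iterable) -> Dict[str, float]:
        # Keep only a running sum; each batch result is dropped immediately
        # instead of collecting all outputs until the end of validation.
        total, count = 0.0, 0
        for batch in batches:
            total += self.eval_fn(batch)
            count += 1
        return {"loss": total / count if count else float("nan")}

    def _log(self, phase: str, stats: Dict[str, float]) -> None:
        # The trainer does not know how stats are displayed or stored.
        for logger in self.loggers:
            logger(phase, stats, self.step)
```

Usage: `SketchTrainer(step_fn=..., eval_fn=..., loggers=[print_stats, tb_writer_fn]).train(train_loader, valid_loader, valid_every=100)`, where `print_stats` and `tb_writer_fn` are hypothetical callables. Keeping the logging interface scalar-only is the design choice that addresses both the entanglement and the memory concern at once.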