ELEKTRONN/elektronn3

Modularize and clean up StoppableTrainer

mdraw opened this issue · 0 comments

  • Split StoppableTrainer.train() into multiple functions.
  • Progress reporting to the terminal and logging to TensorBoard should be less entangled with the actual training loop.
  • The current training loop implementation is memory-inefficient: some tensors are kept alive longer than necessary for logging purposes and occupy valuable GPU memory. To prevent OOM crashes during validation and preview predictions, the loop should be rewritten to free resources as soon as they are no longer needed.
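
A rough sketch of the direction the points above suggest, not the actual elektronn3 code. All names here (`ModularTrainer`, `ListReporter`, `train_step`, `valid_step`) are hypothetical and chosen only for illustration. The sketch assumes each step callable returns a scalar loss that `float()` can convert (a plain number, or a one-element PyTorch tensor), so only Python floats outlive a batch and reporting lives in a separate object.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Stats = Dict[str, float]


class ListReporter:
    """Hypothetical reporter that only records stats.

    A real one could print to the terminal or write to TensorBoard;
    the trainer does not need to know which.
    """

    def __init__(self) -> None:
        self.records: List[Tuple[str, int, Stats]] = []

    def report(self, phase: str, epoch: int, stats: Stats) -> None:
        self.records.append((phase, epoch, stats))


class ModularTrainer:
    """Hypothetical trainer with train() split into small pieces."""

    def __init__(
        self,
        train_step: Callable[[object], object],
        valid_step: Callable[[object], object],
        reporter: ListReporter,
    ) -> None:
        self.train_step = train_step
        self.valid_step = valid_step
        self.reporter = reporter

    def train(self, train_batches: Iterable, valid_batches: Iterable,
              epochs: int = 1) -> None:
        # Assumption: both batch collections can be iterated once per epoch
        # (e.g. lists or data loaders, not one-shot generators).
        for epoch in range(epochs):
            train_stats = self._run_epoch(self.train_step, train_batches)
            self.reporter.report("train", epoch, train_stats)
            valid_stats = self._run_epoch(self.valid_step, valid_batches)
            self.reporter.report("valid", epoch, valid_stats)

    @staticmethod
    def _run_epoch(step: Callable[[object], object], batches: Iterable) -> Stats:
        total, count = 0.0, 0
        for batch in batches:
            loss = step(batch)
            # Keep only a Python float for logging; the loss object itself
            # (possibly a GPU tensor) is not stored anywhere.
            total += float(loss)
            # Drop local references right away so the memory can be
            # reclaimed before validation or preview predictions run.
            del loss, batch
            count += 1
        return {"loss": total / count if count else float("nan")}
```

With this split, swapping terminal output for TensorBoard logging would only touch the reporter, and the epoch loop never holds on to per-batch outputs beyond a single iteration.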