Checkpoint cleanup for simultaneous experiments
jdunnmon commented
Currently, if multiple experiments run in parallel, they all use the same checkpoints directory by default. This is a problem because they then all asynchronously overwrite the same best_model.pth, which can cause experiment A to load experiment B's checkpoint.
We should move the checkpoints for a given experiment to its log folder, and then clean them up by default when the run is complete.
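A minimal sketch of the proposal, assuming each run already has its own log folder and a unique run name. The helper `experiment_checkpoint_dir` and its `cleanup` flag are hypothetical and not part of the library's API. The helper places checkpoints under `<log_dir>/<run_name>/checkpoints`, so two parallel runs never share a `best_model.pth` path. It deletes only that checkpoints subfolder when the run ends, leaving the logs in place.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def experiment_checkpoint_dir(log_dir, run_name, cleanup=True):
    """Yield a checkpoint directory private to one experiment (hypothetical helper).

    The directory lives inside the run's own log folder, so concurrent runs
    cannot overwrite each other's best_model.pth.
    """
    ckpt_dir = os.path.join(log_dir, run_name, "checkpoints")
    os.makedirs(ckpt_dir, exist_ok=True)
    try:
        yield ckpt_dir
    finally:
        # Clean up by default when the run completes (or raises);
        # only the checkpoints subfolder is removed, logs are kept.
        if cleanup:
            shutil.rmtree(ckpt_dir, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_root:
        with experiment_checkpoint_dir(log_root, "exp_A") as dir_a, \
                experiment_checkpoint_dir(log_root, "exp_B") as dir_b:
            for d in (dir_a, dir_b):
                # In a real run this would be torch.save(model.state_dict(), path);
                # a placeholder file keeps the sketch dependency-free.
                with open(os.path.join(d, "best_model.pth"), "wb") as f:
                    f.write(b"placeholder")
            print(dir_a != dir_b)
        print(os.path.exists(dir_a), os.path.exists(dir_b))
```

Using a context manager ties cleanup to the end of the run even if training raises. Passing `cleanup=False` keeps the checkpoints when a run's best model needs to be inspected afterwards.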
bhancock8 commented
Resolved in v0.5, I believe.