gpauloski/kfac-pytorch

Investigate time-per-epoch slow down over course of training

Closed · 1 comment

Possibly related to I/O? Training needs to be profiled to confirm.
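One way to test the I/O hypothesis is to split each epoch's wall time into time spent waiting for the next batch and time spent in the training step, then compare the two across epochs. The snippet below is a rough sketch of that idea, not code from this repository: `profile_epoch`, `batches`, and `step` are hypothetical names, and it assumes the training loop can be expressed as one callable per batch.

```python
import time
from typing import Any, Callable, Dict, Iterable


def profile_epoch(
    batches: Iterable[Any], step: Callable[[Any], None]
) -> Dict[str, float]:
    """Time one epoch, separating data-loading wait from step time.

    `batches` stands in for the data loader and `step` for one
    forward/backward/optimizer update (both are hypothetical here).
    On a GPU, `step` should synchronize the device before returning
    (e.g. torch.cuda.synchronize()); otherwise asynchronous kernels
    make the step look faster than it is.
    """
    data_s = 0.0
    step_s = 0.0
    n_batches = 0
    it = iter(batches)
    epoch_start = time.perf_counter()
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked on the loader (I/O, decoding)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # time in compute and communication
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        n_batches += 1
    total_s = time.perf_counter() - epoch_start
    return {
        "batches": n_batches,
        "data_s": data_s,
        "step_s": step_s,
        "total_s": total_s,
    }


if __name__ == "__main__":
    # Toy stand-ins for a real loader and training step.
    def fake_step(batch: Any) -> None:
        sum(batch)

    for epoch in range(3):
        stats = profile_epoch(([i] * 10 for i in range(100)), fake_step)
        print(epoch, stats)
```

If `data_s` grows from epoch to epoch while `step_s` stays flat, the loader or storage is the likely cause. If `step_s` grows instead, compute or communication is the more likely culprit.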

Update: the slowdown only occurs on experimental when Horovod is used for communication, so it may be related to the Horovod install.