Unexpected Training Time
jhkonan opened this issue
jhkonan commented
I am trying to get FullSubNet up and running by following the repo instructions. It seems we must create a custom train.toml that specifies the relevant file paths, and provide text files listing the absolute paths of the audio clips. I am only looking at training on the no-reverb data.
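In case it helps others reproduce this, here is a minimal sketch of how I generated those file lists. The directory layout and output filenames below are assumptions on my part, not something the repo prescribes; point them at wherever your no-reverb clean/noisy WAV folders actually live:

```python
from pathlib import Path

# Write one absolute path per line, which is what the file lists expect.
# data_root and the "clean"/"noisy" subfolder names are placeholders.
data_root = Path("/data/dns_no_reverb").resolve()

for subset in ("clean", "noisy"):
    wav_paths = sorted((data_root / subset).glob("*.wav"))
    list_file = data_root / f"{subset}.txt"
    list_file.write_text("\n".join(str(p) for p in wav_paths) + "\n")
    print(f"Wrote {len(wav_paths)} paths to {list_file}")
```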
I observe the following training time for one epoch on my system with two 2080 Ti GPUs:

```
This project contains 1 models, the number of the parameters is:
Network 1: 5.637635 million.
The amount of parameters in the project is 5.637635 million.
=============== 1 epoch ===============
[0 seconds] Begin training...
Saving 1 epoch model checkpoint...
[966 seconds] Training has finished, validation is in progress...
Saving 1 epoch model checkpoint...
😃 Found a best score in the 1 epoch, saving...
[1031 seconds] This epoch is finished.
```
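To make the concern concrete, here is a rough back-of-the-envelope on the implied throughput. The clip count is a placeholder, not a number from the repo; substitute the length of your own training file list:

```python
# Rough throughput implied by the epoch log above.
train_seconds = 966    # training portion of the epoch, from the log
num_clips = 60_000     # placeholder: replace with your actual file-list length
print(f"~{num_clips / train_seconds:.0f} clips/s across both GPUs")  # ~62 clips/s for these numbers
```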
This is much faster than I would expect a 5M-parameter model to train on a dataset of this size. I am also not sure how to read the evaluation logs, since they seem to be in a proprietary format.
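The logs look like TensorBoard event files to me, though that is an assumption on my part since I have not confirmed which writer the trainer uses. If so, they can be inspected programmatically along these lines (the log directory path is hypothetical):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Assumption: the "proprietary" logs are TensorBoard tfevents files.
# Point this at the directory that contains them.
acc = EventAccumulator("/path/to/fullsubnet/logs")
acc.Reload()

# List the scalar tags that were logged, then dump the first one.
scalar_tags = acc.Tags()["scalars"]
print("Available scalar tags:", scalar_tags)
if scalar_tags:
    for event in acc.Scalars(scalar_tags[0]):
        print(event.step, event.value)
```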
Could you tell us how long it takes to train a few epochs and what evaluation results we should expect early on?
Thank you for your help.