xiph/rnnoise

How to achieve a good time per epoch?

Zadagu opened this issue · 0 comments

A question to the community: on which hardware do you achieve the best time per epoch during training?

I tried the following machines (TensorFlow 2.1 as backend); a sketch of how such per-epoch timings can be collected follows the list:

  • i7-7700K CPU
    64 GB RAM
    -> ~1200 s / 20 min per epoch
  • dual Xeon E5-2620 v3 @ 2.40 GHz
    42 GB RAM
    -> ~4800 s / 80 min per epoch (custom-built TensorFlow to use all CPU features, but that didn't increase performance much)
  • AWS ml.p2.xlarge
    Nvidia K80
    61 GB RAM
    -> ~1892 s per epoch (TensorFlow 2.1, custom Docker image based on tensorflow:2.1.0-gpu-py3)
  • AWS ml.m5.4xlarge
    16 vCPUs
    64 GB RAM
    -> ~7000 s per epoch (TensorFlow 1.13, original SageMaker Docker image)
  • AWS ml.c5.9xlarge
    36 vCPUs
    72 GB RAM
    -> ~7000 s per epoch (TensorFlow 1.13, original SageMaker Docker image)
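
For reference, here is a minimal sketch of how per-epoch wall-clock times like the ones above can be collected. The `EpochTimer` callback is a hypothetical helper of my own, not part of the RNNoise training code:

```python
import time

import tensorflow as tf

# Hypothetical helper (not part of the RNNoise training script): a Keras
# callback that logs wall-clock time per epoch, the metric quoted above.
class EpochTimer(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f'Epoch {epoch}: {time.time() - self._start:.0f} s')

# Usage: pass callbacks=[EpochTimer()] to model.fit(...).
```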

To enable full cuDNN support I also created a model variant with only tanh as the recurrent activation.
This decreased the time per epoch (ml.p2.xlarge -> ~420 s), but the trained model performed far worse.
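
To illustrate what I mean, here is my understanding of the cuDNN eligibility rules in TF 2.x Keras (a sketch with made-up layer sizes, based on the TensorFlow documentation, not on the actual RNNoise model):

```python
import tensorflow as tf

# Per the TF 2.x docs, tf.keras.layers.GRU only dispatches to the fused
# cuDNN kernel when all of the conditions below hold; otherwise it
# silently falls back to the much slower generic implementation.
fast_gru = tf.keras.layers.GRU(
    96,                              # hypothetical width
    activation='tanh',               # cuDNN requires tanh here
    recurrent_activation='sigmoid',  # ... and sigmoid here
    recurrent_dropout=0.0,           # no recurrent dropout
    unroll=False,
    use_bias=True,
    reset_after=True,                # cuDNN-compatible GRU variant
    return_sequences=True,
)

# Any deviation, e.g. a non-default recurrent activation, loses the
# cuDNN path and runs the generic kernel instead:
slow_gru = tf.keras.layers.GRU(
    96,
    recurrent_activation='relu',     # generic (non-cuDNN) kernel
    return_sequences=True,
)
```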

Am I missing anything? Is there a way to speed up training?