mozilla/deepspeech-playbook

Provide some GPU recommendations / hints


It would be useful to help people scope their GPU requirements so they can set realistic expectations:

  • how much VRAM is required for training from scratch
  • how much VRAM is required for transfer learning
  • rough ratios of GPU model / audio volume / training time
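One way to express the third point is throughput: hours of audio consumed per wall-clock hour of training. The sketch below is a hypothetical helper, not part of DeepSpeech. It assumes epoch time scales roughly linearly with audio volume on a fixed GPU setup and batch size, which is only an approximation in practice.

```python
def audio_hours_per_wall_hour(audio_hours: float, epoch_minutes: float) -> float:
    """Throughput: hours of audio processed per wall-clock hour during one epoch."""
    return audio_hours / (epoch_minutes / 60.0)


def estimate_epoch_minutes(audio_hours: float, throughput: float) -> float:
    """Project one epoch's duration on the same setup.

    Assumes (hypothetically) that epoch time grows linearly with audio volume.
    """
    return audio_hours / throughput * 60.0


# Example: a setup that processes ~1300 h of audio in a 60-minute epoch.
tp = audio_hours_per_wall_hour(1300, 60)
# Projected epoch time for a 650 h dataset on that same setup.
print(f"{estimate_epoch_minutes(650, tp):.1f} min per epoch")  # prints "30.0 min per epoch"
```

Such projections only transfer between runs that share the same GPUs, batch size and training mode (from scratch vs transfer learning).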

Some data points on 2x RTX 2080 Ti (11 GB VRAM each):

  • trains from scratch flawlessly
  • transfer learning
  • training on ~1300 h of audio data:
    • batch size 64
    • 1 h per epoch

Tesla K40m (11 GB VRAM):

  • Transfer learning: approx. 20 min per epoch for 10 hours of training data
  • Batch size 8
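To put the two reports on a common scale, here is a small sketch that converts each into audio hours processed per wall-clock hour. The numbers come from the data points above; the `DataPoint` class is purely illustrative. The training mode of the RTX 2080 Ti run is not stated, so it is marked "unspecified". The two runs also differ in GPU count and batch size, so the ratio is not a pure GPU-to-GPU comparison.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    setup: str
    mode: str
    audio_hours: float
    batch_size: int
    epoch_minutes: float

    @property
    def throughput(self) -> float:
        # Hours of audio processed per wall-clock hour during one epoch.
        return self.audio_hours / (self.epoch_minutes / 60.0)


POINTS = [
    DataPoint("2x RTX 2080 Ti (11 GB)", "unspecified", 1300, 64, 60),
    DataPoint("Tesla K40m (11 GB)", "transfer learning", 10, 8, 20),
]

for p in POINTS:
    print(f"{p.setup} [{p.mode}, batch {p.batch_size}]: "
          f"{p.throughput:.0f} audio h per wall-clock h")

# Rough speed ratio between the two reported setups (confounded by batch size,
# GPU count and training mode).
print(f"ratio: {POINTS[0].throughput / POINTS[1].throughput:.1f}x")
```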