AASHISHAG/deepspeech-german

About dataset duration for v0.6.0

For the v0.6.0 model, you list the TUDA-de dataset at 184 hours and the Voxforge dataset at 57 hours.

Having checked the links for both, I find that Voxforge has about 35 hours of data, and TUDA-de appears to be about the same size.
I'm using the links provided in the README:

  1. Voxforge: http://www.voxforge.org/home/forums/other-languages/german/open-speech-data-corpus-for-german
  2. TUDA-de: https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/acoustic-models.html
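
For anyone who wants to verify the hours locally, here is a minimal sketch of how the total could be measured. It assumes the corpus has already been downloaded and extracted to a local directory and that the audio is stored as uncompressed `.wav` files; the function name `total_hours` and the directory path are hypothetical, not part of either dataset's tooling.

```python
import wave
from pathlib import Path


def total_hours(root):
    """Sum the playback duration of every .wav file under root, in hours.

    Assumption: files are PCM WAV readable by the standard-library wave module.
    """
    total_seconds = 0.0
    for path in Path(root).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # duration in seconds = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical path: point this at the extracted corpus directory.
    print(f"{total_hours('path/to/extracted/corpus'):.1f} hours")
```

Note that this counts every file separately, so if a corpus ships several recordings of the same utterance, the total will be higher than the amount of unique speech.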

Do I have the wrong links somehow?

Yes, the current dataset sizes differ from the figures given in the references above.

Thanks for the prompt reply.