AASHISHAG/deepspeech-german

Infos about new 0.7.4 model

Closed this issue · 8 comments

Thanks for sharing your model.


I have some questions I would like to ask:

  1. Could you please give us some info about the model's performance (test loss/WER, test datasets)?
  2. Would it be possible to upload the checkpoint files too?
  3. Did you train without augmentations (I couldn't see them in the flags.txt file), and is there a reason why?
  4. Why did you change your alphabet to include numbers this time? Did it improve performance?

  1. Could you please give us some info about the model's performance (test loss/WER, test datasets)?

The test dataset was a random split of Mozilla, Mailabs, Tuda, and Voxforge. The WER ranged from 10-20% on the individual test datasets.
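(For anyone reproducing these numbers: WER is the standard word-level edit distance. A minimal sketch of computing it with the jiwer package; the transcripts below are made-up placeholders, not from the actual test sets:)

```python
# Minimal WER sketch using the jiwer package; the transcript lists
# below are hypothetical placeholders, not real test data.
import jiwer

references = ["heute ist ein schöner tag", "wie spät ist es"]
hypotheses = ["heute ist schöner tag", "wie spät ist es"]

# WER = (substitutions + insertions + deletions) / reference words
print(jiwer.wer(references, hypotheses))  # 1 deletion / 9 words ≈ 0.11
```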

  2. Would it be possible to upload the checkpoint files too?

Sure, in a few days.

  3. Did you train without augmentations (I couldn't see them in the flags.txt file), and is there a reason why?

We experimented with augmentation for Swiss German, but there was not much improvement. Therefore, we didn't use augmentation.

  4. Why did you change your alphabet to include numbers this time? Did it improve performance?

There was no good word2num tool for German. Therefore, in other experiments, we tried using numbers directly, but there was no real improvement.

The test dataset was a random split of Mozilla, Mailabs, Tuda, and Voxforge. The WER ranged from 10-20% on the individual test datasets.

Didn't you plan to use the predefined splits for Tuda? I'm not sure if the same problem with random splits exists for CommonVoice too; I couldn't measure it in my tests when excluding duplicate files, but there was a difference in training results after I changed to the predefined splits there as well (though I also changed other things, so I can't say for sure).

We experimented with augmentation for Swiss German, but there was not much improvement. Therefore, we didn't use augmentation.

Did you try it with v0.7.3 or earlier? I have the feeling that augmentations don't work well in the later versions. I used https://github.com/DanBmh/DeepSpeech/tree/before_new_augs2 for my latest trainings (it has the improved reduce-lr-on-plateau and transfer-learning changes merged).
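For reference, the newer training code passes augmentations as --augment flags. A sketch of what such a run could look like, with augmentation names taken from the 0.9-era DeepSpeech docs and purely illustrative, untuned values:

```sh
# Illustrative only: --augment syntax as documented for DeepSpeech 0.9,
# with untuned placeholder values.
python3 DeepSpeech.py \
    --train_files train.csv \
    --dev_files dev.csv \
    --augment "volume[p=0.1,dbfs=-10:-40]" \
    --augment "pitch[p=0.1,pitch=1~0.2]" \
    --augment "tempo[p=0.1,factor=1~0.5]" \
    --augment "frequency_mask[p=0.1,n=1:3,size=1:5]"
```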

There was no such good word2num for German. Therefore, in other experiments, we tried using numbers directly, but no such improvment.

In my Voice Assistant project I'm using Duckling for this. My implementation is here, but it might be easier if you use the HTTP interface directly.
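For anyone who wants to try the HTTP route: Duckling serves a POST /parse endpoint (default port 8000) that takes form-encoded locale, text, and dims fields. A minimal sketch of normalizing German number words this way (the exact response layout may vary by Duckling version):

```python
# Minimal sketch of querying a locally running Duckling server to
# normalize German number words; response layout may vary by version.
import requests

resp = requests.post(
    "http://localhost:8000/parse",
    data={
        "locale": "de_DE",
        "text": "zweihundertdreiundvierzig",
        "dims": '["number"]',
    },
)
for match in resp.json():
    print(match["body"], "->", match["value"]["value"])  # -> 243
```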

2. Would it be possible to upload the checkpoint files too?

Sure, in a few days.

Also from my side: Thank you very much for your work.
Have you already uploaded the checkpoint files? Where can I find them?

@DanBmh: I missed your comment. Thank you for the useful insights.

Didn't you plan to use the predefined splits for Tuda? I'm not sure if the same problem with random splits exists for CommonVoice too; I couldn't measure it in my tests when excluding duplicate files, but there was a difference in training results after I changed to the predefined splits there as well (though I also changed other things, so I can't say for sure).

I used the predefined splits for Tuda-De, but not for MCV. The reason is that we lose a lot of data when we use the MCV splits, as they don't include duplicates. The WER would differ for each test set; therefore, I didn't quote a WER.

Did you try it with v0.7.3 or earlier? I have the feeling that augmentations don't work well in the later versions. I used https://github.com/DanBmh/DeepSpeech/tree/before_new_augs2 for my latest trainings (it has the improved reduce-lr-on-plateau and transfer-learning changes merged).

No, I didn't. I'll give it a try sometime.

In my Voice Assistant project I'm using Duckling for this. My implementation is here, but it might be easier if you use the HTTP interface directly.

Thank you for the link to your Voice Assistant project.

@DanBmh @buergeb: The old checkpoints were deleted during a clean-up. I will train a v0.8.0 model and will definitely release the checkpoints (most probably in two weeks).

I used the predefined splits for Tuda-De, but not for MCV. The reason is that we lose a lot of data when we use the MCV splits, as they don't include duplicates.

For CommonVoice I'm using the predefined test+dev splits and all the other files for training, so you don't lose data this way. If you have a csv file with all audio files (audiomate creates one, for example), you can use this script for splitting: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot/-/blob/master/preprocessing/split_dataset.py#L57

With the Tuda dataset, did you exclude the Realtek recordings from the test set? I learned only a short time ago that these recordings are not included in the official splits or in the other papers' results. Excluding them improves performance greatly.
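If it helps: the Tuda recordings carry the microphone name in the wav filename, so the test csv can be filtered with a few lines. A sketch, assuming the usual "..._Realtek.wav" naming; adjust the pattern if your files differ:

```python
# Sketch: drop Realtek-microphone recordings from a DeepSpeech test csv.
# Assumes the microphone name appears in the wav filename, e.g.
# "..._Realtek.wav"; adjust the pattern if your naming differs.
import pandas as pd

df = pd.read_csv("tuda_test.csv")  # wav_filename, wav_filesize, transcript
df = df[~df["wav_filename"].str.contains("Realtek", case=False)]
df.to_csv("tuda_test_no_realtek.csv", index=False)
```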

The WER would differ for each test set; therefore, I didn't quote a WER.

Could you please do this for your next training? I would be interested in how my model performs against yours; we might learn something from that. Currently I'm providing scores for the official Tuda and CommonVoice test sets. You can find the model here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot/-/tree/master#language-models-and-checkpoints

@DanBmh: How do you want me to configure the splits? I have the following files from the MCV corpus:

dev.csv other.csv test.csv train-all.csv train.csv validated.csv

I took the original dev.tsv and test.tsv from the CommonVoice dataset and used them to split my all.csv file (which might be train-all.csv or validated.csv in your case), so that all of the audio files in dev.tsv and test.tsv end up in my new dev.csv and test.csv, and the rest goes into my train.csv list. You can use the script linked above for this.
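In other words, something like this rough sketch (not the linked script; it assumes DeepSpeech-style csvs with a wav_filename column, CommonVoice tsvs with a path column, and matches files by basename):

```python
# Rough sketch of the split described above (not the linked script).
# Assumes DeepSpeech-style csvs (wav_filename column) and CommonVoice
# tsvs (path column); files are matched by basename without extension.
import os
import pandas as pd

def basenames(tsv_path):
    paths = pd.read_csv(tsv_path, sep="\t")["path"]
    return {os.path.splitext(os.path.basename(p))[0] for p in paths}

all_files = pd.read_csv("all.csv")  # or train-all.csv / validated.csv
keys = all_files["wav_filename"].map(
    lambda p: os.path.splitext(os.path.basename(p))[0]
)

dev_keys, test_keys = basenames("dev.tsv"), basenames("test.tsv")
all_files[keys.isin(dev_keys)].to_csv("dev.csv", index=False)
all_files[keys.isin(test_keys)].to_csv("test.csv", index=False)
all_files[~keys.isin(dev_keys | test_keys)].to_csv("train.csv", index=False)
```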

Released v0.9.0. Closing the ticket.