AASHISHAG/deepspeech-german

Create a german model for DeepSpeech 0.6

SantosSi opened this issue · 26 comments

It would be really nice if you could provide a pretrained model that is compatible with DeepSpeech-0.6.0 that was released a few days ago. The authors state that models trained with older versions are not compatible.

I would do it on my own if I had enough computing power. 😔

Added in the todo list.

You can still use v0.5.0, as there is no major difference in model architecture between these two versions, apart from performance and some added features.

Trying your v0.5.0 model with stock deepspeech 0.6.0 as provided through pip3 I get the following error message:

Not found: Op type not registered 'VariableV2' in binary running on MyMachine. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
Traceback (most recent call last):
  File "/home/user/.local/bin/deepspeech", line 10, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.7/site-packages/deepspeech/client.py", line 113, in main
    ds = Model(args.model, args.beam_width)
  File "/home/user/.local/lib/python3.7/site-packages/deepspeech/__init__.py", line 42, in __init__
RuntimeError: CreateModel failed with error code 12294

You should install v0.5.0 of deepspeech using pip3.

The v0.5.0 model is not compatible with v0.6.0.

Then I misunderstood your answer:

You can still use v0.5.0 as there is no major model architecture difference between these two versions

to @nanosonde when asking for a:

pretrained model that is compatible with DeepSpeech-0.6.0

You seem to be referring to the software release version, not the model version. That was not entirely clear.

A pretrained model compatible with DeepSpeech-0.6.0 would indeed be great.

I made a fork using the new DeepSpeech version. See: https://github.com/DanBmh/deepspeech-german

For now only the Voxforge dataset is working. I have some problems with Tuda in the validation step and have not tested the other datasets (common-voice, swc, mailabs) yet.

@DanBmh I didn't face this issue with v0.5.0, but I had read about the error you mention in your ReadMe. Most people were able to resolve it by setting ignore_longer_outputs_than_inputs=True in DeepSpeech.py.

I did that already. Training and testing for tuda is working, but the validation after each epoch always breaks with a segmentation fault error after some steps.

@DanBmh thanks for the fork!
Would you also consider uploading a built model to your fork?

Otherwise I will try to build it during the weekend on my own :-)

If I can choose, I would love to test DeepSpeech with this German model: Tuda-De+Voxforge

I now got Tuda to work, but my results are much worse than in your paper.
With my best Tuda-only version I get WER: 0.41; with Tuda+Voxforge I only get WER: 0.65.

Did you train Tuda+Voxforge on the combined datasets, or first on Tuda and then on Voxforge?

@Mexxxo If I get good results, I will try to upload them, but I don't think I will have them before the weekend. If you train it yourself, please post the scores you got.

@DanBmh : Thank you for putting up the numbers.

We split the training data into 10 subsets and then trained the model cumulatively over 10 cycles, adding one more subset in each cycle. Example:
cycle 1: subset 1
cycle 2: subset 1 + subset 2
cycle 3: subset 1 + subset 2 + subset 3
and so on ....

You may want to refer to Section 4.1, "Influence of Training Size", of our paper for details. It explains the training strategy.

We trained on Tuda-De (5 subsets) followed by Voxforge (5 subsets).
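The cumulative schedule described above can be sketched roughly as follows. This is only an illustration of the subset bookkeeping, not the authors' training code: the function names `split_into_subsets` and `cumulative_cycles` are hypothetical, and the actual training call for each cycle is left out.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_subsets(samples: Sequence[T], n_subsets: int) -> List[List[T]]:
    """Split samples into n_subsets contiguous chunks of near-equal size."""
    base, extra = divmod(len(samples), n_subsets)
    subsets: List[List[T]] = []
    start = 0
    for i in range(n_subsets):
        # The first `extra` subsets get one additional sample each.
        size = base + (1 if i < extra else 0)
        subsets.append(list(samples[start:start + size]))
        start += size
    return subsets


def cumulative_cycles(subsets: List[List[T]]) -> Iterator[Tuple[int, List[T]]]:
    """Yield (cycle, training_samples), where cycle k uses subsets 1..k."""
    seen: List[T] = []
    for cycle, subset in enumerate(subsets, start=1):
        seen.extend(subset)
        yield cycle, list(seen)


if __name__ == "__main__":
    # Hypothetical file names, only to show the growing training set.
    files = [f"clip_{i:03d}.wav" for i in range(20)]
    for cycle, train_files in cumulative_cycles(split_into_subsets(files, 10)):
        print(f"cycle {cycle}: {len(train_files)} files")
```

For the Tuda-De followed by Voxforge setup, one would presumably concatenate the five Tuda-De subsets and the five Voxforge subsets into a single list before calling `cumulative_cycles`.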

Regarding the WER numbers: they depend on the data splits. If you reshuffle the dataset and redo the train, dev and test split, you will almost certainly get a different WER. We are currently writing a paper to discuss these challenges and training a more robust model.
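To make the dependence on the split concrete, here is a minimal sketch of a seeded train/dev/test split. The function name `split_dataset` and the 70/15/15 fractions are assumptions chosen for illustration; the thread does not state which ratios were actually used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T],
    seed: int,
    train_frac: float = 0.7,  # assumed ratio, not taken from the thread
    dev_frac: float = 0.15,   # assumed ratio, not taken from the thread
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and cut into train, dev and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains
    return train, dev, test
```

The same seed reproduces the same split, while a different seed puts different utterances into the test set, which is one reason why WER values from two independently split experiments are hard to compare directly.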

I now tried your stepped training with 10 steps for tuda (without voxforge) but this is not working for me somehow.
The network is not learning much and getting worse after cycle 5 (WER 0.684).

Also, I found out that there are many dataset errors in Tuda, resulting in infinite loss when training from the English checkpoint. Did you encounter this too? I solved it by excluding those files, but I think there are still some files with errors.

Any update on a model compatible with DeepSpeech 0.6?

Also, I found out that there are many dataset errors in Tuda, resulting in infinite loss when training from the English checkpoint. Did you encounter this too? I solved it by excluding those files, but I think there are still some files with errors.

@DanBmh : I remember that I ran a check to find erroneous files in all the datasets:

  1. Used SoX to read all the files. This way I could remove the corrupted files. (I don't have the code handy now.)
  2. Checked whether each WAV's length is greater than that of its transcript. (I can't recollect whether the code found any erroneous file.)
    Code: find_erroneous_files.py
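The two checks above could look roughly like the following sketch. It is not the find_erroneous_files.py script mentioned in the list, which is not reproduced in this thread. It uses Python's standard `wave` module instead of SoX, the helper names are hypothetical, and the 20 ms feature step is an assumption about the feature extraction settings.

```python
import wave
from typing import Optional


def audio_duration_seconds(path: str) -> Optional[float]:
    """Return the WAV duration in seconds, or None if the file is unreadable."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            if rate <= 0:
                return None
            return wav.getnframes() / rate
    except (wave.Error, EOFError, OSError):
        # Corrupted, empty, or missing file.
        return None


def is_erroneous(path: str, transcript: str, step_seconds: float = 0.02) -> bool:
    """Flag files that are unreadable or too short for their transcript.

    step_seconds is the assumed feature step (20 ms). A CTC-style loss
    needs at least as many time steps as transcript characters.
    """
    duration = audio_duration_seconds(path)
    if duration is None or duration == 0:
        return True
    n_steps = int(duration / step_seconds)
    return n_steps < len(transcript)
```

Files flagged this way are the ones that typically trigger the "longer outputs than inputs" condition, so excluding them is an alternative to silently ignoring them during training.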

@photoszzt : We are currently writing a paper to discuss the challenges and training a more robust model using DeepSpeech.
We should be able to release the model with new datasets and the updated Mozilla release by June/July.

Thanks for the code. I tested it, but for me it does not filter out any additional files.
Note that I had already filtered some files beforehand using other metrics, such as file length and characters-per-second rate. Without this prefiltering I get an error after some time; I think this is caused by an invalid or empty file. Tested on the Tuda train + dev datasets.
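A prefilter based on file length and characters per second might look like the sketch below. The function name `keep_sample` and all threshold values are hypothetical placeholders; the thread does not say which limits were actually used.

```python
def keep_sample(
    duration_s: float,
    transcript: str,
    min_duration_s: float = 0.5,         # hypothetical lower bound
    max_duration_s: float = 30.0,        # hypothetical upper bound
    max_chars_per_second: float = 25.0,  # hypothetical speaking-rate limit
) -> bool:
    """Return True if the sample passes the length and speaking-rate checks."""
    text = transcript.strip()
    if not text:
        # An empty transcript cannot be trained on.
        return False
    if duration_s < min_duration_s or duration_s > max_duration_s:
        return False
    # Too many characters for the audio length usually means a wrong
    # transcript or truncated audio.
    return len(text) / duration_s <= max_chars_per_second
```

An implausibly high characters-per-second rate is a cheap signal that the audio and transcript do not belong together, which is the kind of sample that can produce infinite loss.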

I now uploaded a checkpoint of one of my models. I did use the master branch of DeepSpeech, the version should be v0.7.0a2. It has a WER of 0.19, tested with Tuda + CommonVoice dataset. You can find the model files here: https://github.com/DanBmh/deepspeech-german#language-model-and-checkpoints

@AASHISHAG I did run a test with your uploaded checkpoints and my test dataset. I only reached 0.68 WER with both datasets and 0.79 with tuda only. From which training are your checkpoints, the Tuda-De+Voxforge+Mozilla run? I know my testset is larger and the data is not the same, but shouldn't the difference be smaller? You can find the full results and the instructions I used here.

@DanBmh This could be because of two reasons:

  1. Dataset size. We used ~300h of data, compared to the ~600h you used, so the model you trained learned better.
  2. The random data splits. We are aware of these issues and have submitted a paper on it. I will share it as soon as it's accepted.

@SantosSi @nanosonde @LarsScha @photoszzt : We have released v0.6.0. You can find the link in the ReadMe.

@AASHISHAG Thanks for the new version! Do you have any performance data for the new model available yet?

@sebastiantilman : Unfortunately not, but this model should be more robust than the previous release, as it is trained on ~4 times the previous data.

hi,

I tested out the model performance (for version v0.6.0) on the test set of common voice dataset which amounted to about 18 hours of audio after data preparation. The WER I got was 31.65% and the CER was 16.05%.
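For reference, WER and CER are commonly computed as the Levenshtein edit distance between reference and hypothesis, at word and character level respectively, divided by the reference length. The sketch below assumes that definition and aggregates over the whole test set; it is not the evaluation code that produced the numbers above, and the function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Word error rate over (reference, hypothesis) pairs."""
    pairs = list(pairs)
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Character error rate over (reference, hypothesis) pairs."""
    pairs = list(pairs)
    errors = sum(edit_distance(r, h) for r, h in pairs)
    chars = sum(len(r) for r, _ in pairs)
    return errors / chars
```

With the single pair ("das ist ein test", "das ist test"), one of four words is missing and four of sixteen characters are missing, so both rates come out at 0.25.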

The quality of transcriptions in general is much better than the previous version of the model (haven't evaluated WER/CER on the old model version), but I still feel that the WER for this model is too high.

To prepare the dataset, I downloaded the dataset from the common voice website and followed the instructions in your Readme.

@AASHISHAG Do the reported numbers make sense to you?

@DanBmh It's awesome that you have a 0.7 model! But I can't find the files on your fork. The links at the bottom do not seem to work. Can you link the .pbmm and .scorer files? :)
PS: Could you open your fork for issues?

@erksch Updated the links. Only the checkpoints are not yet uploaded again.

@ALL
We will release version v0.7.0 soon.

I am closing this issue since it relates to the old v0.6.0 release.