athena-team/athena

How can I get vocab.txt and feats.txt

nan9126 opened this issue · 12 comments

Hi,
Recently I trained the model following your readme with /athena/examples/asr/aishell/run.sh and deployed the C++ demo. After freezing the pb, I used the vocab in "/examples/asr/aishell/data/" as vocab.txt and took the feats from the training process, then I ran the C++ demo with the batch size set to match those feats, but I still got a wrong result.
So I want to know how to get vocab.txt and feats.txt.
Looking forward to your reply.

The way you are getting vocab.txt and feats.txt is correct. Try using batch size 1 and normalized feats.

Thanks for your reply.
The features I get look like "-1.60294092e+00 -1.29458332e+00 -1.66426182e+00 -1.82421780e+00 -1.70533729e+00 ...", so how do I normalize them?
And if I want to use another wav, how do I get its features?

Here is an example:

    from athena.transform import AudioFeaturizer
    from athena.data import FeatureNormalizer

    # Extract Fbank features from the wav, then apply CMVN normalization.
    path = "data/a.wav"
    audio_config = {"type": "Fbank", "filterbank_channel_count": 40}
    cmvn_file = "examples/asr/aishell/data/cmvn"
    audio_featurizer = AudioFeaturizer(audio_config)
    feature_normalizer = FeatureNormalizer(cmvn_file)
    feat = audio_featurizer(path)                # raw Fbank features
    feat = feature_normalizer(feat, 'speaker')   # CMVN-normalized features

You need to specify the wav path and 'speaker' to be the correct ones. By running FeatureNormalizer, you can get normalized features.
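
If you also need to dump these normalized features to a feats.txt for the C++ demo, here is a minimal sketch. It assumes the demo reads a plain whitespace-separated matrix with one frame per row (which matches the values quoted above); the numpy conversion and the output path are illustrative, not part of Athena's API.

    import numpy as np

    # 'feat' is the normalized feature tensor from the example above.
    # Convert it to a 2-D array of shape (num_frames, num_filterbanks).
    feat_matrix = np.squeeze(feat.numpy() if hasattr(feat, "numpy") else np.asarray(feat))
    feat_matrix = feat_matrix.reshape(feat_matrix.shape[0], -1)

    # Write one frame per line, whitespace-separated, in scientific notation,
    # like the values quoted earlier in this thread.
    np.savetxt("feats.txt", feat_matrix, fmt="%.8e")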

I tried the example and ran the C++ demo, but I still got a wrong result:
"Argmax decoding results: 左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左左塑塑塑塑塑塑塑塑塑塑塑塑塑塑塑塑塑塑塑臻左左臻左臻左臻左塑臻左塑臻左塑臻左塑臻左塑臻"
Maybe something is wrong with my pb? Is there any way to locate the error?

First, please make sure your training process finished properly, i.e. your model achieved a CER around 6.6%. Then, check whether the model is properly loaded when generating the pb, and whether any error messages were printed out during that stage. Also, I suggest that you use a wav from aishell and check the decode result again, to make sure this problem is not caused by a mismatch between your training set (aishell) and the test wav (a wav that is out-of-domain).
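
As a quick sanity check on the freezing step, you can also try to load the frozen graph in Python and list its nodes; if the GraphDef fails to parse or the input/output ops you expect are missing, the pb was not generated correctly. This is only a rough sketch using standard TensorFlow APIs; the file name matches the C++ demo, but the op names you look for depend on your own freezing script.

    import tensorflow as tf

    # Load the frozen encoder graph and check that it parses and imports cleanly.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile("encoder.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")

    # Print the node names so you can confirm the input/output ops you expect;
    # the exact names depend on how the model was frozen.
    for op in graph.get_operations():
        print(op.name)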

I did use a wav from the aishell test set. I will check my training process and the pb generation process, thanks a lot~ (starry eyes)

You are welcome. Feel free to reply if you have any further questions.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale commented

This issue is closed. You can also re-open it if needed.

Hi, I trained the model again and got the following score:

| SPKR    | # Snt   # Wrd | Corr   Sub   Del   Ins   Err   S.Err |
|=======================================================================|
| Sum/Avg | 7128   103999 | 93.7   5.6   0.7   0.2   6.5    41.3 |
| Mean    |  7.1    104.1 | 93.9   5.7   0.4   0.2   6.3    37.8 |
| S.D.    |  4.9     79.5 |  7.5   7.1   2.0   1.4   7.7    32.5 |
| Median  | 11.0    129.0 | 95.1   4.6   0.0   0.0   4.9    36.4 |

Then I checked inference.log.label and inference.log.result, and the results there seem correct. But when I used the example you gave me to extract audio features and ran the C++ deploy demo, there were no Argmax decoding results.
Is there any way to locate the problem?

  1. Please check your directory structure: decoder.pb, encoder.pb, feats.txt and vocab.txt should all be present. These four files are required.
  2. You can check whether these files are loaded correctly by printing them out, and check the model status in the C++ code, e.g. check index_to_char, enc_inputs[0] and status.
  3. Then, check whether any predictions are being saved to completed_seqs. If there are none, or the predictions are wrong, first check whether enc_outputs is correct by comparing it with the Python program's encoder output (a sketch of this comparison is shown below).
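
For the comparison in step 3, here is a rough sketch of running the frozen encoder from Python. The tensor names "enc_inputs:0" and "enc_outputs:0" and the feature shape are only placeholders, not the real names in Athena's frozen graph; substitute the actual names and shape from your own model (you can find the op names by listing them as in the earlier snippet).

    import numpy as np
    import tensorflow as tf

    # Load the frozen encoder graph.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile("encoder.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")

    # Load the normalized features fed to the C++ demo
    # (batch size 1: shape [1, num_frames, num_filterbanks]).
    feats = np.loadtxt("feats.txt", dtype=np.float32)[np.newaxis, :, :]

    # NOTE: "enc_inputs:0" and "enc_outputs:0" are hypothetical tensor names;
    # replace them with the real input/output names of your frozen graph.
    with tf.compat.v1.Session(graph=graph) as sess:
        enc_outputs = sess.run("enc_outputs:0", feed_dict={"enc_inputs:0": feats})

    # Compare these values with enc_outputs computed inside the C++ demo.
    print(enc_outputs.shape)
    print(enc_outputs.flatten()[:10])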

Please reopen this issue if the problem still exists.