chenmingxiang110/Chinese-automatic-speech-recognition

How to perform inference?

Aksh97 opened this issue · 7 comments

Hi, I am a bit confused about how to perform inference with your pretrained model.
Could you please provide the steps?

Take model 903 as an example.

from model903 import model
af = AudioFeaturizer()
model = model(409)

Then simply load the pretrained model:

sess = tf.Session()
saver = tf.train.Saver()
saver.restore(sess, "path/to/ckpt")

Now you are ready to go. Read an audio file and you can get its pinyin:

rate, data = read_wav("example.wav")
data = mergeChannels(data)
data = zero_padding_1d(data, 160240)
a_seg = AudioSegment(data, rate)
xs = np.transpose(np.array([af.featurize(a_seg)]), [0,2,1])
pred = model.predict(sess, xs)[0]
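
For readers who want to see what the preprocessing steps above roughly amount to, here is a minimal standard-library sketch. It is not the repository's code: `read_wav_mono` and `zero_pad` are hypothetical stand-ins, and it assumes that `mergeChannels` mixes the channels down to mono and that `zero_padding_1d` appends zeros up to a fixed length. What the repository's helpers do with audio longer than the target length is not stated in this thread, so the sketch simply leaves such input unchanged.

```python
import array
import sys
import wave


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (sample_rate, mono_samples).

    Hypothetical helper: an assumed equivalent of read_wav + mergeChannels.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Channels are interleaved frame by frame; average each frame to get mono.
    mono = [
        sum(samples[i:i + n_channels]) // n_channels
        for i in range(0, len(samples), n_channels)
    ]
    return rate, mono


def zero_pad(samples, target_len):
    """Append zeros until the list has target_len items.

    Assumption: inputs that are already longer are returned unchanged.
    """
    return samples + [0] * max(0, target_len - len(samples))
```

With these helpers, `rate, data = read_wav_mono("example.wav")` followed by `data = zero_pad(data, 160240)` would mirror the first three lines of the snippet above. If the audio is sampled at 16 kHz (an assumption, the thread does not say), 160240 samples is about 10 seconds.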

Please read the readme file carefully. It already covers most of what you need. By the way, you can also check "subtitle_demo.py" for some code examples.

Hi, Thanks.

I have two queries:
This whole codebase is written for tf 1.x. Are there any plans to convert it to tf 2.x? Running it under tf 2.x produces multiple errors.

Also, do you have a pretrained model (for the step where the pretrained model is loaded)?

Sorry Aksh97, the model itself (Deep Speech) is quite an old algorithm (probably proposed 5 or 6 years ago), so I do not plan to rewrite this project. If you are interested in speech recognition algorithms written in tf 2.x or torch, you can find newer algorithms such as DFSMN. I haven't followed the latest research for a few years, so this recommendation may also be out of date. If you are interested in building your own speech recognition on embedded devices (or a PC), you can check this project: https://github.com/sipeed/Maix-Speech.

As for the second question, the pretrained models can be downloaded from Baidu Netdisk:
model 903: https://pan.baidu.com/s/1NcTN8gojuIBaIFT9FB3EJw Code: 261u
model 902: https://pan.baidu.com/s/1do7C6Egj6sJO7kn1yHPzBg Code: 9o87
model 901: https://pan.baidu.com/s/1utz-1Vv4IO9D-3awj3x1QQ Code: pv08

Thanks for the prompt response. I appreciate it.

Also, thanks for providing the link to Maix-Speech and pretrained models.

Hi @chenmingxiang110, do you know of any other ASR projects for Chinese with good accuracy?

I checked out https://github.com/sipeed/Maix-Speech, but it is mainly for real-time use. What I am looking for is something where I pass in an audio file (mp3 or wav) and it returns text.

Any help will be appreciated.

Sorry, I haven't followed the field for years. Searching for recent ASR papers will probably lead you to some good open-source projects.

Okay sure. Thanks a lot