Step wise guidelines

Question

Reethuch opened this issue 2 years ago · 6 comments

Hi team,
This is awesome. I want to recognize a small set of keywords from live audio and print them. I am planning to use my own training dataset, but I don't understand the flow. What inputs (I mean arguments) should I give, and what is the expected output?

Answer 1 · 2023-03-11T05:45:22.000Z

A step-by-step guide will come soon, but for now you can refer to the other examples to prepare your own data.
The key steps are as follows:

Prepare a "wav.scp" file in which each line contains a wav_id and its corresponding path, as shown below:

first.wav /path/to/first.wav
second.wav /path/to/second.wav

Prepare a "text" file in which each line contains a wav_id and its corresponding label. Assuming that first.wav contains your keyword and second.wav does not:

first.wav 0
second.wav -1

Use the script tools/wav_to_duration.sh to generate the "wav.dur" file, and then run tools/make_list.py:

bash tools/wav_to_duration.sh /path/to/wav.scp /path/to/wav.dur
python tools/make_list.py /path/to/wav.scp /path/to/text /path/to/wav.dur /path/to/data.list
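
The "wav.scp" and "text" files described above can also be generated with a short script. The following is only a minimal sketch: the helper name `write_kws_lists` and its `(wav_path, has_keyword)` input format are hypothetical and not part of wekws. It uses the file name as the wav_id and the label convention from the example above (0 for the keyword, -1 for anything else).

```python
import os
import tempfile


def write_kws_lists(samples, out_dir):
    """Write "wav.scp" and "text" for (wav_path, has_keyword) pairs.

    Hypothetical helper, not part of wekws. The wav_id is the file name,
    and the labels follow the example above: 0 = keyword, -1 = no keyword.
    """
    os.makedirs(out_dir, exist_ok=True)
    scp_path = os.path.join(out_dir, "wav.scp")
    text_path = os.path.join(out_dir, "text")
    with open(scp_path, "w", encoding="utf-8") as scp, \
            open(text_path, "w", encoding="utf-8") as text:
        for wav_path, has_keyword in samples:
            wav_id = os.path.basename(wav_path)
            label = 0 if has_keyword else -1
            scp.write(f"{wav_id} {wav_path}\n")
            text.write(f"{wav_id} {label}\n")
    return scp_path, text_path


if __name__ == "__main__":
    # Placeholder paths taken from the example above.
    samples = [("/path/to/first.wav", True), ("/path/to/second.wav", False)]
    scp_path, text_path = write_kws_lists(samples, tempfile.mkdtemp())
    with open(scp_path, encoding="utf-8") as f:
        print(f.read(), end="")
    with open(text_path, encoding="utf-8") as f:
        print(f.read(), end="")
```

The two generated files can then be passed to tools/wav_to_duration.sh and tools/make_list.py as in the commands above.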

Answer 2 · 2023-03-16T23:27:17.000Z

Thank you @mlxu995. I generated a model using the Google Speech Commands dataset and your code. But how do I use the model to recognize live words?

Also, during stage 4, I got some warnings like this. Is that alright?
```
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:110: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert outputs.size(2) > self.padding
[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:257: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if in_cache.size(0) > 0:
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if in_cache.size(0) > 0:
Export to onnx succeed, but pytorch/onnx have different
outputs when given the same input, please check!!!
```

Answer 3 · 2023-03-19T14:34:47.000Z

For the last message, you can try setting atol=1e-5 (in export_onnx.py). Note that mdtc has a larger error range than ds-tcn because its final output is a summation of the outputs of multiple layers.
The other warnings can simply be ignored.
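
To illustrate why a larger atol can make that message go away, here is a minimal sketch. It assumes the check in export_onnx.py is an element-wise tolerance comparison of the usual `|a - b| <= atol + rtol * |b|` kind (the criterion used by `numpy.allclose` and `torch.allclose`); this has not been verified against the script. The function `all_close` and the sample values are made up for illustration.

```python
def all_close(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise closeness test: |a - e| <= atol + rtol * |e|.

    Same criterion as numpy.allclose, written in plain Python.
    """
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


# Made-up outputs that differ by about 4e-6 per element, standing in for
# small numerical differences between the PyTorch and ONNX models.
pytorch_out = [0.0, 1.0e-3]
onnx_out = [4.0e-6, 1.004e-3]

print(all_close(pytorch_out, onnx_out, atol=1e-6))  # False: 4e-6 exceeds the tolerance
print(all_close(pytorch_out, onnx_out, atol=1e-5))  # True: 4e-6 is within the tolerance
```

Small per-layer differences add up when several layer outputs are summed, so a slightly looser absolute tolerance can be reasonable for such a model.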

Answer 4 · 2023-03-19T14:39:37.000Z

To use the model to recognize live words, you can follow these guidelines: https://github.com/wenet-e2e/wekws/blob/main/runtime/android/README.md

Answer 5 · 2023-03-20T20:53:43.000Z

Thank you @mlxu995.
I need to do audio detection from a web browser.
Is that possible?

Answer 6 · 2023-09-25T09:41:42.000Z

> Thank you @mlxu995. I need to do audio detection from a web browser. Is that possible?

@Reethuch You can try this web demo: https://www.modelscope.cn/studios/thuduj12/KWS_Nihao_Xiaojing/summary