Step wise guidelines
Reethuch opened this issue · 6 comments
Hi team,
This is awesome. I want to recognize a small set of keywords from live audio and print them. I am planning to use my own training dataset, but I don't understand the flow. What inputs (i.e., arguments) should I give, and what is the expected output?
The step-wise guidelines will come soon, but for now you can refer to the other examples to prepare your own data.
The key steps are:
- prepare a "wav.scp" file where each line contains the wav_id and its corresponding path, as shown below:
first.wav /path/to/first.wav
second.wav /path/to/second.wav
- prepare a "text" file where each line contains the wav_id and its corresponding label. For example, assuming that first.wav contains your keyword and second.wav does not:
first.wav 0
second.wav -1
- use tools/wav_to_duration.sh to generate the "wav.dur" file, and then run tools/make_list.py:
bash tools/wav_to_duration.sh /path/to/wav.scp /path/to/wav.dur
python tools/make_list.py /path/to/wav.scp /path/to/text /path/to/wav.dur /path/to/data.list
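The first two steps above can be scripted. The following is only a minimal sketch, not part of wekws: the helper name `write_data_files` and the use of the file's base name as the wav_id are assumptions chosen to match the example lines above.

```python
import os
import tempfile


def write_data_files(items, out_dir):
    """Write "wav.scp" and "text" files in the layout described above.

    items: list of (wav_path, label) pairs, e.g. label 0 for the keyword
    and -1 for audio that does not contain it (as in the example above).
    This helper is hypothetical; it is not a wekws tool.
    """
    os.makedirs(out_dir, exist_ok=True)
    scp_path = os.path.join(out_dir, "wav.scp")
    text_path = os.path.join(out_dir, "text")
    seen = set()
    with open(scp_path, "w", encoding="utf-8") as scp, \
            open(text_path, "w", encoding="utf-8") as text:
        for wav_path, label in items:
            # Assumption: the base name of the file serves as the wav_id.
            wav_id = os.path.basename(wav_path)
            if wav_id in seen:
                raise ValueError(f"duplicate wav_id: {wav_id}")
            seen.add(wav_id)
            scp.write(f"{wav_id} {wav_path}\n")
            text.write(f"{wav_id} {label}\n")
    return scp_path, text_path


# Usage: write both files into a temporary directory.
out_dir = tempfile.mkdtemp()
scp_path, text_path = write_data_files(
    [("/path/to/first.wav", 0), ("/path/to/second.wav", -1)], out_dir
)
```

The two resulting files can then be passed to tools/wav_to_duration.sh and tools/make_list.py as in the commands above.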
Thank you @mlxu995. I trained a model using the Google Speech Commands dataset and your code. But how do I use the model to recognize live words?
Also, during stage 4 I got some warnings like the ones below. Is that alright?
```
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:110: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert outputs.size(2) > self.padding
[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:257: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if in_cache.size(0) > 0:
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if in_cache.size(0) > 0:
Export to onnx succeed, but pytorch/onnx have different
outputs when given the same input, please check!!!
```
For the last message, you can try setting atol=1e-5 in export_onnx.py. Note that mdtc has a larger numerical error than ds-tcn because its final output is the sum of the outputs of multiple layers.
The other warnings can safely be ignored.
To use the model to recognize live words, you can follow this guide: https://github.com/wenet-e2e/wekws/blob/main/runtime/android/README.md
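To illustrate what loosening atol does, here is a minimal sketch of an absolute-tolerance comparison. It is not the code in export_onnx.py: the function names, the sample values, and the tighter tolerance of 1e-6 are all made up for illustration, and the real check may also use a relative tolerance.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_match(a, b, atol):
    """True if every pair of elements differs by at most atol."""
    return max_abs_diff(a, b) <= atol


# Hypothetical outputs of the same model from two runtimes.
pytorch_out = [0.500000, 0.250000, 0.125000]
onnx_out = [0.500005, 0.249998, 0.125001]

strict_ok = outputs_match(pytorch_out, onnx_out, atol=1e-6)  # tighter, made-up tolerance
loose_ok = outputs_match(pytorch_out, onnx_out, atol=1e-5)   # value suggested above
```

With these sample values the difference is a few millionths, so the comparison fails at the tighter tolerance and passes at 1e-5, which is the kind of small mismatch the message above reports.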
Thank you @mlxu995
I need to do audio detection from a web browser.
Is that possible?
@Reethuch You can try this web demo: https://www.modelscope.cn/studios/thuduj12/KWS_Nihao_Xiaojing/summary