Can we train custom keywords?
Closed this issue · 2 comments
Yes, you should be able to, if you have the data.
If you go to the sample config, the very first section is like this:
data_root: ./data/
train_list_file: ./data/training_list.txt
val_list_file: ./data/validation_list.txt
test_list_file: ./data/testing_list.txt
label_map: ./data/label_map.json
First, you need to arrange your own data in a class-folder structure under some root folder.
./data/
├── keyword_a/
│   ├── a.wav
│   ├── b.wav
│   ├── ...
│   └── ...
├── keyword_b/
├── ...
├── ...
└── keyword_n/
You then need three .txt files to define your training, validation, and test sets. Each file is simply a list of audio file paths, one per line.
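If it helps, the three list files can be generated from the class-folder structure with a small script like this. It's just a sketch: the function name, the 80/10/10 split, and the output file names (matching the sample config) are my assumptions, not part of the repo.

```python
import os
import random


def make_split_lists(data_root="./data", seed=0):
    """Scan class folders under data_root and write train/val/test list files.

    Uses a hypothetical 80/10/10 split; adjust the ratios to taste.
    """
    random.seed(seed)
    wavs = []
    for cls in sorted(os.listdir(data_root)):
        cls_dir = os.path.join(data_root, cls)
        if not os.path.isdir(cls_dir):
            continue  # skip the list files themselves, etc.
        for f in sorted(os.listdir(cls_dir)):
            if f.endswith(".wav"):
                wavs.append(os.path.join(cls_dir, f))

    random.shuffle(wavs)
    n = len(wavs)
    n_val, n_test = int(0.1 * n), int(0.1 * n)
    splits = {
        "validation_list.txt": wavs[:n_val],
        "testing_list.txt": wavs[n_val:n_val + n_test],
        "training_list.txt": wavs[n_val + n_test:],
    }
    for name, paths in splits.items():
        with open(os.path.join(data_root, name), "w") as fh:
            fh.write("\n".join(paths))
    return splits
```

Shuffling before splitting keeps each split roughly class-balanced without any extra bookkeeping.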
Then you need to provide a label_map.json file, which maps each class_id to its keyword class. Something like:
{"0": "hello", "1": "world", "2": "ready", "3": "something"}
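Since the class names are already encoded in the folder names, the label map can also be generated rather than written by hand. A minimal sketch (the function name and sorted-alphabetical id assignment are my assumptions):

```python
import json
import os


def write_label_map(data_root="./data", out_path="./data/label_map.json"):
    """Map string class ids ("0", "1", ...) to keyword names taken from
    the class folder names under data_root, and write them as JSON."""
    classes = sorted(
        d for d in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, d))
    )
    label_map = {str(i): name for i, name in enumerate(classes)}
    with open(out_path, "w") as fh:
        json.dump(label_map, fh, indent=4)
    return label_map
```

A handy side effect: `len(label_map)` gives you the num_classes value the config needs.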
Inside the config you will use, remember to also update the number of classes. If you have, for example, 15 keywords, you have to set num_classes to 15.
Tricky part:
This repo was made with Speech Commands dataset. So it kind of expects audio keywords to be <= 1 second by default, and either pads or trims everything to 1 s.
If your audio clips are similar in length to those of Speech Commands, then there is no problem.
However, if you have keyword clips which are longer (e.g. ~2 s), or if you want to train with some specific audio clip length and spectrogram size, then I will have to make some small changes in the repository. (I can possibly get it done on the weekend if I have time, shouldn't be too time consuming.)
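For reference, the fixed-length behaviour described above is essentially a pad-or-trim: at a 16 kHz sample rate, 1 second is 16000 samples. This is an illustrative stand-in, not the repo's exact code (which operates on arrays/tensors rather than plain lists):

```python
def pad_or_trim(samples, target_len=16000):
    """Trim a waveform to target_len samples, or zero-pad it on the right.

    At 16 kHz, target_len=16000 corresponds to 1 second of audio.
    """
    if len(samples) >= target_len:
        return samples[:target_len]  # trim longer clips
    return samples + [0.0] * (target_len - len(samples))  # pad shorter clips
```

So a ~2 s clip would currently lose its second half, which is why longer keywords need the repository changes mentioned above.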
@ID56 Thank you so much for the detailed answer, got it!
Yes, please update it if you find the time.
Once again, thank you for your reply.