mashrurmorshed/Torch-KWT

Can we train custom keywords?


Yes, you should be able to, if you have the data.

If you look at the sample config, the very first section looks like this:

data_root: ./data/
train_list_file: ./data/training_list.txt
val_list_file: ./data/validation_list.txt
test_list_file: ./data/testing_list.txt
label_map: ./data/label_map.json

First, you need to arrange your own data in a one-folder-per-class structure under some root folder:

./data/
├── keyword_a/
│   ├── a.wav
│   ├── b.wav
│   ├── ...
│   └── ...
├── keyword_b/
├── ...
├── ...
└── keyword_n/

You then need three .txt files to define your training, validation, and test sets. Each file is simply a list of paths to the audio clips in that split, one path per line.
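As a rough sketch, you could generate the three lists from the folder structure with a random 80/10/10 split, something like the script below. One caveat: I'm assuming paths written relative to data_root here, like the original Speech Commands lists; if the dataset code expects full paths, adjust accordingly.

# Sketch: build training/validation/testing list files from the class folders.
# Assumes paths relative to data_root (Speech Commands style); adjust if needed.
import os
import random

data_root = "./data/"
random.seed(0)

all_paths = []
for keyword in sorted(os.listdir(data_root)):
    class_dir = os.path.join(data_root, keyword)
    if not os.path.isdir(class_dir):
        continue
    for fname in os.listdir(class_dir):
        if fname.endswith(".wav"):
            # e.g. "keyword_a/a.wav"
            all_paths.append(f"{keyword}/{fname}")

random.shuffle(all_paths)
n = len(all_paths)
n_val, n_test = int(0.1 * n), int(0.1 * n)  # 80/10/10 split

splits = {
    "training_list.txt": all_paths[n_val + n_test:],
    "validation_list.txt": all_paths[:n_val],
    "testing_list.txt": all_paths[n_val:n_val + n_test],
}
for fname, paths in splits.items():
    with open(os.path.join(data_root, fname), "w") as f:
        f.write("\n".join(paths))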

Then you need to provide a label_map.json file, which maps class IDs to keyword names. Something like:

{"0": "hello", "1": "world", "2": "ready", "3": "something"}

Inside the config you will use, remember to also update the number of classes. If you have, for example, 15 keywords, set num_classes to 15.
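So for 15 keywords that would just be the following line, wherever num_classes sits in your particular config file:

num_classes: 15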

Tricky part:

This repo was built around the Speech Commands dataset, so by default it expects keyword clips to be <= 1 second long, and it either pads or trims every clip to 1 s.
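Just to make that concrete, the preprocessing effectively does something like the sketch below (this is not the repo's exact code, just the idea; Speech Commands is 16 kHz, so 1 s = 16000 samples):

# Sketch of the pad-or-trim idea: force every clip to exactly 1 s
# (16000 samples at 16 kHz) by cutting long clips and zero-padding short ones.
import torch
import torchaudio

def pad_or_trim(path: str, target_len: int = 16000) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)  # (channels, num_samples)
    waveform = waveform.mean(dim=0)       # mono, (num_samples,)
    # assumes sr is already 16000; resample first if it isn't
    if waveform.shape[0] > target_len:
        waveform = waveform[:target_len]  # trim long clips
    elif waveform.shape[0] < target_len:
        pad = target_len - waveform.shape[0]
        waveform = torch.nn.functional.pad(waveform, (0, pad))  # zero-pad short clips
    return waveform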

If your audio clips are similar in length to those in Speech Commands, there is no problem.

However, if you have keyword clips that are longer (e.g. ~2 s), or if you want to train with a specific audio clip length and spectrogram size, then I will have to make some small changes to the repository. (I can probably get it done over the weekend if I have time; it shouldn't be too time consuming.)

@ID56 Thank you so much for the detailed answer, got it!
Yes, please do update it if you find the time.
Once again, thank you for your reply.