The sound classification module provides the following functionality:
- A machine learning module, based on the OpenCV implementation of the random forest algorithm, for training a sound classifier from a labeled set of recordings.
- A classifier that can process a batch of prerecorded audio files and output the resulting confusion matrix.
- A naoqi module for online sound classification, using Nao's microphones as inputs.
Although this use case technically does not require naoqi, for legacy reasons it uses the naoqi build system. It requires naoqi v2.1.
For reading/writing audio files:
sudo apt install libsndfile1-dev
For preprocessing audio files:
sudo apt install sox
For getting audio file properties in Python scripts:
sudo pip install soundfile
For extracting audio features:
git clone https://github.com/jamiebullock/LibXtract
cd LibXtract
sudo make install PREFIX=/usr/local
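After installing to /usr/local you may also need to refresh the dynamic linker cache so the library is found at runtime:
sudo ldconfig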
Clone this git repo into your qiworkspace. Make sure to select the host toolchain (SDK).
cd nao-sound-classification
qibuild configure
qibuild make
For convenience, add the path to the utility scripts to your PATH environment variable so that they are accessible from any folder (append the command below to your ~/.bashrc file):
export PATH=$PATH:<path to this repo>/scripts
Although technically not necessary, it is currently most convenient to create a Resources folder inside <build folder>/sdk/bin and copy the dataset there into a folder named Dataset (this layout is currently assumed by the default configuration files and utility scripts).
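For example (a sketch; <build folder> is your qibuild output directory and the source path is illustrative):
cd <build folder>/sdk/bin
mkdir -p Resources/Dataset
cp -r <path to your recordings>/. Resources/Dataset/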
- Prepare the audio files, i.e. convert them to 16 kHz mono format (parts of the feature extractor are hardcoded to these values).
cd <build folder>/sdk/bin/Resources
mkdir Dataset_mono_16k
prepare_audio.sh
The prepare_audio.sh script expects to be invoked from the folder where the dataset resides, expects the dataset to be in the Dataset subfolder, and requires the Dataset_mono_16k folder to have been created beforehand.
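The conversion presumably boils down to a per-file sox call along these lines (a sketch assuming WAV inputs; the actual script may differ):
for f in Dataset/*.wav; do
    # resample to 16 kHz and downmix to a single channel
    sox "$f" -r 16000 -c 1 "Dataset_mono_16k/$(basename "$f")"
done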
- Create a list of the files in the dataset and their corresponding labels
create_dataset_list.py Dataset_mono_16k
This will create output.csv and output.txt, both containing a list of the files in the dataset with their corresponding labels, in different formats. The output.txt file is currently used by the C++ tools.
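A quick way to sanity-check the generated lists (the exact column layout depends on the script version):
head output.txt output.csv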
- Generate a summary of the dataset contents (number of samples and total duration for each class). Requires the list of dataset files and the folder where the dataset is located.
transform_dataset_list.py output.txt Dataset_mono_16k
This script optionally allows dataset relabeling.
- Split the dataset into training and test samples
split_train_test_data.py output.txt class_info.txt
This will generate test_data.txt and train_data.txt.
- Create and customize the classifier configuration file
cd ..  # go back to the bin folder where the executables are located
./Configurator SC.config
The resulting SC.config configuration file is a line-oriented plain text file, which does not support comments. The layout must be as follows:
0. Sound library folder (where sound samples are located)
1. List of files for the training dataset (output of the split_train_test_data.py script)
2. Feature selection string (used by Learner, fileClassify and folderClassify); each letter represents one feature to be used for classification. The feature k is currently broken due to a bug in LibXtract and should not be used; best results are currently obtained by using the lop features
3. Sound features data (CSV table generated by the Learner). Each row corresponds to one chunk and each column to a feature value (several columns can belong to one feature, as some features are vectors); only the features selected by the feature selection string are present, always ordered as in the feature selection string
4. Learned classes list. The machine learning model encodes classes as numbers; this file (generated by the Learner) provides the mapping between the numbers and the class names; the class listed in the first row is encoded as 0, the class in the second row as 1, etc.
5. The classification model. An XML file generated by machine learning (the Learner app); contains the OpenCV data structure encoding the machine learning model
6. Robot IP for connecting to the robot when classifying a live data stream
7. Robot port for connecting to the robot when classifying a live data stream
8. Sound sample length [pow(2, x)]: each recording is chopped up into chunks of 2^x samples; for example, a value of 14 yields chunks of 2^14 = 16384 samples, roughly one second of audio at 16 kHz
9. subSample length [pow(2, x)]: each chunk is chopped up into smaller subchunks of 2^x samples; currently used only by HZCRR (high zero crossing rate ratio)
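For orientation, a hypothetical SC.config following the layout above could look like this (one value per line, in the order given; all file names and values are illustrative and must be adapted to your setup):
Resources/Dataset_mono_16k
train_data.txt
lop
features.csv
classes.txt
model.xml
192.168.1.106
9559
14
8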
- Learn the model (random forest)
./Learner SC.config
If everything works correctly, this will generate the model.xml file.
- Validate the model
./FolderClassify SC.config
Line 2 of SC.config names the dataset that will be used for validation. By default, this is the training data; to use the test data instead, modify this line in SC.config.
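Assuming the layout above (the file list is the second physical line of SC.config), switching to the held-out test set can be scripted:
sed -i '2s/.*/test_data.txt/' SC.config  # point line 2 at the test set
./FolderClassify SC.config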
WARNING: These instructions are outdated (just a copy-paste of old, unverified instructions).
Initialize the workspace. In an empty folder:
qibuild init
qitoolchain create pc ../../naoqi-sdk-1.14.5-linux64/toolchain.xml
qitoolchain add-package -c pc XTRACT ../SoundClass_pc.tgz
qibuild configure -c pc (qibuild make -c pc)
Open the project (CMakeLists.txt) in QtCreator (v5.x.x) and make sure to specify the build folder correctly!
For building 3rd party packages: get the OpenNAO VM (needed to build 3rd party packages and run them on the robot).
- Record (Listener)
  - needs SCModule to be running (provides sound filtering)
  - to run SCModule remotely (on the PC):
./SCModule --pip 192.168.1.106 --pport 9559
./Listener <config_file_name>
  - records and stores files to the "Sound library folder"; automatically updates the "Sound library list"
- Learn (Learner) - gradient boosted trees
  - uses the feature selection string; if the option is enabled, it optimizes the feature set (discards "useless" features)
  - processes recordings in chunks/subchunks
./Learn <config_file_name>
  - once the feature set is optimized, another config line specifying which features are used should be added
- Classify (autoClassify)
  - processes recordings in chunks/subchunks
  - needs SCModule to be running
./autoClassify dm.config
- Classify from file (fileClassify)
  - reads an audio file (currently hardcoded)
  - classifies the audio file without SCModule & NAO
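By analogy with the other tools, the invocation is presumably of the following form (unverified; note that the input file name is currently hardcoded in the program):
./fileClassify <config_file_name>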