Audio Classification

Dependency Setup

Create a new conda virtual environment

conda create --name audio_classify python=3.7 -y
conda activate audio_classify

Installation

conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 -c pytorch -y
git clone https://github.com/chingi071/Audio_Classification
cd Audio_Classification
pip install -r requirements.txt

Dataset Preparation

Open-source audio datasets

Tomofun-AI Dog Sound Recognition: https://github.com/lawrencechen0921/Tomofun-AI-

Kaggle Audio Cats and Dogs: https://www.kaggle.com/mmoreaux/audio-cats-and-dogs

Kaggle Freesound General-Purpose Audio Tagging Challenge: https://www.kaggle.com/c/freesound-audio-tagging/data

Data Preprocessing

To train on your own dataset, prepare the following items:

The training/validation dataset files
The data label csv
The dataset yaml

Taking the Kaggle Audio Cats and Dogs dataset as an example, first place the audio files in separate folders, one folder per category.

Next, create the data label csv using the following notebook.

create_data_csv.ipynb
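
The notebook itself is not reproduced here. As a rough sketch of the idea, assuming one sub-folder per category that contains .wav files, a label csv could be built as shown below. The function name `build_label_csv` and the column names `filename` and `label` are assumptions made for illustration and may not match what create_data_csv.ipynb actually writes.

```python
import csv
import os


def build_label_csv(data_root, csv_path):
    """Write one (filename, label) row per .wav file found under data_root.

    Assumes data_root contains one sub-folder per category, e.g. cat/ and dog/.
    The column names are hypothetical; match them to what the training code expects.
    """
    # Sort the category folders so the integer label ids are reproducible.
    categories = sorted(
        d for d in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, d))
    )
    rows = []
    for label_id, category in enumerate(categories):
        folder = os.path.join(data_root, category)
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith(".wav"):
                rows.append((os.path.join(category, name), label_id))

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])
        writer.writerows(rows)
    return categories
```

The returned list maps each label id back to its category name, which is handy when filling in the class names of the dataset yaml.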

Third, create the dataset yaml.

cat_dog.yaml

For the Tomofun-AI dataset, run the preprocessing notebook below. It produces tomofun_train.csv.

Tomofun_data_preprocessing.ipynb

Then create the dataset yaml.

The Tomofun-AI dataset structure is as follows:

train
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav
tomofun_train.csv

Data Augmentation

We use Audiomentations to generate additional augmented training samples.

data_augmentation.ipynb

The augmented dataset structure is as follows:

tomofun_aug_train
├── aug_0_train_00001.wav
├── aug_0_train_00002.wav
├── ...
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav
tomofun_aug_train.csv
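
The listing above suggests that each augmented clip is saved next to the originals with an `aug_<index>_` prefix, and that the new csv covers both. A minimal sketch of how such a csv's rows could be derived from the original ones follows. The `(filename, label)` row layout and the helper name `augmented_rows` are assumptions for illustration; data_augmentation.ipynb may do this differently.

```python
def augmented_rows(rows, num_copies=1):
    """Return one row per augmented copy, followed by the original rows.

    The aug_<i>_ prefix mirrors the file names in the listing above; the
    (filename, label) row layout is an assumption about the csv.
    """
    combined = []
    for i in range(num_copies):
        for filename, label in rows:
            # An augmented clip keeps the label of the clip it was derived from.
            combined.append((f"aug_{i}_{filename}", label))
    combined.extend(rows)
    return combined


if __name__ == "__main__":
    original = [("train_00001.wav", 0), ("train_00002.wav", 3)]
    for row in augmented_rows(original):
        print(row)
```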

Data Visualization

data_visualize.ipynb

Training

Available models: ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, SENet, DenseNet, Convnext_tiny, Convnext_small, Convnext_base, Convnext_large

Train on one GPU

python train.py --yaml_file=tomofun.yaml --model=ResNet18 --model_saved_path=workdirs

Train on multi-GPU

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs

To launch a second multi-GPU training job on the same machine, select different GPUs and a different master port so that the two jobs do not conflict.

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 --master_port 9999 train.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs
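
The explicit `--master_port` is needed because two jobs launched with the default port (29500 for `torch.distributed.launch`) would try to use the same rendezvous port. The value 9999 is just an example. If it is unclear which ports are free, a small helper like the following (hypothetical, not part of this repository) can ask the operating system for one:

```python
import socket


def find_free_port():
    """Ask the operating system for a TCP port that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the OS pick any free port.
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The printed number can then be passed as `--master_port`. Another process could in principle grab the port between the check and the launch, so treat this as a convenience rather than a guarantee.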

Start TensorBoard

tensorboard --logdir runs

Predict

python predict.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs --test_data=test_data

Convert to ONNX

pip install onnx onnxruntime==1.6.0

python convert_to_onnx.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs --model_weights=best.pth

python onnx_predict.py --test_data=test_data

Record audio

pip install pyaudio

Create a recording using the following notebook.

record.ipynb

Result

device: cuda:1, rank: 1, world_size: 2
device: cuda:0, rank: 0, world_size: 2
Train_Epoch: 0/99, Training_Loss: 0.011717653522888819 Training_acc: 0.42
Train_Epoch: 0/99, Training_Loss: 0.012225324138998985 Training_acc: 0.40               
Valid_Epoch: 0/99, Valid_Loss: 0.010406222939491273 Valid_acc: 0.49
Valid_Epoch: 0/99, Valid_Loss: 0.01043313001592954 Valid_acc: 0.48
--------------------------------
Train_Epoch: 1/99, Training_Loss: 0.00876050346220533 Training_acc: 0.54               
Train_Epoch: 1/99, Training_Loss: 0.008517718284080426 Training_acc: 0.56               
Valid_Epoch: 1/99, Valid_Loss: 0.008887257364888986 Valid_acc: 0.57               
Valid_Epoch: 1/99, Valid_Loss: 0.008429310657083989 Valid_acc: 0.58               
--------------------------------                          

............

Train_Epoch: 99/99, Training_Loss: 4.295512663895462e-06 Training_acc: 1.00               
Valid_Epoch: 99/99, Valid_Loss: 0.0004894535513647663 Valid_acc: 0.99               
Train_Epoch: 99/99, Training_Loss: 2.0122603179591654e-06 Training_acc: 1.00               
Valid_Epoch: 99/99, Valid_Loss: 0.0006921298647505341 Valid_acc: 0.99             
--------------------------------
Finished Training.
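
With two GPUs, each rank prints its own line, so every epoch appears twice per phase in the log above. To plot or compare these numbers outside TensorBoard, the lines can be parsed with a short script. This is a sketch that assumes the exact line format shown above; the names `LOG_LINE` and `parse_log` are illustrative and not part of the repository.

```python
import re

# Matches lines such as:
# Train_Epoch: 0/99, Training_Loss: 0.0117 Training_acc: 0.42
LOG_LINE = re.compile(
    r"^(?P<phase>Train|Valid)_Epoch: (?P<epoch>\d+)/(?P<last>\d+), "
    r"\w+_Loss: (?P<loss>\S+) \w+_acc: (?P<acc>\S+)"
)


def parse_log(lines):
    """Extract phase, epoch, loss, and accuracy from training log lines."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip separators, device banners, and blank lines
        records.append({
            "phase": match["phase"],
            "epoch": int(match["epoch"]),
            "loss": float(match["loss"]),
            "acc": float(match["acc"]),
        })
    return records
```

For example, `parse_log(open("train.log"))` would return one record per matching line, where `train.log` is a hypothetical file holding the captured console output.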

Accuracy

Loss

Confusion Matrix