Create a new conda virtual environment
conda create --name audio_classify python=3.7 -y
conda activate audio_classify
Installation
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 -c pytorch -y
git clone https://github.com/chingi071/Audio_Classification
pip install -r requirements.txt
Open source audio dataset
Tomofun-AI Dog Sound Recognition: https://github.com/lawrencechen0921/Tomofun-AI-
Kaggle Audio Cats and Dogs: https://www.kaggle.com/mmoreaux/audio-cats-and-dogs
Kaggle Freesound General-Purpose Audio Tagging Challenge: https://www.kaggle.com/c/freesound-audio-tagging/data
Data Preprocessing
To train on your own dataset, prepare the following items:
- The training/validation dataset files
- The data label csv
- The dataset yaml
Taking the Kaggle Audio Cats and Dogs dataset as an example, first place the audio files in separate folders, one folder per category.
Next, create the data label csv using the following ipynb file.
- create_data_csv.ipynb
Third, create the dataset yaml.
- cat_dog.yaml
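The folder-per-category layout described above can be turned into a label csv with a few lines of Python. The following is a minimal sketch, not the contents of create_data_csv.ipynb: the function names and the `filename` and `label` column names are assumptions made for illustration, and the notebook may use a different layout.

```python
import csv
from pathlib import Path


def build_label_rows(dataset_root, extensions=(".wav",)):
    """Return (relative_path, label) pairs, one per audio file.

    Assumes one sub-folder per category, e.g. root/cat/*.wav and root/dog/*.wav.
    """
    root = Path(dataset_root)
    rows = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for audio in sorted(class_dir.iterdir()):
            if audio.is_file() and audio.suffix.lower() in extensions:
                # The folder name doubles as the class label.
                rows.append((audio.relative_to(root).as_posix(), class_dir.name))
    return rows


def write_label_csv(rows, csv_path):
    """Write the rows to csv_path under a hypothetical header."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])  # column names are assumptions
        writer.writerows(rows)
```

Calling `write_label_csv(build_label_rows("cats_dogs"), "cat_dog.csv")` would produce one row per audio file; the folder and csv names here are placeholders.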
Taking the Tomofun-AI dataset as an example, run the data preprocessing notebook below, which produces tomofun_train.csv.
- Tomofun_data_preprocessing.ipynb
Then create the dataset yaml.
The Tomofun-AI dataset structure is as follows:
train
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav
tomofun_train.csv
Data Augmentation
We use Audiomentations to generate additional augmented training samples.
- data_augmentation.ipynb
The dataset structure is as follows:
tomofun_aug_train
├── aug_0_train_00001.wav
├── aug_0_train_00002.wav
├── ...
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav
tomofun_aug_train.csv
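To illustrate how augmented copies such as aug_0_train_00001.wav in the listing above could be produced, here is a minimal sketch. It deliberately swaps Audiomentations for plain NumPy so that it runs without extra dependencies; the two transforms (Gaussian noise and a circular time shift), their parameter values, and all function names are assumptions, not the settings used in data_augmentation.ipynb.

```python
import numpy as np


def add_gaussian_noise(samples, rng, amplitude=0.005):
    """Return a copy of the waveform with zero-mean Gaussian noise added."""
    noise = rng.normal(0.0, amplitude, size=samples.shape)
    return (samples + noise).astype(samples.dtype)


def time_shift(samples, rng, max_fraction=0.2):
    """Circularly shift the waveform by up to max_fraction of its length."""
    limit = int(len(samples) * max_fraction)
    offset = int(rng.integers(-limit, limit + 1))
    return np.roll(samples, offset)


def augment(samples, rng):
    """Apply noise, then a random shift (a hypothetical two-step pipeline)."""
    return time_shift(add_gaussian_noise(samples, rng), rng)


def augmented_name(round_index, filename):
    """Mirror the aug_<n>_<original name> pattern from the listing above."""
    return f"aug_{round_index}_{filename}"
```

Each augmented waveform keeps the label of its source file, so the augmented csv can reuse the original rows with the new file names.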
Data Visualization
- data_visualize.ipynb
Available models: ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, SENet, DenseNet, Convnext_tiny, Convnext_small, Convnext_base, Convnext_large
Train on one GPU
python train.py --yaml_file=tomofun.yaml --model=ResNet18 --model_saved_path=workdirs
Train on multi-GPU
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs
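To run a second multi-GPU training job at the same time, select different GPUs and set a different --master_port so the two jobs do not collide on the default port, as in the following command.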
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 --master_port 9999 train.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs
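For context on the commands above: torch.distributed.launch starts one process per GPU, passes each process a --local_rank argument, and sets the RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT environment variables (the port defaults to 29500). The sketch below shows how a training script might read these values. It is an illustration only; the function name is hypothetical and this is not necessarily how train.py does it.

```python
import argparse
import os


def read_launch_context(argv=None, environ=None):
    """Collect the per-process settings provided by torch.distributed.launch."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # The launcher appends --local_rank=<n> to the script's arguments.
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore the script's other options (--yaml_file, --model, ...).
    args, _unknown = parser.parse_known_args(argv)
    return {
        "device": f"cuda:{args.local_rank}",
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
        "master_port": environ.get("MASTER_PORT", "29500"),
    }
```

Note that CUDA_VISIBLE_DEVICES=2,3 renumbers the visible GPUs inside the job, so the two processes still see them as cuda:0 and cuda:1.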
Start TensorBoard
tensorboard --logdir runs
python predict.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs --test_data=test_data
pip install onnx onnxruntime==1.6.0
python convert_to_onnx.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs --model_weights=best.pth
python onnx_predict.py --test_data=test_data
pip install pyaudio
Record an audio file using the following ipynb file.
- record.ipynb
device: cuda:1, rank: 1, world_size: 2
device: cuda:0, rank: 0, world_size: 2
Train_Epoch: 0/99, Training_Loss: 0.011717653522888819 Training_acc: 0.42
Train_Epoch: 0/99, Training_Loss: 0.012225324138998985 Training_acc: 0.40
Valid_Epoch: 0/99, Valid_Loss: 0.010406222939491273 Valid_acc: 0.49
Valid_Epoch: 0/99, Valid_Loss: 0.01043313001592954 Valid_acc: 0.48
--------------------------------
Train_Epoch: 1/99, Training_Loss: 0.00876050346220533 Training_acc: 0.54
Train_Epoch: 1/99, Training_Loss: 0.008517718284080426 Training_acc: 0.56
Valid_Epoch: 1/99, Valid_Loss: 0.008887257364888986 Valid_acc: 0.57
Valid_Epoch: 1/99, Valid_Loss: 0.008429310657083989 Valid_acc: 0.58
--------------------------------
............
Train_Epoch: 99/99, Training_Loss: 4.295512663895462e-06 Training_acc: 1.00
Valid_Epoch: 99/99, Valid_Loss: 0.0004894535513647663 Valid_acc: 0.99
Train_Epoch: 99/99, Training_Loss: 2.0122603179591654e-06 Training_acc: 1.00
Valid_Epoch: 99/99, Valid_Loss: 0.0006921298647505341 Valid_acc: 0.99
--------------------------------
Finished Training.
- Accuracy
- Loss
- Confusion Matrix
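As a reference for how the accuracy and confusion matrix listed above relate to each other, here is a minimal pure-Python sketch. It assumes labels are encoded as integer class indices; the function names are hypothetical and the repository may compute these metrics differently.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair.

    Rows index the true class, columns the predicted class.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which classes are confused with each other, which the single accuracy number cannot reveal.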