This project develops a machine learning model to classify bird species in the BirdClef 2023 dataset. It leverages the Whisper encoder for robust feature extraction and applies convolutional neural network (CNN) variants to identify species from audio samples.
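At a high level, Whisper's pretrained encoder turns each clip's log-mel spectrogram into a sequence of embeddings, and a small CNN head maps those embeddings to species logits. Below is a minimal sketch of that pairing; the class name, layer sizes, and the choice of `openai/whisper-base` are illustrative assumptions, not the exact architecture in `BirdClef_Classification.ipynb`.

```python
# Hedged sketch of a Whisper-encoder + CNN-head classifier (illustrative only).
import torch
import torch.nn as nn
from transformers import WhisperModel

class WhisperCNNClassifier(nn.Module):
    def __init__(self, num_classes: int, whisper_name: str = "openai/whisper-base"):
        super().__init__()
        # Pretrained Whisper encoder used as the audio feature extractor.
        self.encoder = WhisperModel.from_pretrained(whisper_name).encoder
        hidden = self.encoder.config.d_model  # 512 for whisper-base
        # Lightweight 1-D CNN head applied over the encoder's time dimension.
        self.head = nn.Sequential(
            nn.Conv1d(hidden, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_features: torch.Tensor) -> torch.Tensor:
        # input_features: log-mel spectrograms of shape (batch, 80, 3000).
        embeddings = self.encoder(input_features).last_hidden_state  # (batch, time, hidden)
        return self.head(embeddings.transpose(1, 2))                 # (batch, num_classes)
```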
Before running the project, ensure that you have all the necessary libraries installed. You can install the dependencies using the following command:
```bash
pip install -r requirements.txt
```
Download the dataset used in this project from the BirdClef 2023 competition on Kaggle.
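If you use the Kaggle API (with credentials in `~/.kaggle/kaggle.json`), one way to fetch the data is sketched below; the equivalent CLI command is `kaggle competitions download -c birdclef-2023`. This is only a convenience suggestion, not a required step.

```python
# Optional helper: download the BirdClef 2023 competition archive via the
# kaggle package (pip install kaggle). Requires accepting the competition
# rules on Kaggle first; unzip the downloaded archive into birdclef-2023/.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
api.competition_download_files("birdclef-2023", path=".")
```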
The dataset is organized within the `birdclef-2023` directory, structured as follows:
```
birdclef-2023
├── augmented_audio      # Augmented audio files; can be made using BirdClef_Augmentation.ipynb
├── augmented_pt         # Augmented PyTorch tensor files; can be made using needToMakePT = True in BirdClef_Classification.ipynb
├── checkpoints          # Model checkpoints, created automatically to save a checkpoint each epoch
├── train_audio          # Original training audio files from the BirdClef 2023 dataset
├── train_pt             # PyTorch tensor files made from the training audio; can be made using needToMakePT = True in BirdClef_Classification.ipynb
├── aug_metadata.csv     # Metadata for augmented audio files; can be made using BirdClef_Augmentation.ipynb
└── train_metadata.csv   # Metadata for training audio files, taken from the BirdClef 2023 dataset
BirdClef_Augmentation.ipynb
BirdClef_Classification.ipynb
PrintCheckpoint.ipynb
requirements.txt
...
```
Make sure the data is arranged as shown above so that the training and evaluation notebooks run correctly.
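For reference, the kind of preprocessing that `needToMakePT = True` implies, converting each audio clip into a saved log-mel tensor, could look roughly like the sketch below; the paths, the `librosa` loading step, and the choice of `openai/whisper-base` are assumptions rather than the notebook's exact code.

```python
# Hedged sketch: turn one training clip into a Whisper-ready log-mel tensor
# saved under train_pt/. Illustrative only; see BirdClef_Classification.ipynb
# for the actual implementation.
import os
import torch
import librosa
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")

def audio_to_pt(audio_path: str, out_dir: str = "birdclef-2023/train_pt") -> str:
    os.makedirs(out_dir, exist_ok=True)
    # Whisper expects 16 kHz mono audio.
    waveform, _ = librosa.load(audio_path, sr=16000, mono=True)
    features = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    out_path = os.path.join(
        out_dir, os.path.splitext(os.path.basename(audio_path))[0] + ".pt"
    )
    torch.save(features.input_features.squeeze(0), out_path)  # shape (80, 3000)
    return out_path
```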
Training and evaluation configurations can be adjusted through the following flags:
- `augmentedRun`: `True` to use augmented data, `False` to use raw data.
- `FTRun`: `True` to enable feature tuning, `False` to disable it.

These flags determine the saving path of model checkpoints: checkpoints are automatically saved in designated subdirectories of the `checkpoints` folder, corresponding to your training session's dataset and feature tuning configuration.
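As an illustration of how the two flags could translate into checkpoint paths (the actual subdirectory names used by the notebook may differ):

```python
# Hedged example of flag-dependent checkpoint paths; the folder names here are made up.
import os
import torch

augmentedRun = True   # True: augmented data, False: raw data
FTRun = False         # True: feature tuning enabled, False: disabled

subdir = ("augmented" if augmentedRun else "raw") + ("_ft" if FTRun else "_noft")
checkpoint_dir = os.path.join("birdclef-2023", "checkpoints", subdir)
os.makedirs(checkpoint_dir, exist_ok=True)

def save_epoch_checkpoint(model: torch.nn.Module, epoch: int) -> None:
    # One checkpoint per epoch, as described above.
    torch.save(model.state_dict(), os.path.join(checkpoint_dir, f"epoch_{epoch}.pt"))
```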
The demo pretrained model can be accessed here: https://drive.google.com/file/d/1BdgFq3qonHxFxJRxbqSrSMbgfsvj5umb/view?usp=drive_link
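Once downloaded, the checkpoint can be inspected along the lines of `PrintCheckpoint.ipynb`; the sketch below only assumes it is a standard `torch.save` file (the file name is a placeholder).

```python
# Hedged sketch: load the demo checkpoint on CPU and list its parameter shapes.
import torch

checkpoint = torch.load("demo_pretrained.pt", map_location="cpu")

# The file may hold a raw state_dict or a dict that wraps one.
state_dict = checkpoint.get("model_state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
for name, value in state_dict.items():
    print(name, tuple(value.shape) if hasattr(value, "shape") else value)
```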
We welcome contributions to improve the model and its implementation. If you have suggestions or improvements, please open an issue to discuss your ideas before submitting a pull request.
For any queries regarding this project, please open an issue in the GitHub repository, and we will get back to you.