Bird Whisperer: Leveraging Large Pre-trained Acoustic Model for Bird Call Classification (InterSpeech'24)
Muhammad Umer Sheikh, Hassan Abid, Bhuiyan Sanjid Shafique, Asif Hanif, and Muhammad Haris
Abstract
Adapting large pre-trained acoustic models across diverse domains poses a significant challenge in speech processing, particularly when shifting from human to non-human contexts. This study aims to bridge this gap by utilizing the pre-trained Whisper model, initially intended for human speech recognition, for classifying bird calls. Our study reveals that when employed solely as a feature extractor, the Whisper encoder fails to yield meaningful features from bird calls, possibly due to categorizing them as background noise. We propose a simple but effective technique to enhance Whisper's ability to extract distinctive features from avian vocalizations, resulting in a remarkable 15% increase in F1-score over the baseline. Furthermore, we mitigate the issue of class imbalance within the dataset by introducing a series of data augmentations. Our findings underscore the potential of adapting large pre-trained acoustic models to tackle broader bioacoustic classification tasks.
- June 04, 2024 : Accepted at INTERSPEECH 2024
- September 01, 2024 : Released code for Bird Whisperer
- In Progress : Preparing the dataset processing instructions
- Create a conda environment
conda create --name bird-whisperer python=3.8
conda activate bird-whisperer
- Install PyTorch and other dependencies
git clone https://github.com/umer-sheikh/bird-whisperer
cd bird-whisperer
pip install -r requirements.txt
We have performed experiments on the bird call classification dataset: BirdCLEF 2023.
We provide instructions for downloading and processing the dataset used by our method in DATASET.md.
After downloading and preprocessing, all files should be placed in a directory named birdclef2023-dataset, and the path of this directory should be specified in the DATASET_ROOT variable in the shell scripts. The directory structure should be as follows:
birdclef2023-dataset/
├── audio_files/
│   ├── original/
│   └── augmented/
└── pt_files/
    ├── original/
    ├── augmented/
    ├── original.csv
    └── augmented.csv
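Before launching a run, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper name missing_entries is hypothetical, and it assumes that original.csv and augmented.csv sit inside pt_files/, as the tree suggests.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
# Assumption: the two CSV files live inside pt_files/.
EXPECTED_ENTRIES = [
    "audio_files/original",
    "audio_files/augmented",
    "pt_files/original",
    "pt_files/augmented",
    "pt_files/original.csv",
    "pt_files/augmented.csv",
]


def missing_entries(dataset_root):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling missing_entries("birdclef2023-dataset") returns an empty list when every expected folder and file is present; otherwise it lists the entries that still need to be created or downloaded.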
We have performed all experiments on an NVIDIA RTX 4090 GPU. Shell scripts to run experiments can be found in the scripts folder. Below are the shell commands to run experiments (fine-tuning, linear probing, or random initialization). Before running the commands, please ensure that you set the paths for the dataset root directory and the directory where you want to save the model weights.
## Fine Tuning
bash scripts/bird_whisperer_finetune.sh
## Linear Probing
bash scripts/bird_whisperer_linearprobing.sh
## Random Initialization
bash scripts/bird_whisperer_randominit.sh
Results are saved in JSON format in the logs directory.
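To compare runs, the saved results can be collected into a single Python dictionary. This is a minimal sketch under stated assumptions: the file names and the JSON schema are not documented here, so the code only assumes that the result files end in .json and sit directly inside the logs directory. The helper name load_json_logs is hypothetical.

```python
import json
from pathlib import Path


def load_json_logs(logs_dir="logs"):
    """Load every *.json file directly under logs_dir, keyed by file name."""
    results = {}
    for path in sorted(Path(logs_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as f:
            # The structure of each file is left untouched, since its schema is unknown.
            results[path.name] = json.load(f)
    return results
```

Each value in the returned dictionary is whatever the corresponding file contains, so inspect one entry first to see which fields a run actually records.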
To run experiments on the original dataset, simply remove the --augmented_run argument from the shell scripts.
If you want to use the EfficientNet-B4 model as the feature extractor instead of Whisper, change the MODEL_NAME variable in the shell scripts from 'whisper' to 'efficientnet_b4'.
If you find our work or this repository useful, please consider giving it a star and citing our paper.
@inproceedings{birdwhisperer2024,
title = {Bird Whisperer: Leveraging Large Pre-trained Acoustic Model for Bird Call Classification},
author = {Muhammad Umer Sheikh and Hassan Abid and Bhuiyan Sanjid Shafique and Asif Hanif and Muhammad Haris},
booktitle = {Proceedings of the INTERSPEECH 2024 Conference},
year = {2024},
}
Should you have any questions, please create an issue on this repository or contact us at hassan.abid@mbzuai.ac.ae
We used the Whisper codebase for the feature extraction in our proposed method Bird Whisperer. We thank the authors for releasing the codebase.