This is the finalized audio subproject for our final capstone project Cog*Dex. It classifies sea mammal sounds into three categories (dolphin, whale, and seal) from the Watkins Marine Mammal 'Best of Cuts' Sound Database provided by the Woods Hole Oceanographic Institution.
- Authors (in alphabetical order): Brian Chen, Brayden Levangie, Amy Liu, Jordi Malaret, and Eloise Zeng
- TAs (in alphabetical order): Kathryn Le and Adrianna Peng
- Instructors: Ryan Soklaski and Victoria Helus
All the raw audio data can be webscraped from the sound database or accessed here: https://drive.google.com/open?id=1QjxAsbSYzWQaQgLN2TAVrLTMdtLq-vGD
data_init.ipynb converts all the raw audio data into spectrograms (represents five seconds of an audio clip) which can be accessed here: https://drive.google.com/open?id=1mL-Mx7ggvkSUKskby7nQX-VoQy6iB12M
train_data_init.ipynb sorts all the spectrograms into training and validation data which can be accessed here: https://drive.google.com/open?id=19gD99LZzIrhPkJhnbRGt-nCQ7au8qdnI
Training data: 109 spectrograms (saved as .png images) for dolphins, whales, and seals each
Validation data: 26 spectrograms (saved as .png images) for dolphins, whales, and seals each
train_model.ipynb The pretrained Resnet-18 CNN was feature extracted.
Training accuracy: 83%
Validation accuracy: 87%
Audio models can be accessed from the .pt files in here: https://drive.google.com/open?id=1B6mWm1Hcx0v74AvnzBa4y0iC3scNlUsE
Google Colab was used to run all the code