AudioDup - Near-duplicate Detection of Audios
This repository presents a simple approach to near-duplicate detection of audio files, based on generating acoustic fingerprints.
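To make the idea of fingerprint-based near-duplicate detection concrete, here is a minimal toy sketch. It is not the code used by this repository: the function names (`dominant_bins`, `fingerprint`, `similarity`, `tone_sequence`), the frame size, and the peak-pair hashing scheme are all assumptions chosen for illustration, and synthetic tones stand in for real audio.

```python
import cmath
import hashlib
import math


def dominant_bins(samples, frame_size=64):
    """Return the index of the strongest DFT bin for each non-overlapping frame."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        best_bin, best_mag = 0, -1.0
        # Naive DFT over positive frequencies only (the DC bin 0 is skipped).
        for k in range(1, frame_size // 2):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            if abs(coeff) > best_mag:
                best_bin, best_mag = k, abs(coeff)
        peaks.append(best_bin)
    return peaks


def fingerprint(samples, frame_size=64):
    """Hash each pair of consecutive dominant bins into a short hex token."""
    peaks = dominant_bins(samples, frame_size)
    return {
        hashlib.sha1(f"{a}|{b}".encode()).hexdigest()[:10]
        for a, b in zip(peaks, peaks[1:])
    }


def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprint sets (1.0 means identical)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def tone_sequence(bins, frame_size=64, gain=1.0):
    """Synthesize one frame of a pure tone per requested bin (hypothetical test input)."""
    return [
        gain * math.sin(2 * math.pi * k * n / frame_size)
        for k in bins
        for n in range(frame_size)
    ]


original = tone_sequence([3, 7, 5, 9, 4])
quieter_copy = tone_sequence([3, 7, 5, 9, 4], gain=0.5)  # same content, lower volume
different = tone_sequence([10, 2, 12, 6, 11])

print(similarity(fingerprint(original), fingerprint(quieter_copy)))  # 1.0
print(similarity(fingerprint(original), fingerprint(different)))     # 0.0
```

Because the fingerprint depends only on which frequency dominates each frame, a volume change leaves it untouched, which is the property that makes fingerprints useful for spotting near-duplicates. Real systems use far more robust features than this sketch.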
Setup Instructions
- We assume that you have access to a computer running macOS. However, any Unix/Linux-based system should generally work as well.
- Make sure you have installed Python 3.7 and the latest version of pipenv.
- Install the MySQL connector using
brew install mysql-connector-c
- Fix a potential bug by this.
- Install PortAudio and FFmpeg using
brew install portaudio && brew install ffmpeg
- Install all dependencies with
pipenv install
- Set up a database and user for the program:
CREATE DATABASE dejavu;
CREATE USER 'dejavu'@'localhost' IDENTIFIED BY 'dejavu';
GRANT ALL PRIVILEGES ON dejavu.* TO 'dejavu'@'localhost';
To Run the Program
- Collect fingerprints by
pipenv run python3 collect.py
- Recognize sound from the microphone by
pipenv run python3 recognize.py
Testing
- We use the FMA Dataset for testing. To save time and disk space, you do not have to download the whole dataset.
- Put what you downloaded into the data folder.
- Run
pipenv run python3 collect.py
to collect all fingerprints.
- Run
pipenv run python3 test.py
to collect test results.
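If you downloaded more audio than you want to fingerprint, one way to keep the test run small is to copy a random sample into the data folder. The helper below is a sketch and not part of this repository: the name `copy_sample`, the `.mp3` suffix, and the default seed are assumptions.

```python
import random
import shutil
from pathlib import Path


def copy_sample(source_dir, target_dir, count, seed=0, suffix=".mp3"):
    """Copy a reproducible random sample of audio files into target_dir."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Sort first so the same seed always selects the same files.
    candidates = sorted(p for p in source.rglob(f"*{suffix}") if p.is_file())
    chosen = random.Random(seed).sample(candidates, min(count, len(candidates)))

    for path in chosen:
        # Files are flattened into target_dir, so names must be unique.
        shutil.copy2(path, target / path.name)
    return [path.name for path in chosen]
```

For example, `copy_sample("downloads/fma", "data", 100)` would copy up to 100 tracks, where `downloads/fma` is a placeholder for wherever you unpacked the dataset. Fixing the seed keeps the sample identical between runs, so test results stay comparable.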