
mnemophonix

A simple audio fingerprinting system

This project was inspired by the article https://www.codeproject.com/Articles/206507/Duplicates-detector-via-audio-fingerprinting by Sergiu Ciumac, which explains how to build a Shazam-like system that can index audio files and then try to identify a given audio file against the previously built database.

The work done here is a simplified C version of that system. It is built from scratch without any dependencies, since the main goal was to learn in detail how audio fingerprinting works. The code is heavily commented and only moderately optimized (multithreading keeps the fingerprinting reasonably fast), so it stays easy to understand. There is no attempt at storing the signatures in an optimized way, so if you want to use it at a large scale, you will probably need to customize the I/O.

The canonical format that the program can process is 44100Hz 16-bit PCM. Any input file that does not match this format is converted on the fly with the best Swiss army knife of media tools: ffmpeg, so any file containing audio (including videos) can be fingerprinted.
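If you want to prepare such a file yourself, the on-the-fly conversion is roughly equivalent to a plain ffmpeg invocation like the following (the file names are just examples):

```shell
$ ffmpeg -i movie.mp4 -vn -acodec pcm_s16le -ar 44100 audio.wav
```

Here -vn drops the video stream, -acodec pcm_s16le selects 16-bit PCM, and -ar 44100 resamples to 44100Hz.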

For the record, fingerprinting the 130 songs from DEFCON 20 to DEFCON 27 generates a 75MB database. Fingerprinting a 2-hour movie produces a 16MB signature and takes about 55 seconds on a MacBook Pro (including extracting the audio from the movie with ffmpeg). Once the database is loaded in memory, searching for an audio sample of a few seconds is almost instantaneous.

How to build it (Linux & macOS)

Run make.

How to use it

To index an audio file and save the generated signature to a file, run this:

$ mnemophonix index test.wav > test.signature

The signatures are plain text, so you can create your database by just concatenating signatures into one big file like this:

$ mnemophonix index song1.wav > db
$ mnemophonix index song2.mp3 >> db
$ mnemophonix index movie.mp4 >> db

Then, in order to identify a sample against the database, run this:

$ mnemophonix search sample.wav db
Reading 44100Hz samples...
Resampling to 5512Hz...
Normalizing samples...
31444 5512Hz mono samples
Got 42 spectral images
Applying Haar transform to spectral images
Building raw fingerprints
Generated 42 signatures
Loading database db...
(raw database loading took 2122 ms)
(lsh index building took 964 ms)
Searching...
(Search took 32 ms)

Found match: 'defcon 26/01 - Skittish & Bus - OTP.mp3'
Artist: Skittish & Bus
Track title: OTP
Album title: DEF CON 26: The Official Soundtrack

Cool, but it would be even cooler to guess straight from the microphone...

...which is why there is a companion program for macOS, written in Objective-C. You can build it with xcodebuild and then run it with your database; it will print any identified audio entry on the standard output:

$ ./build/Release/ears db
 ---                                                       ---
 \  \  mnemophonix - A simple audio fingerprinting system  \  \
 O  O                                                      O  O
Loading 'db'...
Database loaded...
Starting to listen...

Found match: 'defcon 20/13 - High Sage feat. Katy Rokit - Stu.mp3'
Artist: Highsage feat. Katy Rokit
Track title: High Sage feat. Katy Rokit - Stuck on Ceazar's Challenge (KEW QEIMYUK QEIMYUK QEIM AYM)
Album title: DEF CON XX Commemorative Compilation


Found match: 'defcon 24/16 Mindex - Jazzy Mood (Crystal Mix).mp3'
Artist: Mindex
Track title: Jazzy Mood (Crystal Mix)
Album title: DEF CON 24: The Original Soundtrack

Credits

This work used the following sources: