Audio-Fingerprinting

Audio fingerprinting and recognition in Java

Primary language: Java. License: MIT.

Music recognition system


A music fingerprinting system written in Java. It uses a MySQL database to store fingerprints and music information, although the database is not strictly required. It includes fingerprint generation, database storage, and a simple server and client.

By generating and storing music fingerprints, it can recognize music from various sources such as microphones and files. It has high noise immunity and is not sensitive to file attributes or audio quality. You can use the server to provide music recognition services to your phone or to other programs.

You can adjust the parameters to suit your needs. The current parameters are tuned to identify noisy and distorted sources from a short sample. With these settings, about 1500 files produce nearly 24 million fingerprints. If you only need to identify files and there is no serious noise or distortion, you can modify the parameters so that fewer fingerprints are generated: a file can be identified from only a small number of fingerprints, and for low-noise sources 200 fingerprints already meet most requirements.

Easy to use

Dependent libraries: com.springsource.org.json, JTransforms, mysql-connector-java

  1. Install MySQL and execute Fingerprint.sql. You may need to increase the max_allowed_packet parameter, because adding a song requires sending a large packet. The value I use is 32M.

  2. Edit the database settings in MysqlDB to match your database.

  3. How to add a file:

    1. Transcode the file to WAV with a sampling rate of 8000 Hz.
    2. Call Insert with the file name or folder as the parameter.

    *P.S.: You can override the method that adds files, write a script, or use other software to do the transcoding. Currently it reads the track information from a file name of the form %title%}}%album%}}%artist%.*

  • Search music
    • Call Search with the file name to search.
    • For a large database, it is recommended to run Server and search using Client with the file name.
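
The file name convention mentioned above (%title%}}%album%}}%artist%) can be illustrated with a small parser. This is only a sketch: the class `TrackName` and its `parse` method are hypothetical names that do not come from the project, and the handling of the `.wav` extension is an assumption.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the project): splits "title}}album}}artist.wav". */
final class TrackName {
    final String title;
    final String album;
    final String artist;

    private TrackName(String title, String album, String artist) {
        this.title = title;
        this.album = album;
        this.artist = artist;
    }

    static TrackName parse(String fileName) {
        // Assumption: files were transcoded to WAV first, so only ".wav" is stripped.
        String base = fileName;
        if (base.toLowerCase(Locale.ROOT).endsWith(".wav")) {
            base = base.substring(0, base.length() - 4);
        }
        // "}}" is the field separator; quote it so it is treated as a literal.
        String[] parts = base.split(Pattern.quote("}}"), -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected title}}album}}artist, got: " + fileName);
        }
        return new TrackName(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        TrackName t = parse("Yesterday}}Help!}}The Beatles.wav");
        System.out.println(t.title + " / " + t.album + " / " + t.artist);
    }
}
```

A name that does not contain exactly two `}}` separators is rejected here; the project itself may handle such names differently.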

Introduction of main parameters

NPeaks: the number of peak points per subband in a cycle
fftSize: the window size of the FFT
Overlap: the overlap size of the FFT window
C: the number of windows in one cycle
peakRange: the range of neighbors a point is compared with when picking a peak point
Range_time: the time range within which point pairs are taken, in seconds
Range_freq: the frequency range within which point pairs are taken, in units of frequency
Band: the sub-band boundaries; each value is an index into the array produced by the FFT
minFreq: minimum frequency
maxFreq: maximum frequency
minPower: minimum energy
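
To make the units of these parameters concrete, the sketch below relates FFT window settings to seconds and Hz. It assumes that Overlap is a number of samples and uses the 8000 Hz sampling rate from the setup steps; the values 512 and 256, and the class and method names, are illustrative and are not taken from the project.

```java
/** Hypothetical helper (not part of the project) relating FFT settings to time and frequency. */
final class FrameMath {
    /** Samples the window advances per frame, assuming Overlap is a sample count. */
    static int hopSize(int fftSize, int overlap) {
        return fftSize - overlap;
    }

    /** Seconds between the starts of two consecutive FFT windows. */
    static double frameSeconds(int fftSize, int overlap, int sampleRate) {
        return (double) hopSize(fftSize, overlap) / sampleRate;
    }

    /** Frequency in Hz at FFT array index `bin`. */
    static double binToHz(int bin, int fftSize, int sampleRate) {
        return (double) bin * sampleRate / fftSize;
    }

    /** Nearest FFT array index for a frequency in Hz. */
    static int hzToBin(double hz, int fftSize, int sampleRate) {
        return (int) Math.round(hz * fftSize / sampleRate);
    }

    public static void main(String[] args) {
        int sampleRate = 8000; // from the setup steps
        int fftSize = 512;     // illustrative value
        int overlap = 256;     // illustrative value, in samples
        System.out.println("seconds per frame: " + frameSeconds(fftSize, overlap, sampleRate));
        System.out.println("bin 64 in Hz: " + binToHz(64, fftSize, sampleRate));
        System.out.println("1000 Hz as bin: " + hzToBin(1000.0, fftSize, sampleRate));
    }
}
```

With conversions like these, a Band value (an FFT array index) can be compared against minFreq and maxFreq, and a Range_time in seconds can be turned into a number of windows.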

Suggested changes:

  • Increase recognition rate:
    • Reduce minPower; increase Band, NPeaks, range_time

  • Reduce the amount of data:
    • Increase minPower; reduce Band, NPeaks, range_time

It is recommended to modify Band and minPower first.


Server:

Port: the port of the server

Client:

Ip: the IP address of the server

Port: the port of the server

Performance and effects

**Data volume:** The music library contains 1500 songs and about 24 million fingerprints. The server occupies about 340M once it has stabilized.

**Speed:** On an i7-3632QM processor, adding 1500 songs takes about 1919 seconds, or about 1.3 seconds per song. Finding a 10 s audio sample using the server takes about 0.2 seconds (not counting the time the client spends reading the file).

**Accuracy:** The recognition rate is high for low-noise audio and close to commercial accuracy for noisier audio. However, when a song is not in the music library, there is a certain false-positive rate.

**Anti-noise:** It can withstand strong distortion and noise; see the test audio I provided.

Working principle

Reference documentation:

The algorithm is similar to Shazam's. First, the spectrum of the audio is computed. The spectrum is divided into several sub-bands by frequency, and several peak points are located in each sub-band. The sub-bands are based on the Mel frequency scale.
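
As a rough illustration of Mel-based sub-bands and per-band peak picking, here is a minimal sketch. It is not the project's code: the class and method names are hypothetical, the Mel conversion uses the common 2595 * log10(1 + f/700) formula as an assumption, and it keeps only the single strongest bin per band (as if NPeaks were 1) for one spectrum frame.

```java
/** Hypothetical sketch (not the project's code): Mel-spaced sub-bands and one peak per band. */
final class MelBands {
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Returns bands + 1 FFT array indices, equally spaced on the Mel scale. */
    static int[] bandEdges(double minFreq, double maxFreq, int bands,
                           int fftSize, int sampleRate) {
        double lo = hzToMel(minFreq);
        double hi = hzToMel(maxFreq);
        int[] edges = new int[bands + 1];
        for (int i = 0; i <= bands; i++) {
            double hz = melToHz(lo + (hi - lo) * i / bands);
            edges[i] = (int) Math.round(hz * fftSize / sampleRate);
        }
        return edges;
    }

    /** Strongest bin in each band [edges[b], edges[b + 1]), or -1 if nothing exceeds minPower. */
    static int[] strongestPerBand(double[] magnitude, int[] edges, double minPower) {
        int[] peaks = new int[edges.length - 1];
        for (int b = 0; b < peaks.length; b++) {
            int best = -1;
            double bestPower = minPower;
            for (int k = edges[b]; k < edges[b + 1] && k < magnitude.length; k++) {
                if (magnitude[k] > bestPower) {
                    best = k;
                    bestPower = magnitude[k];
                }
            }
            peaks[b] = best;
        }
        return peaks;
    }

    public static void main(String[] args) {
        // Illustrative settings: 4 bands between 100 Hz and 2000 Hz, 512-point FFT at 8000 Hz.
        int[] edges = bandEdges(100.0, 2000.0, 4, 512, 8000);
        double[] magnitude = new double[257]; // synthetic spectrum for one frame
        magnitude[30] = 5.0;
        magnitude[100] = 3.0;
        int[] peaks = strongestPerBand(magnitude, edges, 1.0);
        System.out.println(java.util.Arrays.toString(edges));
        System.out.println(java.util.Arrays.toString(peaks));
    }
}
```

Because the edges are equally spaced in Mel rather than in Hz, the lower bands are narrower than the higher ones, so low frequencies get proportionally more peak points.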

The resulting peak points are then paired according to frequency and time ranges.

Pairs are formed only within a sub-band, which reduces the number of point pairs and improves the distribution capability. The time range for forming a pair is 1 s to 4 s. You can modify these parameters as needed.
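
The pairing rule can be sketched as follows: two peaks form a pair when they lie in the same sub-band and the second follows the first by 1 s to 4 s. This is a simplified illustration under stated assumptions, not the project's implementation. The `Peak` type, the bit layout of the hash, and the 0.032 s frame duration are all hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the project's code) of pairing peaks within a sub-band. */
final class PeakPairs {
    static final class Peak {
        final int frame; // index of the FFT window in time
        final int bin;   // FFT array index of the peak
        final int band;  // sub-band the peak belongs to

        Peak(int frame, int bin, int band) {
            this.frame = frame;
            this.bin = bin;
            this.band = band;
        }
    }

    /** Packs (bin1, bin2, frame delta) into one int; this bit layout is illustrative only. */
    static int hash(int bin1, int bin2, int deltaFrames) {
        return ((bin1 & 0x3FF) << 20) | ((bin2 & 0x3FF) << 10) | (deltaFrames & 0x3FF);
    }

    /** Pairs each peak with later peaks in the same band whose delay is in [minSec, maxSec]. */
    static List<Integer> pair(List<Peak> peaks, double frameSeconds,
                              double minSec, double maxSec) {
        List<Integer> hashes = new ArrayList<>();
        for (Peak a : peaks) {
            for (Peak b : peaks) {
                if (a.band != b.band) {
                    continue; // pairs never cross sub-bands
                }
                double dt = (b.frame - a.frame) * frameSeconds;
                if (dt >= minSec && dt <= maxSec) {
                    hashes.add(hash(a.bin, b.bin, b.frame - a.frame));
                }
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        List<Peak> peaks = new ArrayList<>();
        peaks.add(new Peak(0, 30, 1));
        peaks.add(new Peak(50, 35, 1));  // 1.6 s after the first peak, same band
        peaks.add(new Peak(50, 100, 3)); // different band, never paired with band 1
        List<Integer> hashes = pair(peaks, 0.032, 1.0, 4.0);
        System.out.println(hashes.size() + " pair(s)");
    }
}
```

Restricting pairs to one sub-band keeps the number of candidate pairs per peak small, which is the data-reduction effect described above.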

Contact information

EMAIL: hsyecheng@hotmail.com