Cuts radio station recordings taken from the network to remove advertising, traffic information, and the hourly news. Only the actual program content is retained.
- In a typical hour of radio content, only about 30 minutes is normal programming, and advertising and program content do not alternate on a regular pattern
- In one week there are nearly 30 programs, each with different content and hosts
- No open solution of this type is available at present
- Cut the audio into 4-second blocks and convert each block into a Mel spectrogram (a PCEN spectrogram can also be used as the feature)
- Use the spectrogram to judge whether each block is normal program content, then cut the audio accordingly
- After many tests, we train and recognize on a spectrogram image that combines the spectral detail with the envelope; usually only the envelope is used
- Applying 80 Mel filter banks to 4 seconds of sound gives a 401 * 80 matrix. In the end, a 160 * 160 square matrix is used as the input for training and recognition
- The model is a cut-down DenseNet (430,000 parameters); the deployed Caffe2 network is about 2.5 MB
- Data augmentation similar to that used for images is applied
- The volume of the sound is randomly increased or decreased
- A random number is added to or subtracted from the data as a whole
- The generated Mel spectrogram is randomly cropped to a 160 * 160 square matrix
- To deploy on ordinary devices, a simple library with the corresponding functions was rewritten, modeled on Python's librosa
- Confusion matrix on the test set
Prediction | Abnormal program (actual) | Normal program (actual) |
---|---|---|
Abnormal program (predicted) | 1607 | 3 |
Normal program (predicted) | 1 | 2287 |
Accuracy = (1607 + 2287) / 3898 ≈ 99.897%
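
As a quick check of the accuracy figure, the sketch below recomputes it from the four counts in the confusion matrix, reading rows as predictions and columns as actual labels, as the table headers state. The dictionary layout and the function names are illustrative only and are not part of the project's code.

```python
# Counts copied from the confusion matrix above.
# Keys are (predicted label, actual label) pairs.
confusion = {
    ("abnormal", "abnormal"): 1607,
    ("abnormal", "normal"): 3,
    ("normal", "abnormal"): 1,
    ("normal", "normal"): 2287,
}


def accuracy(cm):
    """Fraction of test blocks whose predicted label equals the actual label."""
    correct = sum(n for (pred, actual), n in cm.items() if pred == actual)
    return correct / sum(cm.values())


def precision(cm, label):
    """Of the blocks predicted as `label`, the fraction that really are `label`."""
    predicted = sum(n for (pred, _), n in cm.items() if pred == label)
    return cm[(label, label)] / predicted


def recall(cm, label):
    """Of the blocks that really are `label`, the fraction predicted as `label`."""
    actual_total = sum(n for (_, actual), n in cm.items() if actual == label)
    return cm[(label, label)] / actual_total


print(f"accuracy: {accuracy(confusion):.4%}")
for name in ("abnormal", "normal"):
    print(f"{name}: precision {precision(confusion, name):.4%}, "
          f"recall {recall(confusion, name):.4%}")
```

Per-class precision and recall are worth reporting alongside accuracy here, because a normal block wrongly cut as advertising (lost program content) and an advertising block wrongly kept are different kinds of mistakes.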
- In real-world testing it works very well. I have detailed test results, charts, and data; contact me if you need them
- http://blog.csdn.net/zouxy09/article/details/9156785
- https://blog.csdn.net/zzc15806/article/details/79246716
- https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
- Part of the recordings is labeled manually and split into normal-program WAV files and abnormal-program WAV files
- Each WAV file is then split into 4-second WAV segments (segments are taken at 2-second intervals)
- Each segment is then converted into an 80 * 480 Mel spectrogram. To make training easier, it is saved as a 160 * 240 JPEG
- PyTorch is used as the training framework
- A random 160 * 160 crop is used as the training input
- Linux and Windows versions of the tool that removes advertisements from audio programs are available
- Based on Python v1.3 and v1.0 respectively
- It can segment the programs of radio stations all over the world
- At present, a C version and a Python version are available
- It is used to cut programs from stations such as the Los Angeles Chinese radio station (1300) according to the broadcast schedule
- Backup shell script (bakHandle.sh); the main contributor to this project is Tang Dong
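
The segmentation and cropping steps listed above (4-second segments taken every 2 seconds, and a 160 * 240 image randomly cropped to 160 * 160) can be sketched in plain Python as follows. This is a minimal illustration, not the project's code. It assumes that the "2-second interval" means consecutive segments start 2 seconds apart and therefore overlap, that the image is stored as 160 rows by 240 columns of 8-bit values, and that the random number is added to every pixel. The function names and the `max_offset` parameter are hypothetical.

```python
import random


def window_starts(duration_s, window_s=4.0, step_s=2.0):
    """Start times (seconds) of every full window, spaced step_s apart."""
    starts = []
    t = 0.0
    while t + window_s <= duration_s:
        starts.append(t)
        t += step_s
    return starts


def random_crop(image, size=160, rng=random):
    """Cut a size x size patch out of a 2-D list at a random position."""
    rows, cols = len(image), len(image[0])
    if rows < size or cols < size:
        raise ValueError("image is smaller than the crop size")
    top = rng.randint(0, rows - size)
    left = rng.randint(0, cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


def add_random_offset(image, max_offset=10, rng=random):
    """Add one random value to every pixel, clamped to the 0..255 range."""
    offset = rng.randint(-max_offset, max_offset)
    return [[min(255, max(0, v + offset)) for v in row] for row in image]


# Usage: a 10-second recording and a synthetic 160 x 240 "spectrogram image".
rng = random.Random(0)
print(window_starts(10.0))
image = [[(r + c) % 256 for c in range(240)] for r in range(160)]
patch = add_random_offset(random_crop(image, rng=rng), rng=rng)
print(len(patch), "x", len(patch[0]))
```

Because the image is exactly 160 rows high, the crop only varies along the 240-column axis, which amounts to a random shift of the training patch within the image.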