py_speech_seg

A toolkit for speech segmentation based on the Bayesian Information Criterion (BIC).
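For reference, the textbook single-change-point test from the IBM paper listed at the end compares one full-covariance Gaussian fitted to a window of N feature frames against two Gaussians fitted to the parts before and after frame i. This is the standard form only; the exact variant and penalty weight used inside this toolkit are not stated here.

```latex
\Delta\mathrm{BIC}(i) = \frac{N}{2}\log|\Sigma| - \frac{N_1}{2}\log|\Sigma_1| - \frac{N_2}{2}\log|\Sigma_2| - \lambda P,
\qquad P = \frac{1}{2}\left(d + \frac{d(d+1)}{2}\right)\log N
```

Here N_1 = i and N_2 = N - i, Σ, Σ_1 and Σ_2 are the maximum-likelihood covariances of the whole window and of its two parts, d is the feature dimension and λ is a penalty weight (nominally 1). A change point is declared at the i that maximizes ΔBIC(i) when that maximum is positive.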

Dependencies

An Anaconda installation provides all the required packages except librosa.

To install librosa, you can try the following command:

conda install -c conda-forge librosa

Run the script multi_detect.py to test the segmentation on a sample wav file:

python multi_detect.py

You should then get a speech segmentation result as shown below:

In multi_detect.py, after some parameter settings, the segmentation is performed by this function call:

seg_point = seg.multi_segmentation("dialog4.wav",sr,frame_size,frame_shift,plot_seg=False,save_seg=True)
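The internals of seg.multi_segmentation are not described in this README. A common way to find several change points with BIC is a growing-window search: grow a window until some split inside it gives a positive ΔBIC, record that split, and restart the window there. The sketch below is a hypothetical illustration of that technique, not the toolkit's code; delta_bic, find_changes and the parameters min_win, grow, margin and lam are made-up names, and it assumes a feature matrix of shape (frames, dims) such as MFCCs.

```python
import numpy as np

# Hypothetical sketch, not py_speech_seg's own code: growing-window BIC change
# detection on a feature matrix X of shape (frames, dims), e.g. MFCCs.

def delta_bic(X, t, lam=1.0):
    """Delta-BIC for splitting window X at frame t; a value > 0 favours a change."""
    N, d = X.shape

    def logdet(Y):
        # Full-covariance ML estimate plus a tiny ridge to keep it invertible
        cov = np.cov(Y, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return (0.5 * N * logdet(X) - 0.5 * t * logdet(X[:t])
            - 0.5 * (N - t) * logdet(X[t:]) - lam * penalty)


def find_changes(X, min_win=50, grow=20, margin=10, lam=1.0):
    """Return change points as frame indices (all parameters are illustrative)."""
    changes, start, end = [], 0, min_win
    while end <= len(X):
        W = X[start:end]
        cands = range(margin, len(W) - margin)   # keep both parts >= margin frames
        scores = [delta_bic(W, t, lam) for t in cands]
        best = int(np.argmax(scores))
        if scores[best] > 0:
            cp = start + cands[best]             # change found: restart window there
            changes.append(cp)
            start, end = cp, cp + min_win
        else:
            end += grow                          # no change yet: grow the window
    return changes
```

If frame_shift is measured in samples (an assumption), a frame index times frame_shift / sr gives the approximate change time in seconds. Raising lam makes the detector more conservative and suppresses spurious changes.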

To save the segmented audio as wav files, set the flag save_seg=True.
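A minimal sketch of writing segments to disk with Python's standard wave module, assuming a mono float signal in [-1, 1] (such as the float array librosa.load returns) and change points given in seconds. save_segments and the seg_<i>.wav naming are hypothetical; the toolkit's own file names and the units of seg_point may differ.

```python
import os
import wave
import numpy as np

def save_segments(signal, sr, boundaries, out_dir="save_audio"):
    """Hypothetical helper: write each segment of a mono float signal in [-1, 1]
    to out_dir/seg_<i>.wav as 16-bit PCM. boundaries are change points in seconds."""
    os.makedirs(out_dir, exist_ok=True)
    edges = [0] + [int(round(b * sr)) for b in boundaries] + [len(signal)]
    # Scale to int16, little-endian as the WAV format requires
    pcm = (np.clip(signal, -1.0, 1.0) * 32767).astype("<i2")
    paths = []
    for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        path = os.path.join(out_dir, f"seg_{i}.wav")
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)          # 2 bytes per sample = 16-bit
            wf.setframerate(sr)
            wf.writeframes(pcm[a:b].tobytes())
        paths.append(path)
    return paths
```

N change points always produce N + 1 files, because the stretches before the first and after the last change point are written too.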

To plot the waveform in the time domain with the segmentation lines drawn on it, set the flag plot_seg=True.
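A minimal matplotlib sketch of this kind of plot, assuming change points in seconds; plot_segmentation is a hypothetical name, not the toolkit's function. The Agg backend lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")               # non-interactive backend; works headless
import matplotlib.pyplot as plt
import numpy as np

def plot_segmentation(signal, sr, boundaries, out_path=None):
    """Hypothetical helper: waveform vs. time with a dashed line at each change point."""
    t = np.arange(len(signal)) / sr  # sample index -> seconds
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(t, signal, linewidth=0.5)
    for b in boundaries:
        ax.axvline(b, color="r", linestyle="--")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    if out_path:
        fig.savefig(out_path)
    return fig, ax
```

To view the figure interactively, remove the matplotlib.use("Agg") line and call plt.show() after plotting.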
A new parameter enables "Clustering segmented audio fragments using the K-means method": just set the flag classify_seg=True.
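The README does not say how segments are turned into vectors before clustering. A minimal sketch, assuming each segment is summarized by the mean of its feature frames (e.g. MFCCs) and using a plain NumPy implementation of Lloyd's K-means in place of whatever library the script uses; segment_embeddings and kmeans are hypothetical names. The returned inertia is the sum of squared distances of samples to their closest cluster center.

```python
import numpy as np

def segment_embeddings(features, boundaries):
    """Hypothetical: mean feature vector per segment; boundaries are frame indices."""
    edges = [0, *boundaries, len(features)]
    return np.array([features[a:b].mean(axis=0) for a, b in zip(edges[:-1], edges[1:])])

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, inertia)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance of every sample to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to its members' mean; an empty cluster keeps its center
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, inertia
```

Label numbers themselves are arbitrary: only which segments share a label carries meaning.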

To determine the number of clusters, I plot a figure with the number of clusters on the X axis and, for each K-means clustering, the "Sum of squared distances of samples to their closest cluster center" on the Y axis. Choose the best K value using the elbow criterion:
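Here K is chosen by eye. The Y-axis wording matches how scikit-learn documents KMeans.inertia_, which suggests, but does not confirm, that the script uses that class. If an automatic choice were wanted, one common "knee" heuristic is to normalize both axes and pick the K whose point lies farthest from the straight line joining the first and last points of the curve. elbow_k is a hypothetical helper implementing that swapped-in heuristic, not the toolkit's method; it expects inertias for consecutive K values starting at k_min.

```python
import numpy as np

def elbow_k(inertias, k_min=1):
    """Hypothetical: K whose (K, inertia) point is farthest from the chord
    joining the first and last points of the curve (needs >= 3 values)."""
    y = np.asarray(inertias, dtype=float)
    x = np.arange(k_min, k_min + len(y), dtype=float)
    # Normalize both axes to [0, 1] so the distance does not depend on units
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance of each point to the line through the end points
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dy * xn - dx * yn + xn[-1] * yn[0] - yn[-1] * xn[0]) / np.hypot(dx, dy)
    return int(x[np.argmax(dist)])
```

For example, elbow_k([100, 20, 15, 12]) returns 2: the drop from K = 1 to K = 2 is large and later drops are small.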

From the figure above, I choose K = 2 as the best number of clusters and enter it at the prompt:

Please input the best K value: 2

The cluster labels of the 4 speech segments are:

0 1 0 1
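To turn such labels into per-speaker time ranges, a hypothetical helper group_by_cluster can pair each label with its segment. It assumes change points in seconds plus the total duration of the file; the times in the example below are made up for illustration.

```python
from collections import defaultdict

def group_by_cluster(labels, boundaries, total_dur):
    """Hypothetical: map each cluster label to its (start, end) segments in
    seconds and the total time assigned to that cluster."""
    edges = [0.0, *boundaries, total_dur]
    groups = defaultdict(list)
    for lab, a, b in zip(labels, edges[:-1], edges[1:]):
        groups[lab].append((a, b))
    return {lab: (segs, sum(b - a for a, b in segs)) for lab, segs in groups.items()}
```

With labels 0 1 0 1 and made-up change points at 2.0, 5.0 and 7.5 s in a 10 s file, cluster 0 gets (0.0, 2.0) and (5.0, 7.5), 4.5 s in total.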

By listening to the audio files stored in the folder "save_audio", we can verify that the clustering result is correct.

Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, by IBM T.J. Watson Research Center