
Utilities for clustering of audio samples




This package contains experiments and utilities for unsupervised learning on acoustic recordings. It serves as a use case for SpectralDistances.jl.


using Pkg
pkg"add https://github.com/baggepinnen/DetectionIoTools.jl"
pkg"add https://github.com/baggepinnen/AudioClustering.jl"


Estimating linear models

The following code illustrates how to use SpectralDistances.jl to fit rational spectra to audio samples and extract the poles for use as features:

using SpectralDistances, AudioClustering, Glob
path      = "/home/fredrikb/birds/" # path to a bunch of wav files
files     = glob("*.wav", path)     # all wav files found in `path`
const fs  = 44100                   # sampling rate in Hz
na        = 20                      # model order
fitmethod = LS(na=na)

models    = mapsoundfiles(files) do sound
     sound = SpectralDistances.bp_filter(sound, (50/fs, 18000/fs))
     SpectralDistances.fitmodel(fitmethod, sound)
end

We now have a vector of vectors with linear models fit to the sound files. To make this easier to work with, we flatten this structure to a single long vector and extract the poles (roots) of the linear systems to use as features:

X = embeddings(models)

We now have some audio data, represented as poles of rational spectra, in a matrix X. See https://baggepinnen.github.io/SpectralDistances.jl/latest/examples/#Examples-1 for examples of how to use this matrix for analysis of the signals, e.g., classification, clustering and detection.
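To make "poles of a rational spectrum as features" concrete, here is a minimal sketch that does not use SpectralDistances.jl at all. It fits an autoregressive model by plain least squares and computes the discrete-time poles as eigenvalues of the companion matrix. The helper names `fit_ar_ls` and `ar_poles` are hypothetical and exist only for this illustration; the package's own `LS` fit method and `embeddings` may differ in conventions and details.

```julia
using LinearAlgebra

# Hypothetical helper: least-squares fit of the AR(na) model
#   y[t] = θ[1]*y[t-1] + ... + θ[na]*y[t-na]
function fit_ar_ls(y::AbstractVector{<:Real}, na::Integer)
    N = length(y)
    Φ = [y[t-k] for t in na+1:N, k in 1:na] # one row of lagged samples per t
    Φ \ y[na+1:N]                           # least-squares solution
end

# Hypothetical helper: poles are the roots of
#   z^na - θ[1]*z^(na-1) - ... - θ[na],
# computed as the eigenvalues of the companion matrix.
function ar_poles(θ::AbstractVector{<:Real})
    na = length(θ)
    C = zeros(na, na)
    C[1, :] .= θ
    for i in 2:na
        C[i, i-1] = 1
    end
    eigvals(C)
end

# Toy signal: a noise-free damped oscillation with poles at r*exp(±iω)
r, ω = 0.9, 0.5
y = zeros(100); y[1] = 1.0; y[2] = 0.5
for t in 3:100
    y[t] = 2r*cos(ω)*y[t-1] - r^2*y[t-2]
end

θ     = fit_ar_ls(y, 2)
poles = ar_poles(θ) # complex-conjugate pair with magnitude r and angle ±ω
```

For real recordings the signal is noisy and the model order is higher (the example above uses `na = 20`), so the estimated poles only approximate the dominant resonances of the sound.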

Graph-based clustering

Model based

A graph representation of X can be obtained with

G = audiograph(X, 5; λ=0)

where the second argument, k=5, is the number of nearest neighbors considered when building the graph. If λ=0, the graph is weighted by distance, whereas if λ>0, the graph is weighted according to adjacency under the kernel exp(-λ*d). The metric used is the Euclidean distance. If you want to use a more sophisticated distance, try, e.g.,

dist = OptimalTransportRootDistance(domain=Continuous(), p=2)
G = audiograph(X, 5, dist; λ=0)

Here, the Euclidean distance will be used to select neighbors, but the edges will be weighted using the provided distance. This avoids having to calculate a very large number of pairwise distances using the more expensive distance metric.

Any graph-based algorithm may now operate on G, or on the field G.weight. Further examples are available here.
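The two weighting options described above can be illustrated with a small standalone sketch. This is not the implementation behind `audiograph`: the function name `knn_graph_sketch` is hypothetical, the samples are assumed to be stored as columns of `X`, and the neighbor search is a brute-force loop that only makes sense for small data sets.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: k-nearest-neighbor graph over the columns of X.
# λ == 0 stores the Euclidean distance d as edge weight,
# λ > 0 stores the kernel value exp(-λ*d) instead.
function knn_graph_sketch(X::AbstractMatrix, k::Integer; λ::Real=0)
    n = size(X, 2)
    W = spzeros(n, n)
    for i in 1:n
        d = [norm(X[:, i] - X[:, j]) for j in 1:n]
        d[i] = Inf                       # never select a point as its own neighbor
        for j in partialsortperm(d, 1:k) # indices of the k smallest distances
            W[i, j] = λ == 0 ? d[j] : exp(-λ * d[j])
        end
    end
    W
end

# Four points on a line, forming two well-separated pairs
Xtoy = [0.0 1.0 10.0 11.0;
        0.0 0.0  0.0  0.0]
Wd = knn_graph_sketch(Xtoy, 1)      # distance-weighted edges
Wk = knn_graph_sketch(Xtoy, 1; λ=1) # kernel-weighted edges
```

With distance weights, a small value means a strong connection, while with kernel weights a value close to 1 does. Which convention is appropriate depends on the graph algorithm applied afterwards.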

Spectrogram based

The following snippets show how to preprocess data to a suitable form for clustering using this package:

using Glob, WAV # WAV provides wavread
files = glob("*.wav") # Vector of file paths
const fs = Int(wavread(files[1])[2]) # Read the sampling rate
N = length(files)

using TotalLeastSquares # For lowrankfilter
function lrfilt(y)
    lowrankfilter(y, min(250, length(y)-1100), lag=10)
end

"Perform some simple threshold filtering and calculate a spectrogram"
function spec(sound)
    @. sound = Float32(100000 * clamp(sound, -0.015f0, 0.015f0)) # the 100000 multiplier is to normalize the Float32 data for better numerical performance. Tune all parameters to your use case.
    # sound = lrfilt(sound) # This is an alternative to the above which is *much* better, but also much slower
    melspectrogram(sound, 100, 70, nmels=30, fs=fs, fmin=5)      # Spend some time making sure spectrogram representation is good.
end

using ThreadTools # For tmap
spectrograms = tmap(files) do file
    spec(vec(wavread(file)[1])) # read the samples and compute the spectrogram
end

matrices = [Float32.(max.(normalize_spectrogram(s), 1e-7)) for s in spectrograms]
# matrices_masked = mask_filter.(matrices) # This is an alternative if the lowrankfilter is not used https://baggepinnen.github.io/SpectralDistances.jl/latest/distances/#SpectralDistances.mask_filter

inds, D = initialize_clusters(dist, matrices; init_multiplier = 10, N_seeds = 100)
patterns = matrices[inds] # These should be good cluster seeds

Distance matrix-based clustering

See docs entry Clustering using a distance matrix

Feature-based clustering

See docs entry Clustering using features

Accelerated k-nearest neighbor

inds, dists, D = knn_accelerated(dist, X, k, Xe=X; kwargs...)

Find the nearest neighbor of each element of X under the distance dist by first finding the k nearest neighbors using the Euclidean distance on embeddings produced from Xe, and then using dist to find the smallest distance within those k.

X is assumed to be a vector of something dist can operate on, such as a vector of models from SpectralDistances. Xe defaults to X, but can be anything for which embeddings(Xe) is defined, such as a vector of models or spectrograms.

D is a sparse matrix with all the distances computed using dist. It contains the raw distance measurements; to symmetrize it, call SpectralDistances.symmetrize!(D). The returned dists are already symmetrized.
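The two-stage idea can be sketched in a few lines of plain Julia. This is an illustration under stated assumptions, not the code behind `knn_accelerated`: the name `knn_two_stage` is hypothetical, the embeddings are passed explicitly as columns of a matrix `E` instead of being computed by `embeddings`, the candidate search is brute force, and no symmetrization is performed.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: for each X[i], pick k candidates by Euclidean distance
# between embedding columns of E, then evaluate the expensive `dist` only on
# those candidates and keep the closest one.
function knn_two_stage(dist, X::AbstractVector, E::AbstractMatrix, k::Integer)
    n     = length(X)
    inds  = zeros(Int, n) # index of the nearest neighbor under `dist`
    dists = fill(Inf, n)  # corresponding distance
    D     = spzeros(n, n) # every distance that was actually evaluated
    for i in 1:n
        de = [norm(E[:, i] - E[:, j]) for j in 1:n]
        de[i] = Inf                       # exclude the point itself
        for j in partialsortperm(de, 1:k) # cheap candidate selection
            dij = dist(X[i], X[j])        # expensive distance, only k evaluations per point
            D[i, j] = dij
            if dij < dists[i]
                dists[i], inds[i] = dij, j
            end
        end
    end
    inds, dists, D
end

# Toy data: scalars wrapped in vectors, with a squared difference standing in
# for the expensive distance
Xs = [[0.0], [1.0], [1.5], [10.0]]
E  = reduce(hcat, Xs) # 1×4 embedding matrix
inds, dists, D = knn_two_stage((a, b) -> abs2(a[1] - b[1]), Xs, E, 2)
```

The cost of the expensive distance grows as n*k instead of n^2, at the price of possibly missing the true nearest neighbor when it is not among the k Euclidean candidates.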