drscotthawley/audio-classifier-keras-cnn

Train_network.py can't handle smaller mel spectrogram shapes

Closed this issue · 4 comments

Hi Dr. Hawley,

I noticed a small problem in the code in both train_network and eval_network: there is no error handling for files that produce spectrograms narrower than width 1293. The failure occurs when the training data is filled in from the mel spectrograms (X_train[train_count,:,:] = melgram, around line 140).
You have written code to chop off the extra width if it is too long ( melgram = melgram[:,:,:,0:mel_dims[3]] ) but nothing to account for melgrams being too short.
I was able to get around it by filling the empty space with 0's, but I thought it would be helpful to let you know!
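
For anyone hitting the same problem, here is a minimal sketch of the zero-fill workaround described above. The helper name `fit_melgram_width` and the example shapes are made up for illustration and are not part of the repository; it assumes the time axis is the last axis of the melgram array, as in the cropping line quoted above.

```python
import numpy as np

def fit_melgram_width(melgram, target_width):
    """Crop or zero-pad the last (time) axis so it is exactly target_width wide.

    Hypothetical helper, not from the repository.
    """
    width = melgram.shape[-1]
    if width >= target_width:
        # Too long (or already the right size): keep the first target_width frames.
        return melgram[..., :target_width]
    # Too short: pad only the end of the last axis with zeros.
    pad_widths = [(0, 0)] * (melgram.ndim - 1) + [(0, target_width - width)]
    return np.pad(melgram, pad_widths, mode="constant", constant_values=0)

# Example with made-up shapes: one short clip and one long clip.
short_clip = np.ones((1, 1, 96, 1000), dtype=np.float32)
long_clip = np.ones((1, 1, 96, 1500), dtype=np.float32)
padded = fit_melgram_width(short_clip, 1293)
cropped = fit_melgram_width(long_clip, 1293)
```

If the spectrograms are stored in decibels, zero may not correspond to silence, so padding with the minimum value of the melgram could be worth trying instead.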

Also, if you are interested, I would love to connect with you sometime to talk about potential ways to extend this example/model into a system that works in real time and makes predictions on songs as it hears them through a computer microphone, rather than from an uploaded mp3.
My email is aaronopp@gmail.com if you want to connect!

Thanks,

Aaron

Sir,

I got this error. What should I do?

Negative dimension size caused by subtracting 3 from 1 for 'conv2d_12/convolution' (op: 'Conv2D') with input shapes: [?,1,96,431], [3,3,431,32].

@MrNakum You have to add data_format="channels_first" to each of your model.add(Convolution2D(...)) calls, like so:

model.add(Convolution2D(nb_filters, kernel_size[0], strides=2,
                        border_mode='valid', input_shape=input_shape,data_format="channels_first"))
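
To see why the data format matters, here is a rough sketch of the shape arithmetic behind the error. The filter shape [3,3,431,32] in the message shows that 431 was taken as the channel count, so the input [?,1,96,431] was read as height 1, width 96, and a 3-wide kernel cannot fit into a height of 1. The helper names below are hypothetical, and stride 1 is assumed for simplicity.

```python
def valid_conv_output_size(input_size, kernel_size, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    if kernel_size > input_size:
        # Mirrors the "negative dimension size" failure in the error message.
        raise ValueError(
            "kernel of size %d does not fit into input of size %d"
            % (kernel_size, input_size)
        )
    return (input_size - kernel_size) // stride + 1

def spatial_dims(shape, data_format):
    """Return (height, width) of a per-sample shape under the given data format."""
    if data_format == "channels_first":
        _channels, height, width = shape
    else:  # "channels_last"
        height, width, _channels = shape
    return height, width

sample_shape = (1, 96, 431)  # per-sample shape taken from the error message

# Read as channels_first: 1 channel, 96 mel bands, 431 time frames.
h, w = spatial_dims(sample_shape, "channels_first")
out_channels_first = (valid_conv_output_size(h, 3), valid_conv_output_size(w, 3))

# Read as channels_last: height 1, width 96, 431 channels, so the kernel does not fit.
try:
    h, w = spatial_dims(sample_shape, "channels_last")
    valid_conv_output_size(h, 3)
    channels_last_failed = False
except ValueError:
    channels_last_failed = True
```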

In def shuffle_XY_paths you make a shallow copy of the path names, which leads to incorrect results. Use deepcopy instead, like so:

import copy  # needed for copy.deepcopy

def shuffle_XY_paths(X,Y,paths):
    newpaths = copy.deepcopy(paths)
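
Here is a small sketch of how this kind of bug can show up. It assumes (this is a guess, not confirmed from the repository source) that the output list refers to the same list object as the input, so each write into it also changes the list being read from. The function names and file names are made up for illustration.

```python
import copy

def permute_aliased(paths, idx):
    """Reorder paths through an alias: reads and writes hit the same list."""
    newpaths = paths  # assumption: no real copy is made here
    for i, j in enumerate(idx):
        newpaths[i] = paths[j]  # may read an entry that was already overwritten
    return newpaths

def permute_copied(paths, idx):
    """Reorder paths into an independent copy, leaving the source list intact."""
    newpaths = copy.deepcopy(paths)
    for i, j in enumerate(idx):
        newpaths[i] = paths[j]
    return newpaths

idx = [2, 0, 1]  # fixed permutation so the result is reproducible
aliased = permute_aliased(["a.wav", "b.wav", "c.wav"], idx)
copied = permute_copied(["a.wav", "b.wav", "c.wav"], idx)
print(aliased)  # ['c.wav', 'c.wav', 'c.wav']
print(copied)   # ['c.wav', 'a.wav', 'b.wav']
```

With the alias, the paths no longer line up with the shuffled X and Y rows; with the independent copy they do.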

Resolved in https://github.com/drscotthawley/panotti

Check there instead.