keunwoochoi/music-auto_tagging-keras

Low AUC for MTT

Rashmeet09 opened this issue · 6 comments

I trained this CNN model on the MTT dataset and got the results in the attached files.
The accuracy is 90.69%, but the AUC is quite low at 0.58.
Could you take a look and suggest any changes for improvement?

Code.txt
output_cnn.txt

Yeah, I fine-tuned this model on MTT with a different set of labels and I also got low AUC scores. For most of the tags it was in the range 0.65-0.75.

First, the gap between accuracy and AUC is normal. There are so many zeros in the true Y that predicting all zeros already gives an accuracy above 80% or 90% on MTT.
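A quick toy check of that point (my own illustration with sklearn, not code from this repo): on sparse multi-label targets, an all-zero prediction already scores high element-wise accuracy, while uninformative scores sit near chance AUC.

```python
# Toy illustration (not from the repo): high accuracy vs. uninformative AUC
# on sparse multi-label targets like MTT's tag matrix.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_clips, n_tags = 1000, 50
y_true = (rng.random((n_clips, n_tags)) < 0.05).astype(int)   # ~5% positives per tag

y_pred_zeros = np.zeros_like(y_true, dtype=float)             # "predict nothing" baseline
accuracy = (y_true == (y_pred_zeros > 0.5)).mean()
print(f"element-wise accuracy of the all-zero predictor: {accuracy:.3f}")  # ~0.95

# AUC measures ranking quality, so uninformative scores stay near 0.5.
y_pred_random = rng.random((n_clips, n_tags))
print(f"macro AUC of random scores: {roc_auc_score(y_true, y_pred_random, average='macro'):.3f}")
```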

@Rashmeet09 The code seems alright; sorry, but I have no idea. Especially since you're not using the pre-trained weights, I can't think of any reason. Side note: I'd use val_loss rather than val_acc as the metric. Did you randomise the training data? How is it processed?
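For concreteness, this is roughly what I mean by monitoring val_loss (a sketch with dummy model and data, not your Code.txt; adapt it to your setup):

```python
# Sketch only: monitor val_loss for checkpointing/early stopping, and keep
# shuffle=True so the training data is reshuffled every epoch. Model and data
# below are dummies just to make the snippet self-contained.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping

x_train = np.random.rand(256, 96)
y_train = (np.random.rand(256, 50) < 0.05).astype('float32')
x_valid = np.random.rand(64, 96)
y_valid = (np.random.rand(64, 50) < 0.05).astype('float32')

model = Sequential([Dense(50, activation='sigmoid', input_shape=(96,))])
model.compile(optimizer='adam', loss='binary_crossentropy')

callbacks = [
    ModelCheckpoint('best_weights.h5', monitor='val_loss', save_best_only=True),
    EarlyStopping(monitor='val_loss', patience=5),
]
model.fit(x_train, y_train,
          validation_data=(x_valid, y_valid),
          batch_size=32, epochs=20,
          shuffle=True, callbacks=callbacks)
```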

@as641651 I assume you used the 'msd' weights. How did you prepare the mel-spectrogram? The weights of MusicTaggerCNN were trained with the power of a power mel-spectrogram, i.e. melgram**4, just because of my mistake (see here). It may have affected your experiment.

I just use the audio processor from your repo:
logam(melgram(y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)**2, ref_power=1.0)

So you used (melgram(y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)**2)**2? No log amplitude?

Oh, then it's correct, never mind. Hm...
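For anyone else reading: that one-liner roughly expands to the librosa calls below (my paraphrase of audio_processor.py, not a verbatim copy; recent librosa versions renamed logamplitude() to power_to_db()).

```python
# Rough expansion of the one-liner above (paraphrase, not the exact repo code).
# melspectrogram() already returns a power spectrogram, so the extra **2 gives
# the "power of power" (melgram**4 overall) that the msd-trained weights expect.
import numpy as np
import librosa

def compute_melgram(audio_path):
    src, sr = librosa.load(audio_path, sr=12000)              # resample to 12 kHz
    mel = librosa.feature.melspectrogram(
        y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)
    logmel = librosa.power_to_db(mel ** 2, ref=1.0)           # log amplitude of melgram**2
    return logmel[np.newaxis, np.newaxis, :, :]               # shape (1, 1, 96, n_frames)
```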

I processed and split MTT (folders 0-11 for train.h5, 12-13 for valid.h5, and 14-15 for test.h5) as in the attached file (I referred to the UrbanSound dataset pre-processing and the audio_processor from your repo):
split.txt
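(For reference, the split boils down to something like this; split.txt has the exact code, and the path and folder handling here are placeholders assuming the standard MTT layout of 16 clip directories named 0-9 and a-f.)

```python
# Minimal sketch of the folder-based split described above (placeholders, not split.txt).
import os

MTT_AUDIO_DIR = 'mtt/mp3'                    # hypothetical path to the MTT audio root
folders = sorted(os.listdir(MTT_AUDIO_DIR))  # '0'..'9', 'a'..'f' -> indices 0..15

train_dirs = folders[:12]      # folders 0-11  -> train.h5
valid_dirs = folders[12:14]    # folders 12-13 -> valid.h5
test_dirs  = folders[14:16]    # folders 14-15 -> test.h5

def list_clips(dirs):
    """Collect mp3 paths under the given MTT sub-folders."""
    return [os.path.join(MTT_AUDIO_DIR, d, f)
            for d in dirs
            for f in sorted(os.listdir(os.path.join(MTT_AUDIO_DIR, d)))
            if f.endswith('.mp3')]
```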

How did you pre-process MTT? Should I use the Last.fm dataset from MSD to reproduce better results with this model?

I was filtering out labels with prob < 0.2 while testing. I removed this filter and re-evaluated, and I got a weighted average AUC of 0.80 (fine-tuned on the new set of labels for 40k iterations). That seems reasonable, right? Furthermore, I had removed the max-pooling in the final layer of the CNN.
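In case it helps anyone else, here is a toy illustration (my own, not the repo's evaluation code) of why that filter hurt the AUC: ROC-AUC only looks at how the scores rank positives above negatives, and clamping everything below 0.2 to a constant throws much of that ranking away.

```python
# Toy illustration: thresholding predicted probabilities before roc_auc_score
# collapses their ranking and drags the AUC down.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random((2000, 50)) < 0.05).astype(int)           # sparse tags
scores = np.clip(0.1 * y_true + 0.15 * rng.random(y_true.shape), 0.0, 1.0)

auc_raw = roc_auc_score(y_true, scores, average='weighted')

filtered = np.where(scores < 0.2, 0.0, scores)                 # "filter out prob < 0.2"
auc_filtered = roc_auc_score(y_true, filtered, average='weighted')

print(f"weighted AUC on raw scores:      {auc_raw:.3f}")
print(f"weighted AUC after thresholding: {auc_filtered:.3f}")  # noticeably lower
```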