keunwoochoi/music-auto_tagging-keras

Low AUC for MTT

Rashmeet09 opened this issue · 6 comments

I trained this CNN model on the MTT dataset and got the results in the attached files.
The accuracy is 90.69%, but the AUC is quite low at 0.58.
Could you take a look and suggest any changes for improvement?

Code.txt
output_cnn.txt

Yeah, I fine-tuned this model on MTT with a different set of labels and I also got low AUC scores. For most of the tags it was in the range 0.65-0.75.

First, the gap between accuracy and AUC is normal. There are so many zeros in the true Y that predicting all zeros already gives an accuracy above 80% or 90% on MTT.
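A quick toy check of that point (my own illustration with sklearn, not code from this repo): on sparse multi-label targets, an all-zero prediction already scores high element-wise accuracy, while uninformative scores sit near chance AUC.

```python
# Toy illustration (not from the repo): high accuracy vs. uninformative AUC
# on sparse multi-label targets like MTT's tag matrix.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_clips, n_tags = 1000, 50
y_true = (rng.random((n_clips, n_tags)) < 0.05).astype(int)   # ~5% positives per tag

y_pred_zeros = np.zeros_like(y_true, dtype=float)             # "predict nothing" baseline
accuracy = (y_true == (y_pred_zeros > 0.5)).mean()
print(f"element-wise accuracy of the all-zero predictor: {accuracy:.3f}")  # ~0.95

# AUC measures ranking quality, so uninformative scores stay near 0.5.
y_pred_random = rng.random((n_clips, n_tags))
print(f"macro AUC of random scores: {roc_auc_score(y_true, y_pred_random, average='macro'):.3f}")
```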

@Rashmeet09 The code seems alright; sorry, but I have no idea. Especially since you're not using the pre-trained weights, I can't think of any reason. Side note: I'd use val_loss rather than val_acc as the metric. Did you randomise the training data? How is it processed?
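For concreteness, this is roughly what I mean by monitoring val_loss (a sketch with dummy model and data, not your Code.txt; adapt it to your setup):

```python
# Sketch only: monitor val_loss for checkpointing/early stopping, and keep
# shuffle=True so the training data is reshuffled every epoch. Model and data
# below are dummies just to make the snippet self-contained.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping

x_train = np.random.rand(256, 96)
y_train = (np.random.rand(256, 50) < 0.05).astype('float32')
x_valid = np.random.rand(64, 96)
y_valid = (np.random.rand(64, 50) < 0.05).astype('float32')

model = Sequential([Dense(50, activation='sigmoid', input_shape=(96,))])
model.compile(optimizer='adam', loss='binary_crossentropy')

callbacks = [
    ModelCheckpoint('best_weights.h5', monitor='val_loss', save_best_only=True),
    EarlyStopping(monitor='val_loss', patience=5),
]
model.fit(x_train, y_train,
          validation_data=(x_valid, y_valid),
          batch_size=32, epochs=20,
          shuffle=True, callbacks=callbacks)
```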

@as641651 I assume you used the 'msd' weights. How did you prepare the mel-spectrogram? The weights of MusicTaggerCNN were trained with the power of a power mel-spectrogram, i.e. melgram**4, just because of my mistake (see here). It may have affected your experiment.

I just use the audio processor from your repo:
logam(melgram(y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)**2, ref_power=1.0)

So you used (melgram(y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)**2)**2? No log amplitude?

Oh, then it's correct, never mind. Hm...
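For anyone else reading: that one-liner roughly expands to the librosa calls below (my paraphrase of audio_processor.py, not a verbatim copy; recent librosa versions renamed logamplitude() to power_to_db()).

```python
# Rough expansion of the one-liner above (paraphrase, not the exact repo code).
# melspectrogram() already returns a power spectrogram, so the extra **2 gives
# the "power of power" (melgram**4 overall) that the msd-trained weights expect.
import numpy as np
import librosa

def compute_melgram(audio_path):
    src, sr = librosa.load(audio_path, sr=12000)              # resample to 12 kHz
    mel = librosa.feature.melspectrogram(
        y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)
    logmel = librosa.power_to_db(mel ** 2, ref=1.0)           # log amplitude of melgram**2
    return logmel[np.newaxis, np.newaxis, :, :]               # shape (1, 1, 96, n_frames)
```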

I processed and split MTT (folders 0-11 for train.h5, 12-13 for valid.h5, and 14-15 for test.h5) as in the attached file (I referred to the UrbanSound dataset pre-processing and the audio_processor from your repo):
split.txt
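(For reference, the split boils down to something like this; split.txt has the exact code, and the path and folder handling here are placeholders assuming the standard MTT layout of 16 clip directories named 0-9 and a-f.)

```python
# Minimal sketch of the folder-based split described above (placeholders, not split.txt).
import os

MTT_AUDIO_DIR = 'mtt/mp3'                    # hypothetical path to the MTT audio root
folders = sorted(os.listdir(MTT_AUDIO_DIR))  # '0'..'9', 'a'..'f' -> indices 0..15

train_dirs = folders[:12]      # folders 0-11  -> train.h5
valid_dirs = folders[12:14]    # folders 12-13 -> valid.h5
test_dirs  = folders[14:16]    # folders 14-15 -> test.h5

def list_clips(dirs):
    """Collect mp3 paths under the given MTT sub-folders."""
    return [os.path.join(MTT_AUDIO_DIR, d, f)
            for d in dirs
            for f in sorted(os.listdir(os.path.join(MTT_AUDIO_DIR, d)))
            if f.endswith('.mp3')]
```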

How did you pre-process MTT? Should I use the Last.fm dataset from MSD to reproduce better results with this model?

I was filtering out labels with prob < 0.2 while testing. I removed this filter and re-evaluated, and I got a weighted average AUC of 0.80 (fine-tuned on the new set of labels for 40k iterations). That seems reasonable, right? Furthermore, I had removed the max-pooling in the final layer of the CNN.
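In case it helps anyone else, here is a toy illustration (my own, not the repo's evaluation code) of why that filter hurt the AUC: ROC-AUC only looks at how the scores rank positives above negatives, and clamping everything below 0.2 to a constant throws much of that ranking away.

```python
# Toy illustration: thresholding predicted probabilities before roc_auc_score
# collapses their ranking and drags the AUC down.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random((2000, 50)) < 0.05).astype(int)           # sparse tags
scores = np.clip(0.1 * y_true + 0.15 * rng.random(y_true.shape), 0.0, 1.0)

auc_raw = roc_auc_score(y_true, scores, average='weighted')

filtered = np.where(scores < 0.2, 0.0, scores)                 # "filter out prob < 0.2"
auc_filtered = roc_auc_score(y_true, filtered, average='weighted')

print(f"weighted AUC on raw scores:      {auc_raw:.3f}")
print(f"weighted AUC after thresholding: {auc_filtered:.3f}")  # noticeably lower
```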