Deep Chroma questions

Question

Hi Filip,

thanks for sharing your code! It helps in understanding the details of your ISMIR 2016 paper.

I have one question regarding the early stopping criterion:

You say in your paper that you use validation accuracy as the stopping criterion; however, in your experiment file, you set early_stop_acc: false https://github.com/fdlm/chordrec/blob/master/experiments/ismir2016/deep_chroma.yaml#L13
Later you set early_stop_acc: true https://github.com/fdlm/chordrec/blob/master/experiments/ismir2016/deep_chroma.yaml#L42

Regarding your accuracy function onehot_acc(pred, targ), I do not understand why you only compare the "arg-max" values: https://github.com/fdlm/nn/blob/master/nn/nn.py#L219
Is this some "practical choice" or do I misunderstand this metric? IMHO, since it is a multi-class problem, the metric should reflect it, shouldn't it?

In general, what is your experience with using the loss as the early stopping criterion? Are there any drawbacks to doing that?

Thanks in advance for your response!

Answer 1 · 2017-01-02T13:28:25.000Z

Hi Stefan,

thanks for pointing this out -- you actually found a minor inaccuracy in the paper itself. Sorry about that!

When training the deep chroma network, I do not use validation accuracy for early stopping, but the loss. When training the logistic regression (a.k.a. the softmax layer in the neural network) for chord classification, I do use validation accuracy. This is why you have the two settings.
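To illustrate how the two settings differ, here is a minimal sketch of picking the stopping epoch from either signal. It is not the chordrec implementation: the function `pick_best_epoch`, the `patience` parameter, and the validation curves are hypothetical; only the flag name `early_stop_acc` comes from the experiment file.

```python
def pick_best_epoch(val_losses, val_accs, early_stop_acc, patience=2):
    """Return the index of the best epoch seen before patience runs out.

    Hypothetical helper: early_stop_acc=True monitors validation accuracy,
    early_stop_acc=False monitors validation loss.
    """
    # Turn both signals into "higher is better" scores.
    scores = val_accs if early_stop_acc else [-loss for loss in val_losses]

    best_score = float("-inf")
    best_idx = -1
    waited = 0
    for idx, score in enumerate(scores):
        if score > best_score:
            best_score, best_idx, waited = score, idx, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs
    return best_idx


# Made-up validation curves: the loss keeps falling, the accuracy plateaus.
val_losses = [1.0, 0.8, 0.7, 0.65, 0.6]
val_accs = [0.5, 0.6, 0.6, 0.6, 0.6]

stop_on_acc = pick_best_epoch(val_losses, val_accs, early_stop_acc=True)
stop_on_loss = pick_best_epoch(val_losses, val_accs, early_stop_acc=False)
```

With these made-up curves, monitoring accuracy settles on an earlier epoch than monitoring loss, because the accuracy stops improving while the loss is still decreasing.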

I chose to do so because, in my experience, it seemed that in the final training phase the network could reduce the loss without improving the accuracy (e.g. maybe because easy examples are pushed farther away from the classification border, etc.). This should not have a big effect, though.
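The effect mentioned here, the loss going down while the accuracy stays the same, can be seen on a single already-correct frame. This is only an illustrative sketch: the probability vectors are made up, and the helpers `cross_entropy` and `argmax` are hypothetical, not taken from the nn library.

```python
import math


def cross_entropy(probs, target_idx):
    """Cross-entropy loss of one frame, given the index of the true class."""
    return -math.log(probs[target_idx])


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


target_idx = 1                # true class of the frame
earlier = [0.3, 0.5, 0.2]     # output at some earlier epoch (made up)
later = [0.1, 0.8, 0.1]       # output at a later epoch (made up)

loss_earlier = cross_entropy(earlier, target_idx)
loss_later = cross_entropy(later, target_idx)

# The frame is classified correctly in both cases, so accuracy is unchanged,
# yet the loss is lower for the more confident later output.
correct_earlier = argmax(earlier) == target_idx
correct_later = argmax(later) == target_idx
```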

Regarding the accuracy function -- it is used when training the chord classification layer (when training the deep chroma network, I think elemwise_acc is used for printing the accuracy to the console). Yes, this is a multi-class problem -- but in any case, accuracy is always just n_correct / n_total, and a frame is classified correctly if the arg-max of the output vector matches the arg-max of the corresponding one-hot encoded target vector. For example, assume we have three classes: a target vector for a single instance would look like [0, 1, 0], and a correct output vector would be, e.g., [0.3, 0.5, 0.2].
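The arg-max comparison can be sketched in a few lines of NumPy. This is an assumption about the general idea, not a copy of onehot_acc from the nn library; the function name `onehot_acc_sketch` and the second example frame are hypothetical.

```python
import numpy as np


def onehot_acc_sketch(pred, targ):
    """Fraction of frames whose predicted class matches the target class.

    pred: (n_frames, n_classes) array of class scores or probabilities.
    targ: (n_frames, n_classes) array of one-hot encoded targets.
    """
    pred = np.asarray(pred)
    targ = np.asarray(targ)
    # A frame counts as correct if both arg-max indices agree.
    n_correct = np.sum(pred.argmax(axis=1) == targ.argmax(axis=1))
    return float(n_correct) / len(targ)


targ = np.array([[0, 1, 0],
                 [1, 0, 0]])
pred = np.array([[0.3, 0.5, 0.2],   # arg-max 1, matches the target
                 [0.2, 0.1, 0.7]])  # arg-max 2, target is 0

acc = onehot_acc_sketch(pred, targ)
```

Here one of the two frames is classified correctly, so n_correct / n_total is 1 / 2.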

I hope that makes sense.