lukas/ml-class

Loss Function: categorical_crossentropy and binary_crossentropy

TheWindRider opened this issue · 1 comments

I was in the 05/09/2018 class before the TrainAI conference, and one peer student reported better accuracy when replacing categorical_crossentropy with binary_crossentropy. I saw that improvement too on two architectures (perceptron, MLP) and possibly more.

I'd like to ask/discuss here: what does the mathematics look like when applying a binary cross-entropy loss function to multi-class classification? I'm speculating that the way it's computed (even though binary cross-entropy is designed for 2 labels) happens to benefit accuracy in this problem.

Toy example and my guess:
label = [0, 0, 1, 0, 0], predict = [0.1, 0.1, 0.6, 0.1, 0.1]
categorical_crossentropy(label, predict) = -log(0.6)
binary_crossentropy(label, predict) = (-log(0.6) - 4*log(0.9)) / 5 (Keras averages the element-wise binary losses rather than summing them)
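The toy example can be checked numerically with plain NumPy; this is a minimal sketch of the two formulas, not the exact Keras implementation (which adds clipping for numerical stability):

```python
import numpy as np

label = np.array([0, 0, 1, 0, 0], dtype=float)
pred = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

# Categorical cross-entropy: -sum(y * log(p)) over the classes,
# which reduces to -log(p_true) for a one-hot label.
cce = -np.sum(label * np.log(pred))

# Binary cross-entropy applied element-wise, treating each class as an
# independent yes/no prediction; the mean is taken over the elements.
bce_elements = -(label * np.log(pred) + (1 - label) * np.log(1 - pred))
bce = bce_elements.mean()

print(cce)  # -log(0.6) ≈ 0.511
print(bce)  # (-log(0.6) - 4*log(0.9)) / 5 ≈ 0.186
```

Note that the binary version also penalizes the four "negative" classes for their predicted probability of 0.1, which the categorical loss ignores entirely.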

Interesting observation! If you're still interested in this question, I recommend asking it on our Slack forum for ML engineers and enthusiasts: bit.ly/slack-forum.