Jensen–Shannon divergence
LamyaMohaned opened this issue · 2 comments
Hello,
I'm trying to understand the Jensen–Shannon divergence. I still don't fully understand the math behind it, but someone asked me to look into it and AugMix because of this paragraph:
"Alternatively, we can view each set as an empirical distribution and measure the distance between them using Kullback-Leibler (KL) or Jensen-Shannon (JS) divergence. The challenge for learning with KL or JS divergence is that no useful gradient is provided when the two empirical distributions have disjoint supports or have a non-empty intersection contained in a set of measure zero."
from here: https://arxiv.org/pdf/1907.10764.pdf
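For reference (my own summary, not from the paper), the JS divergence between two distributions P and Q is built from the KL divergence via their mixture M:

```latex
\mathrm{KL}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}, \qquad
M = \tfrac{1}{2}(P + Q), \qquad
\mathrm{JS}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M).
```

The problem the paper describes is that when P and Q put probability mass on completely different points, the KL terms either blow up or saturate, so the divergence gives no useful gradient signal.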
Is this problem present in AugMix?
This is not a problem for AugMix: the distributions it compares are softmax outputs over the same set of classes (the predictions for the clean and augmented versions of an image), so they share the same support, and every element of the support has probability greater than zero.
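Here is a minimal numeric sketch of that point, assuming the compared distributions are softmax-style probability vectors over the same classes (the helper functions `kl` and `js` below are just illustrative, not AugMix code):

```python
import numpy as np

def kl(p, q):
    # KL(p || q); assumes p and q are probability vectors over the same support
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    # JS(p || q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the mixture
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Softmax-like outputs over the same classes: every entry is strictly positive,
# so both KL terms are finite and the JS divergence varies smoothly.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(js(p, q))  # small finite value (~0.022)

# Disjoint supports, the problematic case described in the paper:
# KL(p || q) would be infinite, and JS saturates at log(2),
# so it provides no useful gradient.
p_disjoint = np.array([1.0, 0.0, 0.0])
q_disjoint = np.array([0.0, 0.0, 1.0])
print(js(p_disjoint, q_disjoint))  # log(2) ≈ 0.693
```

Because softmax never outputs exact zeros, the AugMix consistency term stays in the first regime rather than the saturated one.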
Thank you!