maxhodak/keras-molecules

KL divergence term in loss function


The KL divergence term of the autoencoder's loss function is computed here as
kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis = -1)
i.e., taking the mean over the dimensions of the latent representation. However, several other sources, including the VAE example in the Keras repo, use the sum instead:
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
Is there a reason for the difference? Given the relatively large number of latent dimensions, it seems this would significantly weaken the KL regularization relative to the reconstruction loss.
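
For what it's worth, the two formulations differ by exactly a factor of the latent dimensionality, since the mean divides the per-dimension sum by the number of latent dimensions. A minimal numpy sketch (the latent size here is arbitrary, chosen only for illustration):

```python
import numpy as np

np.random.seed(0)
latent_dim = 200  # arbitrary illustrative size, not the repo's actual setting
z_mean = np.random.randn(latent_dim)
z_log_var = np.random.randn(latent_dim)

# Per-dimension term of the closed-form Gaussian KL divergence
per_dim = 1 + z_log_var - np.square(z_mean) - np.exp(z_log_var)

kl_mean = -0.5 * np.mean(per_dim)  # formulation used in this repo
kl_sum = -0.5 * np.sum(per_dim)    # formulation used in the Keras VAE example

print(kl_sum / kl_mean)  # == latent_dim: the mean version is latent_dim times smaller
```

So with a few hundred latent dimensions, the mean version effectively down-weights the KL term by a few hundred times compared to the sum version, unless that is compensated for elsewhere in the loss.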