does self attention improve ebm?
comzzw opened this issue · 2 comments
comzzw commented
I've found that the repo has already implemented self-attention. Have the authors tried using self-attention when training the EBM, and does self-attention improve it? Looking forward to your reply. Thanks in advance.
yilundu commented
Utilizing self-attention in this codebase unfortunately destabilized training. In our more recent paper (https://github.com/yilundu/improved_contrastive_divergence), we found that when a KL divergence penalty is added to the training objective, self-attention does improve final generation performance.
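To make the KL-penalty idea concrete, here is a minimal sketch (not the authors' actual code; all names are illustrative) of contrastive divergence training with an added KL term. The KL term is approximated by evaluating a parameter-frozen copy of the energy network on negative samples while letting gradients flow back through the differentiable Langevin sampling chain:

```python
# Hedged sketch of improved contrastive divergence with a KL penalty.
# Assumptions: a toy MLP energy, a short differentiable Langevin chain,
# and a frozen-copy approximation of the KL term. Not the authors' code.
import copy

import torch
import torch.nn as nn


class Energy(nn.Module):
    """Toy energy network; a real model could include self-attention layers."""

    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def langevin(energy, x, steps=10, step_size=0.1, noise=0.01):
    """Differentiable Langevin dynamics: create_graph=True keeps the
    sampling chain in the autograd graph so the KL term can backprop
    through the sampler into the energy parameters."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(x).sum(), x, create_graph=True)[0]
        x = x - step_size * grad + noise * torch.randn_like(x)
    return x


def improved_cd_loss(energy, x_pos, x_init, kl_weight=0.1):
    x_neg = langevin(energy, x_init)

    # Standard contrastive divergence term: push energy down on data,
    # up on (detached) negative samples.
    loss_cd = energy(x_pos).mean() - energy(x_neg.detach()).mean()

    # KL term: evaluate a frozen copy of the energy so gradients reach
    # the parameters only through the sampling chain, not directly.
    frozen = copy.deepcopy(energy)
    for p in frozen.parameters():
        p.requires_grad_(False)
    loss_kl = frozen(x_neg).mean()

    return loss_cd + kl_weight * loss_kl
```

Intuitively, the extra term encourages the sampler to produce low-energy samples, which is what stabilizes training enough for self-attention to help rather than hurt.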
zzw-zjgsu commented
Thanks! Congratulations on your wonderful work!