does self attention improve ebm?
comzzw opened this issue · 2 comments
comzzw commented
I've found that the repo has already implemented self-attention. Have the authors tried using self-attention when training the EBM, and does self-attention improve it? Looking forward to your reply. Thanks in advance.
yilundu commented
Utilizing self-attention in this codebase unfortunately destabilized training. In our more recent paper (https://github.com/yilundu/improved_contrastive_divergence), we found that when a KL divergence penalty is added to the training objective, self-attention does improve final generation performance.
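To make the KL-penalty idea concrete, here is a minimal sketch (not the authors' actual code; all names are illustrative) of contrastive divergence training with an added KL term. The KL term is approximated by evaluating a parameter-frozen copy of the energy network on negative samples while letting gradients flow back through the differentiable Langevin sampling chain:

```python
# Hedged sketch of improved contrastive divergence with a KL penalty.
# Assumptions: a toy MLP energy, a short differentiable Langevin chain,
# and a frozen-copy approximation of the KL term. Not the authors' code.
import copy

import torch
import torch.nn as nn


class Energy(nn.Module):
    """Toy energy network; a real model could include self-attention layers."""

    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def langevin(energy, x, steps=10, step_size=0.1, noise=0.01):
    """Differentiable Langevin dynamics: create_graph=True keeps the
    sampling chain in the autograd graph so the KL term can backprop
    through the sampler into the energy parameters."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(x).sum(), x, create_graph=True)[0]
        x = x - step_size * grad + noise * torch.randn_like(x)
    return x


def improved_cd_loss(energy, x_pos, x_init, kl_weight=0.1):
    x_neg = langevin(energy, x_init)

    # Standard contrastive divergence term: push energy down on data,
    # up on (detached) negative samples.
    loss_cd = energy(x_pos).mean() - energy(x_neg.detach()).mean()

    # KL term: evaluate a frozen copy of the energy so gradients reach
    # the parameters only through the sampling chain, not directly.
    frozen = copy.deepcopy(energy)
    for p in frozen.parameters():
        p.requires_grad_(False)
    loss_kl = frozen(x_neg).mean()

    return loss_cd + kl_weight * loss_kl
```

Intuitively, the extra term encourages the sampler to produce low-energy samples, which is what stabilizes training enough for self-attention to help rather than hurt.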
zzw-zjgsu commented
Thanks! Congratulations on your wonderful work!