openai/ebm_code_release

does self-attention improve EBMs?

comzzw opened this issue

I've found that the repo already implements self-attention. Have the authors tried using self-attention when training the EBM, and does it improve the EBM? Looking forward to your reply. Thanks in advance.
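
For reference, the attention block in question appears, as far as I can tell, to be SAGAN-style self-attention over the spatial positions of a feature map. Here is a minimal PyTorch-style sketch of that kind of block (my own rendering with illustrative names, not the repo's actual TensorFlow code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention: every spatial position attends to
    every other position. Assumes channels >= 8."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # zero-init: block starts as identity

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, hw)
        attn = F.softmax(q @ k, dim=-1)               # (b, hw, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                   # residual connection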

Utilizing self-attention in this codebase unfortunately destabilized training. In our more recent paper, https://github.com/yilundu/improved_contrastive_divergence, we find that by additionally penalizing a KL divergence term, self-attention can improve final generation performance.
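
For what it's worth, here is a minimal sketch of how such a KL penalty can be wired into contrastive divergence training. This is PyTorch-style code of my own (`langevin_sample`, `improved_cd_loss`, and all hyperparameters are illustrative, and the sampler is simplified), not the exact implementation from the linked repo:

```python
import torch
import torch.nn as nn

def langevin_sample(energy_fn, x, n_steps=60, step_size=1.0,
                    noise_std=0.005, backprop_steps=5):
    """Langevin dynamics; the graph is kept only for the last few steps
    so the KL term below can backprop through the samples cheaply."""
    for i in range(n_steps):
        keep_graph = i >= n_steps - backprop_steps
        if not keep_graph:
            x = x.detach().requires_grad_(True)  # truncate backprop here
        energy = energy_fn(x).sum()
        grad, = torch.autograd.grad(energy, x, create_graph=keep_graph)
        x = x - step_size * grad + noise_std * torch.randn_like(x)
    return x

def improved_cd_loss(model, x_pos, x_neg):
    """Contrastive divergence plus a KL penalty (sketch)."""
    # Standard CD: lower the energy of data, raise it on negative samples.
    cd = model(x_pos).mean() - model(x_neg.detach()).mean()

    # KL term: evaluate the energy with frozen parameters on the still-
    # differentiable samples, so gradients reach the parameters only
    # through the sampling chain and push the sampler toward low energy.
    model.requires_grad_(False)
    kl = model(x_neg).mean()
    model.requires_grad_(True)
    return cd + kl

# Toy usage with a stand-in energy network.
energy_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128),
                           nn.SiLU(), nn.Linear(128, 1))
x_pos = torch.rand(8, 3, 32, 32)
x_neg = langevin_sample(energy_net, torch.rand_like(x_pos))
loss = improved_cd_loss(energy_net, x_pos, x_neg)
loss.backward()
```

The point of freezing the parameters during the KL forward pass is that the gradient of that term then reaches them only through the sampling chain, regularizing the sampler rather than the energy values directly, which is what the comment above credits for making self-attention usable.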

Thanks! Congratulations on your wonderful work!