Why do you think the model has not converge at 160K?
Closed this issue · 1 comments
sunnnnnnnny commented
Do you have some basis?
soobinseo commented
I'm not sure, but I think the model have not learned enough attention.
From a lot of experiments, the diagonal attention is the most important measure that separates success and failure for generating samples.