Why do you think the model has not converged at 160K steps?

Question

Why do you think the model has not converged at 160K steps?


Do you have any evidence for this?

Answer 1 · 2019-04-15T08:02:42.000Z

I'm not sure, but I think the model has not yet learned a good attention alignment.
From many experiments, I've found that whether the attention alignment is diagonal is the most important indicator separating successful from failed sample generation.
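The answer doesn't say how "diagonal" was judged; in practice this is often done by eye from alignment plots. As a minimal sketch, assuming the attention weights are available as a matrix with one row per decoder step and one column per encoder step, you could quantify it as the fraction of attention mass that falls within a band around the ideal diagonal. The function name `diagonal_attention_score` and the `band_width` default are hypothetical and not taken from the original project.

```python
def diagonal_attention_score(attention, band_width=0.1):
    """Return the fraction of attention mass lying near the ideal diagonal.

    attention: list of rows, one per decoder step; each row holds the
        attention weights over encoder steps (rows assumed to sum to ~1).
    band_width: half-width of the accepted band around the diagonal, as a
        fraction of the encoder length (hypothetical default).
    """
    if not attention or not attention[0]:
        return 0.0
    n_dec = len(attention)
    n_enc = len(attention[0])
    # Band half-width in encoder positions; at least one position.
    band = max(1, int(round(band_width * n_enc)))
    inside = 0.0
    total = 0.0
    for t, row in enumerate(attention):
        # Encoder position a perfectly straight diagonal would hit at step t.
        expected = t * (n_enc - 1) / max(1, n_dec - 1)
        for s, w in enumerate(row):
            total += w
            if abs(s - expected) <= band:
                inside += w
    return inside / total if total > 0 else 0.0
```

An alignment that steadily advances through the input scores close to 1.0, while one that stays stuck on the first encoder positions scores much lower (for example, 0.5 on a 4 by 4 matrix with all mass on position 0). Tracking such a score during training is one possible way to check whether attention has started to form before judging convergence.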