CUDA 11.7 and CUDA 11.8 give different results for decoder self-attention?
frankxyy opened this issue · 0 comments
frankxyy commented
I found that the results of decoder self-attention differ between CUDA toolkit versions 11.7 and 11.8. CUDA 11.8 gives the expected result. Why does this happen?
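
To make the discrepancy easier to quantify, here is a minimal sketch of how the two outputs could be compared, assuming the decoder self-attention output from each build has been dumped to a flat list of floats. The function name `compare_outputs`, the tolerances, and the sample values are all hypothetical placeholders for illustration, not real measurements from either toolkit version.

```python
import math


def compare_outputs(a, b, rel_tol=1e-3, abs_tol=1e-5):
    """Return (max_abs_diff, mismatched_indices) for two equal-length float sequences."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    max_abs_diff = 0.0
    mismatches = []
    for i, (x, y) in enumerate(zip(a, b)):
        # Track the largest element-wise absolute difference.
        max_abs_diff = max(max_abs_diff, abs(x - y))
        # Flag elements that differ by more than the chosen tolerances.
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(i)
    return max_abs_diff, mismatches


# Placeholder values only (NOT real outputs from CUDA 11.7 or 11.8).
out_117 = [0.125, -0.5, 0.75]
out_118 = [0.125, -0.5, 0.25]

max_diff, bad = compare_outputs(out_117, out_118)
print("max abs diff:", max_diff)
print("mismatched indices:", bad)
```

Differences at the level of floating-point rounding would pass a check like this, while a genuinely wrong result would show up as a large maximum difference and a list of mismatched indices.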