lucidrains/muse-maskgit-pytorch

Why is the scale used in Attention 8, while dim_head is 64? If dim or dim_head is changed, should the scale change automatically?

lqniunjunlper opened this issue · 1 comment


In `attend.py`, line #123:

```python
sim = einsum("b h i d, b h j d -> b h i j", q, k) * self.scale
```
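
For context on the question's premise: the classic scaled-dot-product convention sets the scale to `dim_head ** -0.5` (which would be 1/8 for `dim_head = 64`, not 8). A fixed scale of 8 is the pattern used in cosine-sim attention, where q and k are l2-normalized first and the scale acts as a temperature that does not depend on `dim_head`. Below is a minimal sketch contrasting the two conventions; it assumes PyTorch, and the function names are hypothetical, not taken from the repo:

```python
import torch
import torch.nn.functional as F
from torch import einsum

def standard_attention_scores(q, k, dim_head=64):
    # classic scaled dot-product attention:
    # scale = dim_head ** -0.5, i.e. 1/8 when dim_head = 64,
    # so it must track dim_head if that changes
    scale = dim_head ** -0.5
    return einsum("b h i d, b h j d -> b h i j", q, k) * scale

def cosine_sim_attention_scores(q, k, scale=8.0):
    # cosine-sim attention: l2-normalize q and k along the head
    # dimension, so each dot product is a cosine similarity in [-1, 1];
    # the fixed scale is then a temperature independent of dim_head
    q, k = map(lambda t: F.normalize(t, dim=-1), (q, k))
    return einsum("b h i d, b h j d -> b h i j", q, k) * scale
```

If the repo's Attention does normalize q and k before the einsum quoted above, the dot products are already bounded regardless of `dim_head`, which would explain why the scale is a fixed constant rather than something that changes automatically with `dim` or `dim_head`.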