NOBLES5E opened this issue 3 years ago · 0 comments
where each embedding vector shares a single float for its gradient running average (one scalar of optimizer state per vector, rather than one per element)
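To make the idea concrete, here is a minimal pure-Python sketch of one way such an update could look. It is not taken from any existing implementation. The function name `rowwise_update`, the use of the mean squared gradient as the per-vector statistic, and the hyperparameters `lr`, `beta`, and `eps` are all assumptions made for illustration; the issue text only says that each vector shares one float.

```python
import math

def rowwise_update(weights, grads, state, lr=0.1, beta=0.9, eps=1e-8):
    """Update embedding rows in place (hypothetical helper).

    state[i] is the single float kept for embedding vector i.
    """
    for i, (row, g) in enumerate(zip(weights, grads)):
        # Assumption: collapse the row's gradient into one scalar,
        # here the mean of its squared entries.
        mean_sq = sum(x * x for x in g) / len(g)
        # One exponential running average per embedding vector,
        # shared by every element of that vector.
        state[i] = beta * state[i] + (1.0 - beta) * mean_sq
        scale = lr / (math.sqrt(state[i]) + eps)
        for j in range(len(row)):
            row[j] -= scale * g[j]
    return weights, state

# Two embedding vectors of dimension 2; the state holds 2 floats, not 4.
weights = [[1.0, 2.0], [3.0, 4.0]]
grads = [[0.5, -0.5], [0.0, 0.0]]
state = [0.0, 0.0]
rowwise_update(weights, grads, state)
```

With this layout, the optimizer state for an embedding table of shape `(n, d)` holds `n` floats instead of `n * d`, which is presumably the memory saving the request is after.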