[QUESTION] How does tensor_parallel work with q/k_layernorm?
Opened this issue · 1 comment
cryoco commented
Is q/k_layernorm sequence-parallelizable? Or, how do I maintain consistency when using tensor parallelism in models with q/k layernorm?
felipeliliti commented
I don't know yet.