NVIDIA/Megatron-LM

[QUESTION] How does tensor_parallel work with q/k_layernorm?


Can q/k_layernorm be sequence-parallelized? If not, how can results be kept consistent when using tensor parallelism in models that apply layernorm to q and k?
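To make the question concrete, here is a minimal pure-Python sketch of the consistency concern. It assumes the q/k norm is applied per head over `head_dim` and that tensor parallelism splits whole heads across ranks. Both are assumptions for illustration, not confirmed from the Megatron-LM code, and `per_head_norm` and `shard_heads` are hypothetical helpers, not library functions.

```python
import math

def layernorm(vec, eps=1e-5):
    # Normalize one vector to zero mean / unit variance (no learned scale or bias).
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def per_head_norm(q, head_dim):
    # q is a flat list of num_heads * head_dim values for one token.
    out = []
    for start in range(0, len(q), head_dim):
        out.extend(layernorm(q[start:start + head_dim]))
    return out

def shard_heads(q, head_dim, tp_size):
    # Hypothetical helper: split whole heads evenly across tp_size ranks.
    num_heads = len(q) // head_dim
    assert num_heads % tp_size == 0, "heads must divide evenly across ranks"
    per_rank = (num_heads // tp_size) * head_dim
    return [q[r * per_rank:(r + 1) * per_rank] for r in range(tp_size)]

head_dim, num_heads, tp_size = 4, 4, 2
q = [float((i * 7) % 11) for i in range(num_heads * head_dim)]

# Case 1: norm over head_dim. Each rank normalizes only the heads it owns.
full = per_head_norm(q, head_dim)
sharded = [y for shard in shard_heads(q, head_dim, tp_size)
           for y in per_head_norm(shard, head_dim)]
per_head_matches = all(math.isclose(a, b, abs_tol=1e-12)
                       for a, b in zip(full, sharded))

# Case 2: norm over the whole projected dimension. Each rank sees only a slice,
# so its local mean and variance differ from the global ones.
full_hidden = layernorm(q)
sharded_hidden = [y for shard in shard_heads(q, head_dim, tp_size)
                  for y in layernorm(shard)]
full_hidden_matches = all(math.isclose(a, b, abs_tol=1e-9)
                          for a, b in zip(full_hidden, sharded_hidden))

print(per_head_matches, full_hidden_matches)
```

Under these assumptions the per-head case needs no communication, because every head's statistics live entirely on one rank. The second case would need the mean and variance to be reduced across ranks before normalizing. Which of the two applies to a given model is exactly what I am unsure about.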

I don't know yet.