NVIDIA/Megatron-LM

[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class?

Opened this issue · 1 comment

question
@jon-barker Hello Jon, I have some questions about the embedding layer; could you help explain? Why was F.embedding(masked_input, self.weight) replaced with self.weight[masked_input] in the forward() function of the VocabParallelEmbedding class? What is the difference between them? And why can F.embedding() introduce 'non-determinism'?

Link: https://github.com/NVIDIA/Megatron-LM/blob/core_r0.5.0/megatron/core/tensor_parallel/layers.py#L218
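
For context, here is a minimal standalone sketch (not taken from the Megatron-LM source; the vocabulary/hidden sizes and the `masked_input` values are made up for illustration) of the two lookup forms the question compares. In the forward pass they return the same tensor, so the question is specifically about why the plain-indexing form was preferred.

```python
# Minimal sketch comparing the two lookup forms; shapes and values are illustrative.
import torch
import torch.nn.functional as F

vocab_size, hidden_size = 8, 4
weight = torch.randn(vocab_size, hidden_size, requires_grad=True)
masked_input = torch.tensor([[1, 3, 0], [5, 2, 7]])  # token ids, shape (2, 3)

out_embedding = F.embedding(masked_input, weight)  # functional embedding lookup
out_indexing = weight[masked_input]                # plain advanced indexing

# Both select the same rows of `weight`, giving a (2, 3, hidden_size) tensor.
print(torch.equal(out_embedding, out_indexing))    # True
```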

Marking as stale. No activity in 60 days.