NVIDIA/Megatron-LM

[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class?

Opened this issue · 1 comment

question
@jon-barker Hello Jon, I have some questions about the embedding layer; could you help explain? Why was F.embedding(masked_input, self.weight) replaced with self.weight[masked_input] in the forward() function of the VocabParallelEmbedding class? What is the difference between them? And why can F.embedding() introduce 'non-determinism'?

Link: https://github.com/NVIDIA/Megatron-LM/blob/core_r0.5.0/megatron/core/tensor_parallel/layers.py#L218
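
For context, here is a minimal standalone sketch (not taken from the Megatron-LM source; the vocabulary/hidden sizes and the `masked_input` values are made up for illustration) of the two lookup forms the question compares. In the forward pass they return the same tensor, so the question is specifically about why the plain-indexing form was preferred.

```python
# Minimal sketch comparing the two lookup forms; shapes and values are illustrative.
import torch
import torch.nn.functional as F

vocab_size, hidden_size = 8, 4
weight = torch.randn(vocab_size, hidden_size, requires_grad=True)
masked_input = torch.tensor([[1, 3, 0], [5, 2, 7]])  # token ids, shape (2, 3)

out_embedding = F.embedding(masked_input, weight)  # functional embedding lookup
out_indexing = weight[masked_input]                # plain advanced indexing

# Both select the same rows of `weight`, giving a (2, 3, hidden_size) tensor.
print(torch.equal(out_embedding, out_indexing))    # True
```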

Marking as stale. No activity in 60 days.