aliyun/aicb

Why does the word embedding shape need to be multiplied by 2?

Closed this issue · 2 comments

I'm a little confused about the shape of the word embedding in the Megatron model.
Why does the number of embeddings need to be multiplied by two:
self.word_embedding = MockedParam( (2 * num_embedding_per_partition, hidden_size), name=self.name )

The memory size of word_embedding is num_embedding_per_partition * hidden_size * params_dtype, where the factor 2 indicates that the data type occupies 2 bytes.
However, after carefully reading the code, I found that the data type of word_embedding is float32, which occupies 4 bytes. I will change this factor to 4. Future versions will further support custom dtypes.
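A minimal sketch of the calculation described above (not the AICB implementation; the function name and shape values here are hypothetical): the literal factor in the parameter shape stands in for the per-element byte size of the dtype, so the product of the shape equals the parameter's memory footprint in bytes.

```python
def embedding_bytes(num_embedding_per_partition, hidden_size, dtype_bytes):
    """Memory footprint in bytes of the word embedding on one partition."""
    return num_embedding_per_partition * hidden_size * dtype_bytes

# Example values (hypothetical): 1024 embeddings per partition,
# hidden size 4096, float32 -> 4 bytes per element.
# Folding the byte size into the first shape dimension, as the
# MockedParam constructor does with its literal factor, gives the
# same total as multiplying by dtype_bytes separately.
shape = (4 * 1024, 4096)
assert shape[0] * shape[1] == embedding_bytes(1024, 4096, 4)
```

This is why the hard-coded 2 was wrong for float32: it models a 2-byte dtype such as fp16/bf16, while a 4-byte dtype needs a factor of 4.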


Great, I understand now. Thanks for your great work!