aliyun/aicb

Why does the word embedding shape need to be multiplied by 2?

Closed this issue · 2 comments

I'm a little confused about the shape of the word embedding in the Megatron model.
Why does the number of embeddings need to be multiplied by two:
self.word_embedding = MockedParam( (2 * num_embedding_per_partition, hidden_size), name=self.name )

The memory size of word_embedding is num_embedding_per_partition * hidden_size * params_dtype, where the factor 2 indicates that the data type occupies 2 bytes.
However, after carefully reading the code, I found that the data type of word_embedding is float32, which occupies 4 bytes. I will change this factor to 4. Future versions will further support custom dtypes.
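A minimal sketch of the calculation described above (not the AICB implementation; the function name and shape values here are hypothetical): the literal factor in the parameter shape stands in for the per-element byte size of the dtype, so the product of the shape equals the parameter's memory footprint in bytes.

```python
def embedding_bytes(num_embedding_per_partition, hidden_size, dtype_bytes):
    """Memory footprint in bytes of the word embedding on one partition."""
    return num_embedding_per_partition * hidden_size * dtype_bytes

# Example values (hypothetical): 1024 embeddings per partition,
# hidden size 4096, float32 -> 4 bytes per element.
# Folding the byte size into the first shape dimension, as the
# MockedParam constructor does with its literal factor, gives the
# same total as multiplying by dtype_bytes separately.
shape = (4 * 1024, 4096)
assert shape[0] * shape[1] == embedding_bytes(1024, 4096, 4)
```

This is why the hard-coded 2 was wrong for float32: it models a 2-byte dtype such as fp16/bf16, while a 4-byte dtype needs a factor of 4.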


Great, I understand now. Thanks for your great work!