Reference for weight initialization of the Llama 2 model
SeunghyunSEO opened this issue · 1 comment
SeunghyunSEO commented
First of all, thank you for supporting native TP in torch.
I have just been reading your TP tutorial code and found that the initialization details differ from PyTorch's default parameterization (Kaiming init).
Is there any reference for the depth init?
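
To make the difference concrete, here is a minimal sketch of what I mean. The first function is the standard deviation implied by the default `nn.Linear` init (`kaiming_uniform_` with `a=sqrt(5)`, which works out to a uniform distribution on ±1/sqrt(fan_in)). The second is only my assumption of what a depth-dependent init looks like: the name `depth_init_std`, the base std of 0.02, and the 1/sqrt(2 * (layer_id + 1)) scaling are hypothetical and not copied from the tutorial.

```python
import math


def default_linear_std(fan_in: int) -> float:
    """Std of the default nn.Linear weight init.

    nn.Linear uses kaiming_uniform_(a=sqrt(5)), which is equivalent to
    sampling from U(-1/sqrt(fan_in), 1/sqrt(fan_in)).
    """
    bound = 1.0 / math.sqrt(fan_in)
    return bound / math.sqrt(3.0)  # std of U(-b, b) is b / sqrt(3)


def depth_init_std(layer_id: int, base_std: float = 0.02) -> float:
    """Hypothetical depth-scaled std (assumption, for illustration only).

    The std shrinks as the 0-based layer index grows.
    """
    return base_std / math.sqrt(2 * (layer_id + 1))


if __name__ == "__main__":
    # Default init depends only on fan_in, not on the layer's depth.
    print("default std, fan_in=4096:", default_linear_std(4096))
    # Depth init depends on the layer index instead.
    for layer_id in (0, 7, 31):
        print("depth std, layer", layer_id, ":", depth_init_std(layer_id))
```

So the default depends only on `fan_in`, while the depth-style init depends on where the layer sits in the stack, which is why I am asking for a reference.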
SeunghyunSEO commented
I think this issue is more appropriate for torchtitan.