pytorch/examples

Reference for weight initialization of the Llama2 model

SeunghyunSEO opened this issue · 1 comment

First of all, thank you for supporting native TP in PyTorch.
I have been reading your TP tutorial code and noticed that the weight initialization differs from the PyTorch default parameterization (Kaiming init).
Is there any reference for the depth init?
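
To make the question concrete, here is a minimal sketch of the difference as I understand it. It assumes that depth init means drawing weights from a normal distribution whose std shrinks with the layer index as `base_std / sqrt(2 * (layer_id + 1))`; that rule, the base std of 0.02, and the function names are assumptions for illustration and may not match the tutorial exactly. The default `nn.Linear` init is Kaiming uniform with `a=sqrt(5)`, which works out to a uniform distribution on `(-1/sqrt(fan_in), 1/sqrt(fan_in))`.

```python
import math


def default_linear_std(fan_in: int) -> float:
    """Std of the default nn.Linear weight init.

    kaiming_uniform_(a=sqrt(5)) reduces to U(-b, b) with b = 1 / sqrt(fan_in),
    and a uniform distribution on (-b, b) has std b / sqrt(3).
    """
    bound = 1.0 / math.sqrt(fan_in)
    return bound / math.sqrt(3.0)


def depth_scaled_std(layer_id: int, base_std: float = 0.02) -> float:
    """Hypothetical depth-dependent std (assumed rule, see the text above).

    Deeper layers (larger layer_id) get a smaller std.
    """
    return base_std / math.sqrt(2 * (layer_id + 1))


if __name__ == "__main__":
    # Compare the two schemes for a hidden size of 4096 (illustrative value).
    print("default nn.Linear std:", default_linear_std(4096))
    for layer_id in range(4):
        print("layer", layer_id, "depth-scaled std:", depth_scaled_std(layer_id))
```

Under these assumptions the std depends only on the layer index and not on `fan_in`, which is the part I would like a reference for.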

I think this issue is appropriate for torchtitan.