cooper12121/llama3-8x8b-MoE
Copy the MLP of Llama 3 eight times to serve as 8 experts, create a router with random initialization, and add a load-balancing loss, to construct an 8x8B MoE model based on Llama 3.
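The construction above can be sketched in miniature. The code below is not the repo's implementation; it is a toy, dependency-free illustration of the three ingredients the description names: identical expert copies, a randomly initialized router, and a Switch-Transformer-style auxiliary load-balancing loss `N * sum_i f_i * P_i` (where `f_i` is the fraction of tokens routed to expert `i` and `P_i` is the mean router probability for expert `i`). All names (`make_expert`, `route`, `load_balancing_loss`) and the dimensions are hypothetical.

```python
import math
import random

NUM_EXPERTS = 8   # 8 experts, as in the 8x8B MoE
HIDDEN = 4        # toy hidden size; llama3-8B actually uses 4096

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Stand-in for the llama3 MLP: every expert starts as an identical
# copy of the same (here trivial) weights, mirroring "copy the MLP 8 times".
def make_expert(scale):
    def expert(x):
        return [scale * v for v in x]
    return expert

experts = [make_expert(1.0) for _ in range(NUM_EXPERTS)]

# Router with random initialization: one logit per expert per token.
random.seed(0)
router_w = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN)]
            for _ in range(NUM_EXPERTS)]

def route(token):
    logits = [sum(w * t for w, t in zip(row, token)) for row in router_w]
    probs = softmax(logits)
    top = max(range(NUM_EXPERTS), key=lambda i: probs[i])  # top-1 gating
    return top, probs

def load_balancing_loss(tokens):
    # Auxiliary loss N * sum_i f_i * P_i; minimized (value 1.0) when
    # both dispatch fractions and router probabilities are uniform.
    counts = [0] * NUM_EXPERTS
    prob_sums = [0.0] * NUM_EXPERTS
    for tok in tokens:
        top, probs = route(tok)
        counts[top] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    n = len(tokens)
    f = [c / n for c in counts]
    P = [s / n for s in prob_sums]
    return NUM_EXPERTS * sum(fi * pi for fi, pi in zip(f, P))

tokens = [[random.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(16)]
aux_loss = load_balancing_loss(tokens)
```

In training, `aux_loss` would be scaled by a small coefficient and added to the language-modeling loss, pushing the randomly initialized router toward using all 8 experts evenly.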