guotong1988/BERT-GPU
Multi-GPU pre-training of BERT from scratch on a single machine, without Horovod (data parallelism)
Python · Apache-2.0
Stargazers
- c00h00g (Shanghai)
- caoxu915683474 (@Lenovo Research, @BIT)
- cbqin (suzhou)
- cfwin
- chopwoodwater
- dnanhkhoa (@aistairc)
- Eurus-Holmes (Creatify AI)
- Fancy7777 (Baidu)
- fly51fly (PRIS)
- fuxia0425
- HaishuoFang (TU Darmstadt)
- indykish (ATTUNE INC)
- jueliangguke
- kiminh
- kitlomer
- leileigan (China)
- liu-nlper (Soochow University)
- mathlf2015T
- mingkin (Xiamen University)
- mingmingshiwo
- njoe9
- nlper27149 (Beijing)
- numb3r3 (@jina-ai)
- qianpeisheng (I2R, A*STAR; SUTD)
- qinqiang1990
- rich-junwang
- SeekPoint
- sportzhang
- thanhtoan1196
- tianflame
- xhyandwyy (Alibaba DAMO Academy)
- xuanhan863 (Los Angeles, USA)
- yangapku (Peking Univ.)
- YC-wind (Nanchang in China)
- yotofu
- yuchenlin (@allenai)