Issues
- 0
Audit of possible misuse of Tensor vs. tensor
#231 opened by doombeaker - 1
Training problem with bert run_squad.py
#217 opened by lpj0711 - 1
Full support and validation for gpt 0.4.0
#210 opened by strint - 1
A100 cluster test: throughput with 2 nodes and 16 GPUs is lower than with a single node and 8 GPUs
#206 opened by iamweizhi - 0
- 0
Would like to see more CTR benchmarks.
#172 opened by EtoDemerzel0427 - 4
Bert Model Parallel
#168 opened by sywang0111 - 3
- 1
Running `sh train.sh` hangs in the resnet50 benchmark with 4 and 8 GPUs on a single machine
#152 opened by wuyujiji - 5
Unable to complete model training
#141 opened by fengyuchao97 - 9
CNN benchmark cannot run
#130 opened by JF-D - 3
- 5
cnns two-node training fails with error `Check failed: error == CUDNN_STATUS_SUCCESS (9 vs. 0) CUDNN_STATUS_NOT_SUPPORTED`, but single-node training works
#103 opened by qianzhang613 - 1
Links in the README were not updated
#89 opened by wyg1997