Issues
Test NCCL failure common with network error.
#252 opened by ismailguzel - 5
question about pingpong example
#253 opened by jinz2014 - 0
Enable P2P on pcie in a nvlink machine
#250 opened by cll24 - 2
Running in kubernetes pods Error
#248 opened by drikster80 - 2
Getting Avg bus bandwidth = 0 when running all_reduce_perf in nccl-tests in my EC2 G5.8x large
#249 opened by rajeshvenkata - 13
H100 all reduce performance is poor
#212 opened by liminn - 23
all_reduce_perf core dumped on 4 L20
#233 opened by songh11 - 2
NCCL all-reduce test failure due to TL_SHM ERROR, This case was happened on containers on same server.
#247 opened by thsmfe001 - 4
2 Node Nccl Test don’t work for A100
#242 opened by jeffreyyjp - 1
NCCL_Algo=Tree
#246 opened by afattaholman - 5
Test NCCL Hang
#244 opened by sdonoso - 0
2 Node Nccl Test don’t work
#236 opened by SdEnd - 0
What's multi-allreduce ?
#234 opened by ProHuper - 0
NCCL Tree allreduce test cannot reach the theoretical bus bandwidth on 2 nodes with 4 nics
#232 opened by ProHuper - 9
Test NCCL failure common.cu:997 'internal error
#231 opened by sdonoso - 4
how to support One Device per Process?
#221 opened by jiangxiaobin96 - 4
what is cu:990 error? how to solve this problem?
#230 opened by MAKER-park - 1
2 Nodes nccl-test with mpi hangs
#229 opened by sdonoso - 3
has nvswitch, but uses 0 nvls channels
#228 opened by MiyazonoKaori - 2
Test fail caused by ibvwrap.c:160 NCCL WARN Call to ibv_modify_qp failed with error Connection timed out.
#227 opened by thsmfe001 - 14
mpirun all_reduce_perf hang with multi-device test
#223 opened by 913871734 - 5
Performance lack of NCCL Test
#201 opened by shengode503 - 0
1 GiB headroom might be too small
#220 opened by Namnamseo - 9
Test NCCL failure common.cu:959 'internal error - please report this issue to the NCCL developers / '
#219 opened by Assassin187 - 8
NCCL_ALGO on multi-node and multi-GPU
#215 opened by MajidSalimi - 2
SendRecv Time
#214 opened by osayamenja - 6
Nccl test seems run separately on multi nodes
#213 opened by jianh619 - 1
undefined reference nccl*
#211 opened by gongyguo - 2
misc/ibvwrap.cc:278 NCCL WARN Call to ibv_reg_mr_iova2 failed with error Cannot allocate memory
#206 opened by jxh314 - 0
Differences problems in performance data of HGX A800 single server N GPUs nccl testing
#210 opened by cloveryyg - 1
make failed, error -- unsupported GNU version! gcc versions later than 11 are not supported!
#207 opened by jxh314 - 0
Test NCCL failure common.cu:961 'internal error - please report this issue to the NCCL developers / '
#204 opened by a-c-dream - 1
Why getBw don't have access to agg_iters ?
#202 opened by x41lakazam - 2
Multi node test hang phenomenon
#200 opened by gim4moon - 0