Issues
[Bug] C++ exception with description "ibv_modify_qp failed (errno 19) (Ib failure: No such device)" thrown in the test body.
#431 opened by FC-Li - 1
[Bug] Proxy channel over CudaIPC on AMD GPUs
#418 opened by liangyuRain - 4
[Bug] There is not any InfiniBand or NVLink in my 4-GPU machine, how can I use mscclpp to communicate?
#397 opened by Maphsge4 - 1
[Bug] Proxy chan hang at cudaMemcpyAsync
#394 opened by FC-Li - 1
[Bug] run nccl_api_test failed
#387 opened by yizhang2077 - 0
[Bug] flush() hang bug.
#377 opened by TonyWu199 - 5
[Perf] Failed to reproduce the performance result for Single-node AllReduce mentioned in README.md
#362 opened by FC-Li - 0
[Feature] Immediate data upon signal & wait
#368 opened by chhwang - 1
[Feature] Can NVIDIA and AMD communicate?
#361 opened by liuyang6055 - 1
[Bug] Can't launch allreduce test
#359 opened by chenhongyu2048 - 2
Is there any documentation explaining the differences between the allreduce algorithms in mscclpp?
#350 opened by MARD1NO - 9
[Bug] (Ib failure: Cannot allocate memory) reported when run mscclpp-test/allreduce_test_perf with MPI on 2 nodes
#323 opened by dong-liuliu - 1
[Bug] mscclpp::DeviceSyncer::sync Assertion failed
#320 opened by linstreamer - 0
[Feature] NPKit support
#206 opened by chhwang - 3
[Bug] Bugs in mp_unit_test.
#315 opened by TonyWu199 - 4
How to use mscclpp as a backend in pytorch
#311 opened by wangfakang - 1
[Bug] Program hangs at proxy channel `wait()`
#285 opened by liangyuRain - 3
[Bug] mscclpp-tests don't exit after test.
#282 opened by TonyWu199 - 1
[Bug] __assert_fail declaration in mscclpp breaks "assert()" usage in host functions.
#302 opened by Alkaid-Benetnash - 0
MSCCL++ Low-priority Work Items
#199 opened by chhwang - 1
MSCCL++ v0.5.0 Release Plan
#281 opened by chhwang - 0
[Feature] Usage as backend in Pytorch
#287 opened by azharlightelligence - 5
[Bug] Is there a known bug with `Driver Version: 535.129.03` which causes `MscclppAllReduce3` to hang?
#260 opened by saeedmaleki - 1
[Bug] Encountered an IB problem in a single-node experiment
#274 opened by TonyWu199 - 1
[Doc] Inquiry on MSCCL++ Algorithms
#269 opened by jhlee508 - 1
[Bug] getting error for `allreduce_bench.py`
#266 opened by saeedmaleki - 0
[Feature] `CommGroup` method names are confusing
#265 opened by chhwang - 0
[Feature] Let's get rid of `make pylib-copy`
#216 opened by saeedmaleki - 7
[Bug] Error when creating many proxy channels
#242 opened by liangyuRain - 0
[Perf] Relaxed atomic for FIFO push
#226 opened by chhwang - 1
[Doc] In the quickstart.md file, the argument for the mpirun should be '--bind-to numa' instead of '--bind-to-numa'.
#228 opened by sphish - 2
[Perf] Two-node allreduce perf improvement
#210 opened by Binyang2014 - 0
[Feature] Enhance Python benchmark
#215 opened by Binyang2014 - 0
[Bug] When addMemory for more than 2^8 registered memories to a single proxy service, the system should throw an exception.
#212 opened by saeedmaleki - 0
[Feature] Topology detection from the topo XML file
#205 opened by chhwang