Pinned issues
Issues
MSCCL++ Low-priority Work Items
#199 opened by chhwang - 1
MSCCL++ v0.5.0 Release Plan
#281 opened by chhwang - 0
[Feature] Usage as backend in Pytorch
#287 opened by azharlightelligence - 1
[Bug] Program hangs at proxy channel `wait()`
#285 opened by liangyuRain - 3
[Bug] mscclpp-tests don't exit after test.
#282 opened by TonyWu199 - 5
[Bug] Is there a known bug with `Driver Version: 535.129.03` which causes `MscclppAllReduce3` to hang?
#260 opened by saeedmaleki - 1
[Bug] IB problem encountered in single-node experiment?
#274 opened by TonyWu199 - 1
[Doc] Inquiry on MSCCL++ Algorithms
#269 opened by jhlee508 - 1
[Bug] getting error for `allreduce_bench.py`
#266 opened by saeedmaleki - 0
[Feature] `CommGroup` method names are confusing
#265 opened by chhwang - 0
[Feature] Let's get rid of `make pylib-copy`
#216 opened by saeedmaleki - 7
[Bug] Error when creating many proxy channels
#242 opened by liangyuRain - 0
[Perf] Relaxed atomic for FIFO push
#226 opened by chhwang - 1
[Doc] In the quickstart.md file, the argument for the mpirun should be '--bind-to numa' instead of '--bind-to-numa'.
#228 opened by sphish - 2
[Bug] Stronger correctness check in mscclpp-test
#198 opened by chhwang - 0
[Feature] Support python-based mscclpp-test
#187 opened by Binyang2014 - 1
[Feature] fp16 allreduce.
#192 opened by saeedmaleki - 1
MSCCL++ v0.4.0 Release Plan (Released)
#160 opened by chhwang - 1
[Perf] Two-node allreduce perf improvement
#210 opened by Binyang2014 - 0
[Feature] Warn when the CQ is about to be full and ask the user to flush it.
#194 opened by saeedmaleki - 0
[Feature] Enhance Python benchmark
#215 opened by Binyang2014 - 0
[Bug] When `addMemory` is called for more than 2^8 registered memories on a single proxy service, the system should throw an exception.
#212 opened by saeedmaleki - 0
[Feature] NPKit support
#206 opened by chhwang - 0
[Feature] Topology detection from the topo XML file
#205 opened by chhwang - 0
[Feature] `getPacket` arg list does not match the `get` function from `sm_channels`
#158 opened by saeedmaleki - 0
[Feature] `ProxyChannel` should not take device handles in its constructors.
#155 opened by saeedmaleki - 2
[Bug] Bootstrap occasionally returns "Address in use" error during `initialize()`
#163 opened by chhwang - 1
[Feature] `poll` instead of `wait`.
#176 opened by saeedmaleki - 0
[Bug] Compilation fails with `CMAKE_BUILD_TYPE=Debug`
#174 opened by chhwang - 2
[Performance] Improve single-node AllReduce latency
#164 opened by chhwang - 0
[Bug] Need to call cudaIpcCloseMemHandle to release remote registered memory
#165 opened by Binyang2014 - 1
[Bug] `fifo` has 128-bit atomic read problems.
#154 opened by saeedmaleki - 0
[Feature] Python binding for `DeviceSyncer`
#156 opened by saeedmaleki - 0