Issues
how does use PCIe peer-to-peer or NVLink between two containers that each have an isolated GPU
#10070 opened by linxiaochou - 3
When using shared memory communication, ucp_am_send_nbx hangs and callback not invoked
#10370 opened by ivanallen - 0
When building from source code of v5.0.3, it says ‘The submodule "config/oac" is missing.’
#10399 opened by zpcalan - 5
test_ucp.c was stuck/hang at ucs_init.
#10389 opened by MoFHeka - 2
ucp_worker.c:728 Assertion `wiface->activate_count > 0' failed: iface 0x43f3800 (tcp/lo)
#10376 opened by ivanallen - 9
Configure/compile: cflags/cxxflags and compiler invocation
#10367 opened by tonycurtis - 1
Failed to run ucx assessment job and getting job limit exceed 12 while triggering ucx assessment job [version ucx 0.38 version onwards,facing this issue-unable to trigger ucx assessment job]
#10248 opened by pradeep5561 - 9
OpenMPI+UCX with multiple GPUs error: "named symbol not found"
#10304 opened by pascal-boeschoten-hapteon - 2
How to create a group communication with UCX?
#10361 opened by MoFHeka - 0
UCT/CUDA_IPC: Possible UB when enabling CUDA_IPC_CACHE
#10346 opened by andylin-hao - 4
Why ucp_put_nbx/ucp_get-nbx does not support sgl buffer
#10314 opened by super-train - 2
Traffic class not fully applied via UCX_IB_TRAFFIC_CLASS in one direction
#10325 opened by cserranobr - 0
Shared memory transport provides lower throughput for large message sizes than inter-node transport via the Infiniband
#10317 opened by satishskamath - 4
Invalid active_speed on Mellanox NDR 400Gb/s
#10298 opened by vitduck - 8
How to initialize UCP when there are multiple GPUs in one machine and multiple GPU machines in cluster?
#10276 opened by MoFHeka - 0
CUDA-Aware UCX with a mixture of CPU-only & GPU Nodes
#10273 opened by judicaelclair - 1
Question: Connection error on Azure ML Cluster
#10252 opened by hovnatan - 3
Q: UCX support for Gracehopper + Slingshot 11
#10234 opened by angainor - 2
Can a chain be built between ucx 1.12 and ucx1.14
#10222 opened by super-train - 1
[UCX][1.18] compatibility issue with ASAN
#10170 opened by musaleh17 - 0
UCX error for D-H and H-D with current UCX master branch
#10181 opened by edgargabriel - 13
ERROR mlx5_0: both WC and NC_DEDICATED UAR allocation types are not supported
#10180 opened by tonycurtis - 0
Incorrect result report for UCP tag_bw test
#10184 opened by SeyedMir - 15
cuda, rc Bandwidth fluctuates regularly
#10164 opened by yangrudan - 3
How preallocate buffer through rendezvous protocol before ucp_tag_recv_nbx actually receiving?
#10148 opened by MoFHeka - 0
Package version conflict
#10126 opened by AtticusBeachy - 14
When testing ROCm D2D transfers with UCX_TLS=rc, how does setting UCX_IB_GPU_DIRECT_RDMA=0 affect the osu_bw test results?
#10077 opened by shuiYizero - 9
NCCL all-reduce test failure due to TL_SHM ERROR, This case was happened on containers on same server.
#10055 opened by thsmfe001 - 3
Assertion `worker->inprogress++ == 0' failed
#10039 opened by pereverges - 1
Segmentation fault with ROCm on certain setup
#10037 opened by fxzjshm - 8
AM for Multiple threads
#10031 opened by J-StrawHat - 3
How to change single copy VIA xpmem execution to the sender process
#10019 opened by arun-chandran-edarath - 1
Stuck at waitForEvents
#10015 opened by pereverges - 1
Will 32-bit architecture be supported in the future?
#10008 opened by Wire-less-LAN - 1
Unexpected modprobe processes on RHEL9 CPU-only nodes using OpenMPI 5 with UCX built with CUDA
#9997 opened by ZQyou - 2
Long-Tail Requests
#9976 opened by Clownier - 1
Fedora package for 1.17.0 does not build on ppc
#9980 opened by bkmgit