reduce_scatter on MacOS issue
ChengjieLi28 opened this issue · 0 comments
Hi team,
I successfully compiled gloo on macOS by setting USE_LIBUV to ON, but when I test the reduce_scatter op, the process crashes with a core dump at runtime.
I use pybind11 to expose a Python interface. Here is the test code:
import multiprocessing as mp
import os
import shutil
import time

import numpy as np

# fileStore_path and system_name are module-level values defined elsewhere
# in the test file (not shown here).


def worker_reduce_scatter(rank):
    from .. import xoscar_pygloo as xp

    # Rank 0 recreates the rendezvous directory; the other ranks wait briefly.
    if rank == 0:
        if os.path.exists(fileStore_path):
            shutil.rmtree(fileStore_path)
        os.makedirs(fileStore_path)
    else:
        time.sleep(0.5)

    context = xp.rendezvous.Context(rank, 3)

    # Use the TCP transport on Linux and the libuv transport elsewhere.
    if system_name == "Linux":
        attr = xp.transport.tcp.attr("localhost")
        dev = xp.transport.tcp.CreateDevice(attr)
    else:
        attr = xp.transport.uv.attr("localhost")
        dev = xp.transport.uv.CreateDevice(attr)

    fileStore = xp.rendezvous.FileStore(fileStore_path)
    store = xp.rendezvous.PrefixStore(str(3), fileStore)

    context.connectFullMesh(store, dev)

    # Six float32 values (1..6) sent by every rank.
    sendbuf = np.array(
        [i + 1 for i in range(sum([j + 1 for j in range(3)]))], dtype=np.float32
    )
    print(f'Send buf: {sendbuf}')
    sendptr = sendbuf.ctypes.data

    # Each rank receives two elements of the reduced result.
    recvbuf = np.zeros(2, dtype=np.float32)
    recvptr = recvbuf.ctypes.data
    recvElems = [2, 2, 2]

    data_size = (
        sendbuf.size if isinstance(sendbuf, np.ndarray) else sendbuf.numpy().size
    )
    print(f'Data size: {data_size}')
    datatype = xp.glooDataType_t.glooFloat32
    op = xp.ReduceOp.SUM

    xp.reduce_scatter(context, sendptr, recvptr, data_size, recvElems, datatype, op)

    print(f"rank {rank} sends {sendbuf}, receives {recvbuf}")


def test_reduce_scatter():
    # Start one worker process per rank, then wait for all of them.
    process1 = mp.Process(target=worker_reduce_scatter, args=(0,))
    process1.start()
    process2 = mp.Process(target=worker_reduce_scatter, args=(1,))
    process2.start()
    process3 = mp.Process(target=worker_reduce_scatter, args=(2,))
    process3.start()

    process1.join()
    process2.join()
    process3.join()
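For reference, here is a small sketch of the result I would expect from the test, assuming reduce_scatter with SUM adds the send buffers element-wise and then hands rank r the r-th chunk, with chunk sizes given by recvElems. It is plain Python with no gloo involved, and expected_reduce_scatter is a hypothetical helper written only for this illustration:

```python
from itertools import accumulate


def expected_reduce_scatter(sendbufs, recv_elems):
    """Return the chunk each rank should hold after a SUM reduce_scatter."""
    # Element-wise sum across all ranks' send buffers.
    reduced = [sum(vals) for vals in zip(*sendbufs)]
    # Chunk boundaries: rank r owns recv_elems[r] consecutive elements.
    offsets = [0, *accumulate(recv_elems)]
    return [reduced[offsets[r]:offsets[r + 1]] for r in range(len(recv_elems))]


# Every rank sends the same six values (1..6), as in the test above.
sendbuf = [float(i + 1) for i in range(6)]
chunks = expected_reduce_scatter([sendbuf] * 3, [2, 2, 2])
for rank, chunk in enumerate(chunks):
    print(f"rank {rank} should receive {chunk}")
```

Under that assumption, rank 0 should end up with 3 and 6, rank 1 with 9 and 12, and rank 2 with 15 and 18.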
This test does not work on macOS, but it works on Linux.
May I ask why this happens? Thank you very much.
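In case it helps with triage, the test above joins the workers without looking at how they exited. A minimal sketch of how the exit status could be reported is below. It relies on the documented behavior of multiprocessing.Process.exitcode, where a negative value -N means the child was terminated by signal N; describe_exit is a hypothetical helper name, not part of gloo or the bindings:

```python
import signal


def describe_exit(exitcode):
    """Turn a multiprocessing.Process.exitcode into a readable string."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited cleanly"
    if exitcode < 0:
        # A negative exitcode -N means the child was killed by signal N.
        try:
            return f"killed by {signal.Signals(-exitcode).name}"
        except ValueError:
            return f"killed by signal {-exitcode}"
    return f"exited with status {exitcode}"


# Example: the status a worker would report after a segmentation fault.
print(describe_exit(-signal.SIGSEGV))
```

After each process.join(), the test could print describe_exit(process.exitcode) to show which rank crashed and with which signal. Calling faulthandler.enable() from the standard library at the top of the worker should also dump the Python traceback when the native call faults.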