openucx/ucx

Difference between UCX zero-copy protocol and rendezvous protocol

JiakunYan opened this issue · 10 comments

I am working on a paper that involves UCX experiment results. I want to understand the difference between the zero-copy protocol and the rendezvous protocol, so that I have a clear view of what the following environment variables actually change:
UCX_ZCOPY_THRESH
UCX_RNDV_THRESH

I would really appreciate the explanation!

Another side question: is there a quick way to query the actual value of those thresholds if I set them to auto?

Check the ucx_info tool to get the default values on the running system.
Besides the RNDV protocol, the eager protocol also supports ZCOPY.
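To make the relationship between the two thresholds concrete, here is a simplified mental model in Python. It is only a sketch under stated assumptions: it assumes that UCX_ZCOPY_THRESH marks the switch from buffered copy to zero copy within the eager protocol, that UCX_RNDV_THRESH marks the switch from eager to rendezvous, and that both boundaries are inclusive. The function name `pick_protocol` is hypothetical, and the real selection in UCX also depends on the transport, memory type, and other factors that this model ignores.

```python
def pick_protocol(msg_size, zcopy_thresh, rndv_thresh):
    """Simplified model of size-based protocol choice (not UCX code).

    Assumptions: thresholds are plain byte counts, boundaries are
    inclusive, and only three protocol classes exist. Real UCX has
    more protocols and more inputs than message size.
    """
    if msg_size >= rndv_thresh:
        # Large messages: handshake first, then transfer the data.
        return "rendezvous"
    if msg_size >= zcopy_thresh:
        # Eager send posted directly from the user buffer.
        return "eager zcopy"
    # Eager send staged through an intermediate (bounce) buffer.
    return "eager bcopy"


if __name__ == "__main__":
    # Threshold values below are made up for illustration.
    for size in (512, 16384, 1 << 20):
        print(size, pick_protocol(size, zcopy_thresh=8192, rndv_thresh=65536))
```

The point of the model is that the two variables are not alternatives: one splits the eager range into copy and zero-copy regions, while the other decides where eager stops altogether.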

Thank you! Could you let me know what arguments I should pass to ucx_info? I tried ucx_info -c but it just told me the variable was set to auto. Is there a way to know which value this auto actually selects?

I dug into the source code a little. ZCOPY (for ibverbs) just means using iovec to transfer the messages, is that correct?

ucx_info --help.

  1. ucx_info --help did not really tell me much regarding this question. I tried some flags but failed to find the information I wanted. Could you point me to the right argument?

  2. I would really appreciate it if you could elaborate on what you think, or point me to the right places in the source code!

  1. Please run the application with UCX_PROTO_INFO=y to get information about the protocols used at runtime for that specific application.
  2. Yes, zcopy uses iovec in the SW layers (UCT and ibverbs).
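For anyone scripting experiments, the suggestion above can be wrapped in a small launcher. This is a minimal sketch, assuming the application is started from Python: the helper names `ucx_env` and `run_with_ucx` are hypothetical, and only the three environment variable names come from this thread. Threshold values are passed through as strings without validation.

```python
import os
import subprocess


def ucx_env(zcopy_thresh=None, rndv_thresh=None, proto_info=True, base=None):
    """Return a copy of the environment with the UCX variables set.

    `base` defaults to the current process environment; pass a dict
    to start from something else. Values are not validated here.
    """
    env = dict(os.environ if base is None else base)
    if proto_info:
        env["UCX_PROTO_INFO"] = "y"
    if zcopy_thresh is not None:
        env["UCX_ZCOPY_THRESH"] = str(zcopy_thresh)
    if rndv_thresh is not None:
        env["UCX_RNDV_THRESH"] = str(rndv_thresh)
    return env


def run_with_ucx(cmd, **settings):
    """Run `cmd` (a list of arguments) with the UCX settings applied."""
    return subprocess.run(cmd, env=ucx_env(**settings), check=False)
```

A call such as `run_with_ucx(["./my_benchmark"], rndv_thresh=65536)` would then run the application with protocol information enabled; `./my_benchmark` is a placeholder for your own binary.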

zcopy uses iovec. Does the iov here mean a WQE with multiple SGEs?

If needed, yes.
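As a loose analogy for readers unfamiliar with gather lists, the same idea exists in POSIX as `writev`, which Python exposes as `os.writev`. The sketch below is not UCX or ibverbs code and does not demonstrate zero copy (the kernel still copies into the pipe). It only illustrates that the caller hands over several separate buffers in one operation instead of first packing them into one contiguous staging buffer. The helper names are hypothetical.

```python
import os


def gather_write(fd, buffers):
    """Write several separate buffers with one call, like a gather list."""
    return os.writev(fd, buffers)


def demo():
    """Send a header and a payload that live in different buffers."""
    r, w = os.pipe()
    try:
        header = b"hdr:"
        payload = bytearray(b"payload")
        # Two entries in the list, one submitted operation.
        n = gather_write(w, [header, payload])
        return n, os.read(r, n)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo())
```

In the ibverbs case discussed here, each list entry would correspond to one SGE of the work request rather than one `iovec` element.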

I agree with this. In the original question, I mistook it for the UCP IOV datatype.

Thank you for your help!