Why not using RDMA WRITE( with imm) or RDMA READ to avoid additional memory copy?
zpcalan opened this issue · 9 comments
Hi developers of dmlc.
Recently I've been working with ps-lite. I find out that in the operation of KVWorker::Pull
, there maybe some memory copy operations that may cause performance overhead:
template <typename Val>
void KVServer<Val>::Response(const KVMeta& req, const KVPairs<Val>& res) {
Message msg;
msg.meta.app_id = obj_->app_id();
msg.meta.customer_id = req.customer_id;
msg.meta.request = false;
msg.meta.push = req.push;
msg.meta.pull = req.pull;
msg.meta.head = req.cmd;
msg.meta.timestamp = req.timestamp;
msg.meta.recver = req.sender;
if (res.keys.size()) {
msg.AddData(res.keys);
msg.AddData(res.vals);
if (res.lens.size()) {
msg.AddData(res.lens);
}
}
Postoffice::Get()->van()->Send(msg);
}
In my opinion, code msg.AddData
holds back the communication between woker and server.
If using RDMA WRITE( with imm) or RDMA READ, we can exchange the address by out-of-band communication in the beginning, then do PULL and PUSH on these addresses.
This is just my thought. Please correct me if I'm wrong or I miss some details.
By the way, the function Van::PackMetaPB
used by IBVERBS misses pb->set_pull(meta.pull);
which is in the function of Van::PackMeta
used by ZMQ. This will cause the error when running the test case ps-lite/tests/test_kv_app.cc
using IBVERBS.
@zpcalan msg.AddData
does not involve extra copy. It only operates on SArray, which is a shared pointer that points to the actual message address. In ibverbs_van.h
, we implement RDMA write so the sending is zero copy, like what you said.
@ymjiang Thanks for your reply! I will go deep into the code later. For now I have two questions:
1.Do you mean that the time cost on framework would be very little in both pull and push?(Just to confirm)
2.Do you think it's neccessary to add pb->set_pull(meta.pull);
in function Van::PackMetaPB
to avoid error when running your test case?
1.Do you mean that the time cost on framework would be very little in both pull and push?(Just to confirm)
The overhead of push/pull is mainly in network transmission (not in ps-lite processing). Networking overhead could be large, depending on your workload. If you use RDMA rather than TCP, the overhead will be lower. You can refer to the comparison between ZeroMQ and the IBverbs implementation here: #124 (comment). In this test, we use a communication-intensive workload to show the effectiveness of RDMA.
2.Do you think it's neccessary to add
pb->set_pull(meta.pull);
in functionVan::PackMetaPB
to avoid error when running your test case?
Could you paste the error here? Does it fix the error if you add pb->set_pull(meta.pull);
?
Thanks for your tips for question 1!
As for question 2, I just write a simple demo according to your test case test_kv_app.cc
. I also did slight modification to launch script local.sh
to run on two nodes. The scheduler and server run on Node1.The worker run on Node2. (Both of them use IBVERBS)
test.cc:
#include <cmath>
#include "ps/ps.h"
#include <unistd.h>
using namespace ps;
void StartServer() {
if (!IsServer()) {
return;
}
auto server = new KVServer<float>(0);
server->set_request_handle(KVServerDefaultHandle<float>());
RegisterExitCallback([server](){ delete server; });
}
void RunWorker() {
if (!IsWorker()) return;
KVWorker<float> kv(0, 0);
std::vector<uint64_t> key = {1, 2, 3};
std::vector<float> val = {1, 1, 1};
std::vector<float> recv_val;
kv.Wait(kv.Push(key, val));
kv.Wait(kv.Pull(key, &recv_val));
}
int main(int argc, char *argv[]) {
// start system
Start(0);
// setup server nodes
StartServer();
// run worker nodes
RunWorker();
// stop system
Finalize(0, true);
return 0;
}
local.sh
!/bin/bash
# set -x
if [ $# -lt 3 ]; then
echo "usage: $0 num_servers num_workers bin [args..]"
exit -1;
fi
export DMLC_NUM_SERVER=$1
shift
export DMLC_NUM_WORKER=$1
shift
bin=$1
shift
arg="$@"
export DMLC_PS_ROOT_URI='ib'
export DMLC_PS_ROOT_PORT=8000
if [ "$HOSTNAME" == "node1" ];then
# start the scheduler
export DMLC_ROLE='scheduler'
${bin} ${arg} &
# start servers
export DMLC_ROLE='server'
for ((i=0; i<${DMLC_NUM_SERVER}; ++i)); do
#export HEAPPROFILE=./S${i}
${bin} ${arg} &
done
fi
if [ "$HOSTNAME" == "node2" ];then
# start workers
export DMLC_ROLE='worker'
for ((i=0; i<${DMLC_NUM_WORKER}; ++i)); do
#export HEAPPROFILE=./W${i}
${bin} ${arg} &
done
fi
wait
How to run: export PS_VERBOSE=1;export DMLC_PS_VAN_TYPE=ibverbs;export DMLC_INTERFACE=ib1;./local.sh 1 1 ./build/ps_demo
And I get the error in server side:
ps-lite/include/ps/kv_app.h:447: Check failed: (n) == (req_data.vals.size())
Stack trace returned 10 entries:
[bt] (0) ./build/ps_demo() [0x4086bb]
[bt] (1) ./build/ps_demo() [0x40895d]
[bt] (2) ./build/ps_demo() [0x410ca3]
[bt] (3) ./build/ps_demo() [0x40cd93]
[bt] (4) ./build/ps_demo() [0x410790]
[bt] (5) ./build/ps_demo() [0x40c925]
[bt] (6) ./build/ps_demo() [0x41d703]
[bt] (7) ./build/ps_demo() [0x41af67]
[bt] (8) ./build/ps_demo() [0x417ca5]
[bt] (9) ./build/ps_demo() [0x4145ab]
.........
After I add pb->set_pull(meta.pull)
, the program runs without error.
@zpcalan Thank you for the reports. Looks like it is a bug. But the problem does not seem to be in IBverbs. If you disable IBverbs, does it also happen or not?
No, there's no problem when I use ZMQ.