memcpy takes much longer when copying from xrt::bo to host
jasvinderkhurana commented
I am trying to copy the output of my HLS kernel back to the host, but this takes much longer than expected. Profiling shows that copying from an xrt::bo mapping to host memory takes around 15 ms, while a host-to-host copy of the same size takes 2.2 ms. Below is the code snippet:
void* mem_a;
void* mem_b;
posix_memalign(&mem_a,4096, image_out_size_bytes);
posix_memalign(&mem_b,4096, image_out_size_bytes);
auto a_bo = xrt::bo(device, image_out_size_bytes, stereo_accel.group_id(0));
auto b_bo = xrt::bo(device, image_out_size_bytes, stereo_accel.group_id(1));
auto a_bo_map = a_bo.map();
auto b_bo_map = b_bo.map();
b_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
memcpy(mem_a, b_bo_map, image_out_size_bytes);    // xrt::bo map -> host: takes around 15 ms
memcpy(a_bo_map, b_bo_map, image_out_size_bytes); // xrt::bo map -> xrt::bo map: takes around 15 ms
a_bo.sync(XCL_BO_SYNC_BO_TO_DEVICE);
memcpy(mem_a, mem_b, image_out_size_bytes);       // host -> host: takes around 2.2 ms
clock_gettime(CLOCK_REALTIME, &end_hw);
Please let me know why this happens and how I can optimize it.
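For anyone who wants to reproduce the host-to-host baseline, here is a minimal sketch of how a single copy could be timed in isolation. It does not use XRT at all. The helper names `time_memcpy_ms` and `alloc_touched` are hypothetical and not part of the snippet above, and it uses `CLOCK_MONOTONIC` instead of the `CLOCK_REALTIME` in my code, on the assumption that a monotonic clock is the safer choice for measuring intervals. The buffers are written once before timing so that first-touch page faults are not counted as copy time.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <time.h>

// Hypothetical helper: times one memcpy and returns the elapsed
// wall-clock time in milliseconds, using a monotonic clock.
double time_memcpy_ms(void* dst, const void* src, std::size_t bytes) {
    timespec start{};
    timespec end{};
    clock_gettime(CLOCK_MONOTONIC, &start);
    std::memcpy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) * 1e3 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}

// Hypothetical helper: 4096-byte-aligned allocation (same alignment as
// the snippet above), zero-filled so every page is already mapped
// before any timing starts. Returns nullptr on failure.
void* alloc_touched(std::size_t bytes) {
    void* p = nullptr;
    if (posix_memalign(&p, 4096, bytes) != 0) {
        return nullptr;
    }
    std::memset(p, 0, bytes);
    return p;
}
```

With two buffers from `alloc_touched(image_out_size_bytes)`, calling `time_memcpy_ms(mem_a, mem_b, image_out_size_bytes)` gives the host-to-host number. Passing `b_bo_map` as the source instead would time the xrt::bo case the same way.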