Xilinx/XRT

memcpy takes much longer when copying from xrt::bo to host


I am trying to copy the output of my HLS kernel back to the host, but the copy takes much longer than expected. Profiling shows that a memcpy from the mapped xrt::bo to host memory takes around 15 ms, while a host-to-host memcpy of the same size takes around 2.2 ms. Below is the code snippet:

    void* mem_a;
    void* mem_b;

    // Page-aligned host buffers used as the host-to-host baseline.
    posix_memalign(&mem_a, 4096, image_out_size_bytes);
    posix_memalign(&mem_b, 4096, image_out_size_bytes);

    // Device buffers bound to the kernel's argument memory banks.
    auto a_bo = xrt::bo(device, image_out_size_bytes, stereo_accel.group_id(0));
    auto b_bo = xrt::bo(device, image_out_size_bytes, stereo_accel.group_id(1));

    // Host-side mappings of the device buffers.
    auto a_bo_map = a_bo.map();
    auto b_bo_map = b_bo.map();

    // Mapped bo -> host buffer.
    b_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
    memcpy(mem_a, b_bo_map, image_out_size_bytes);     // takes around 15 ms

    // Mapped bo -> mapped bo.
    memcpy(a_bo_map, b_bo_map, image_out_size_bytes);  // takes around 15 ms
    a_bo.sync(XCL_BO_SYNC_BO_TO_DEVICE);

    // Host buffer -> host buffer.
    memcpy(mem_a, mem_b, image_out_size_bytes);        // takes around 2.2 ms

    clock_gettime(CLOCK_REALTIME, &end_hw);
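To make the numbers easier to reproduce, here is a minimal sketch of timing a single copy in isolation. It uses `std::chrono::steady_clock` rather than `CLOCK_REALTIME`, and the helpers `time_copy_ms` and `best_copy_ms` are hypothetical names for illustration, not part of XRT. Taking the best of several runs is an assumption on my side, meant to keep first-touch page faults on freshly allocated buffers out of the result.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not an XRT API): time one memcpy and return milliseconds.
static double time_copy_ms(void* dst, const void* src, std::size_t bytes) {
    assert(dst != nullptr && src != nullptr);
    const auto start = std::chrono::steady_clock::now();
    std::memcpy(dst, src, bytes);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: repeat the copy `runs` times and keep the fastest run,
// so that one-off costs such as first-touch page faults do not dominate.
static double best_copy_ms(void* dst, const void* src, std::size_t bytes, int runs) {
    assert(runs > 0);
    double best = time_copy_ms(dst, src, bytes);
    for (int i = 1; i < runs; ++i) {
        best = std::min(best, time_copy_ms(dst, src, bytes));
    }
    return best;
}
```

With the snippet above, the bo-to-host case would be measured as `best_copy_ms(mem_a, b_bo_map, image_out_size_bytes, 5)` after the `b_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE)` call, and the baseline as `best_copy_ms(mem_a, mem_b, image_out_size_bytes, 5)`.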

Why does this happen, and how can I optimize it?