GPUOpen-Archive/CodeXL

Need help with GPU memory analysis in CodeXL

Closed this issue · 3 comments

Hello.
This is the output when I run CodeXL with my micro benchmark.
I have a question about the section at my mouse pointer location, where some more details are shown on the "Data Transfer" line.
It starts with "Name: 309.5 MB MAP_BUFFER" and below that shows numbers for "Duration" and "clEnqueue API Duration".
Can anyone tell me the difference between Duration and clEnqueue API Duration?

I want to measure the HBM performance inside this AMD GPU, and I saw that the Transfer Rate is 6.235 GB/s at the bottom. However, I have read that HBM provides 128~256 GB/s of bandwidth, and I want to know why there is such a big difference in this micro benchmark. My test benchmark is just a simple matrix calculation with a large configuration.
Any ideas or comments would help. Thanks.

I don't know the answer regarding the difference between the measured transfer rate and the peak theoretical numbers, but I can tell you about the difference between Duration and clEnqueue API Duration.

Basically, Duration is the time taken for the whole mapping operation to complete: mapping the buffer and making it available for the host to write. clEnqueue API Duration measures the time it takes to enqueue the command that will map the buffer. Queues are fundamentally asynchronous, so adding a new command can itself take some time. Hope this makes sense.

The Duration is basically taken from the timestamps in the cl_event associated with the clEnqueueMapBuffer call. It corresponds to the Start and End times from the event (see https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetEventProfilingInfo.html). This is the amount of time taken to map the buffer on the device, including going over the PCIe bus.

The clEnqueue API Duration is the host side duration of the call to clEnqueueMapBuffer (basically how long the API call took to execute on the host).
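
If it helps to see it in code, here is a rough sketch of where each number comes from (my own illustration, not how CodeXL itself is implemented; `queue`, `buffer`, and `size` are assumed to exist already, and the queue must have been created with CL_QUEUE_PROFILING_ENABLE):

```c
#include <CL/cl.h>
#include <stdio.h>
#include <time.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec * 1e-6;
}

/* `queue` must be created with CL_QUEUE_PROFILING_ENABLE;
 * `buffer` is an existing cl_mem of `size` bytes (both hypothetical). */
void profile_map(cl_command_queue queue, cl_mem buffer, size_t size)
{
    cl_event evt;
    cl_int err = CL_SUCCESS;

    /* Host-side wall clock around the enqueue call itself:
     * this is roughly the "clEnqueue API Duration". */
    double t0 = now_ms();
    void *ptr = clEnqueueMapBuffer(queue, buffer, CL_FALSE, CL_MAP_WRITE,
                                   0, size, 0, NULL, &evt, &err);
    double t1 = now_ms();
    if (err != CL_SUCCESS) {
        printf("clEnqueueMapBuffer failed: %d\n", err);
        return;
    }

    /* Wait for the map command to actually finish. */
    clWaitForEvents(1, &evt);

    /* Device timestamps from the event: this is roughly the "Duration"
     * (and the number the Transfer Rate is computed from). */
    cl_ulong start = 0, end = 0;
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);

    printf("clEnqueue API Duration: %.3f ms\n", t1 - t0);
    printf("Duration (event):       %.3f ms\n", (end - start) * 1e-6);

    clEnqueueUnmapMemObject(queue, buffer, ptr, 0, NULL, NULL);
    clReleaseEvent(evt);
}
```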

The Transfer Rate shown by CodeXL is a simple calculation of Transfer Size divided by Duration.
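
As a quick sanity check using the numbers in your screenshot: 309.5 MB at 6.235 GB/s implies a Duration of roughly 309.5 / 6235 s ≈ 50 ms (give or take a little depending on whether binary or decimal megabytes are used), so the Duration shown for that MAP_BUFFER entry should be in that ballpark.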

My understanding is that the theoretical bandwidth figure you quote for HBM is not relevant for host/device data transfers (which is what CodeXL is reporting here). Instead, it is the peak theoretical rate at which an executing kernel can read/write video memory.
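
If you want to measure that kernel-visible bandwidth yourself rather than rely on the spec sheet, the usual approach is a streaming kernel timed with event profiling. Here is a minimal sketch (my own, not an AMD tool; it assumes the usual platform/context/queue/program boilerplate, a queue created with CL_QUEUE_PROFILING_ENABLE, and a `kernel` built from the source below with both buffer arguments already set):

```c
#include <CL/cl.h>
#include <stdio.h>

/* Kernel source: one global read + one global write per work-item.
 * Build it elsewhere with clCreateProgramWithSource / clBuildProgram. */
static const char *copy_kernel_src =
    "__kernel void copy(__global const float *in, __global float *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i];\n"
    "}\n";

/* `queue` must have CL_QUEUE_PROFILING_ENABLE; `kernel` is the copy kernel
 * above with its two buffer arguments (each n floats) already set. */
double kernel_bandwidth_gbps(cl_command_queue queue, cl_kernel kernel, size_t n)
{
    cl_event evt;
    size_t gsize = n;

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsize, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);

    cl_ulong start = 0, end = 0;
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    clReleaseEvent(evt);

    double seconds = (end - start) * 1e-9;            /* timestamps are ns */
    double bytes = 2.0 * (double)n * sizeof(float);   /* read + write     */
    return bytes / seconds / 1e9;
}
```

A simple copy like this will usually get reasonably close to the quoted peak when the buffers are large and the accesses are fully coalesced; real kernels with less regular access patterns will land well below it.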

Thanks for the kind explanations @mcleary @chesik-amd.
I have another question about profiling HBM on GPUs, especially AMD's, since I see an AMD tag on you @chesik-amd.
I have looked into various AMD GPU profiling tools such as Radeon GPU Profiler and CodeXL.
If what you are saying is correct (that the data transfer numbers relate only to host/device transfers and do not include any of the GPU's internal memory traffic), are there any tools or methods I could use to investigate this? I could not find any results or analysis in the various tools showing HBM bandwidth, memory usage percentage, or anything of that kind. How could I find out that a GPU is spending "x%" of its bandwidth on application "y"?
I am currently implementing HBM in a simulator and want to compare it with a real machine, so I need some basis for my accuracy measurements.

PS: Thanks again for pointing out things I didn't even notice. I believe your understanding is correct, but I wrote "if" just in case...