question about the kDispatchPacketHeader used in OpenCL
qishilu opened this issue · 0 comments
hi , all
I have some quesion about the hsa package header used in OCL.
openCL define below in Runtime/device/Rocvirtual.cpp.
static const uint16_t kDispatchPacketHeaderNoSync =
(HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) |
(HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE);
static const uint16_t kDispatchPacketHeader =
(HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) | (1 << HSA_PACKET_HEADER_BARRIER) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) |
(HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE);`
OpenCL use kDispatchPacketHeaderNoSync by default,
and it will change to kDispatchPacketHeader if current kernel has memory dependecy with some kernels pre euqueued.
the code is in VirtualGPU::MemoryDependency::validate
for (size_t j = 0; j < endMemObjectsInQueue_; ++j) {
// Check if the queue already contains this mem object and
// GPU operations aren't readonly
uint64_t busyStart = memObjectsInQueue_[j].start_;
uint64_t busyEnd = memObjectsInQueue_[j].end_;
// Check if the start inside the busy region
if ((((curStart >= busyStart) && (curStart < busyEnd)) ||
// Check if the end inside the busy region
((curEnd > busyStart) && (curEnd <= busyEnd)) ||
// Check if the start/end cover the busy region
((curStart <= busyStart) && (curEnd >= busyEnd))) &&
// If the buys region was written or the current one is for write
(!memObjectsInQueue_[j].readOnly_ || !readOnly)) {
flushL1Cache = true;
break;
}
}
// Did we reach the limit?
if (maxMemObjectsInQueue_ <= numMemObjectsInQueue_) {
flushL1Cache = true;
}
if (flushL1Cache) {
// Sync AQL packets
gpu.setAqlHeader(kDispatchPacketHeader);
// Clear memory dependency state
const static bool All = true;
clear(!All);
}
from the code , I think flushL1Cache is caused by set header with 1 << HSA_PACKET_HEADER_BARRIER.
And in this link https://rocmdocs.amd.com/en/latest/Tutorial/Optimizing-Dispatches.html#optimizing-dispatches, it tell the HSA_FENCE_SCOPE_AGENT << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE will flush L1 cache .
so I think the two defines of hsa headers worked the same (it will flush the L1 cache )
(HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) | (1 << HSA_PACKET_HEADER_BARRIER) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) |
(HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE);
or
(HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) | (1 << HSA_PACKET_HEADER_BARRIER) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) |
(HSA_FENCE_SCOPE_AGENT << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE);
Is this conclusion right ?