Cannot apply parallel primitive in HeteroCL module
The issue occurs in the digit recognition example with the .parallel() primitive. I was trying to use a kernel function to update knn_mat instead of calling hcl.compute, and to perform scheduling on the itervars inside the kernel function (i.e., the hcl module). The program after modification looks like:
def knn(*placeholders):
    @hcl.def_([(10,1800), (10,3)])
    def update_knn(dist, knn_mat):
        with hcl.for_(0,10, name="i") as i:
            with hcl.for_(0,1800, name="j") as j:
                max_id = hcl.scalar(0, "max_id")
                with hcl.for_(0, 3, name="k") as k:
                    with hcl.if_(knn_mat[i][k] > knn_mat[i][max_id.v]):
                        max_id.v = k
                with hcl.if_(dist[i][j] < knn_mat[i][max_id.v]):
                    knn_mat[i][max_id.v] = dist[i][j]
    update_knn(dist, knn_mat)
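For readers who want to check what the loop nest computes without HeteroCL installed, here is a minimal plain-Python sketch of the same logic. The function name update_knn_reference, the tiny input sizes, and the initial value 50 are assumptions made for illustration; they are not taken from the issue.

```python
def update_knn_reference(dist, knn_mat):
    """Plain-Python model of the loop nest: per row i, keep the smallest distances."""
    for i in range(len(dist)):
        for j in range(len(dist[i])):
            # Find the slot that currently holds the largest candidate.
            max_id = 0
            for k in range(len(knn_mat[i])):
                if knn_mat[i][k] > knn_mat[i][max_id]:
                    max_id = k
            # Overwrite that slot when the new distance is smaller.
            if dist[i][j] < knn_mat[i][max_id]:
                knn_mat[i][max_id] = dist[i][j]
    return knn_mat


# Tiny input: 2 rows instead of 10, and 5 distances per row instead of 1800.
dist = [[7, 3, 9, 1, 4], [2, 8, 6, 5, 0]]
knn_mat = [[50, 50, 50], [50, 50, 50]]  # 50 is an assumed "large" initial value
result = update_knn_reference(dist, knn_mat)
print(result)
```

Each row of the result ends up holding the three smallest distances of the matching row of dist, in no particular order.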
The scheduling is performed as in the following snippet:
knn_update = knn.update_knn
s[knn_update].reorder(knn_update.axis[0], knn_update.axis[1])
# ISSUE: this primitive will lead to segmentation fault
# s[knn_update].parallel(knn_update.axis[1])
s[knn_update].pipeline(knn_update.axis[0])
All other scheduling primitives work well, but when I call .parallel(), the program errors out with a segmentation fault.
Do we translate the parallel() primitive to a corresponding pragma in HLS?
Currently the parallel primitive is only for CPU, where it triggers multi-threaded execution.
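As a rough picture of what multi-threaded execution of one loop could look like for this kernel, the sketch below runs one task per row i using Python's standard thread pool. It is not how HeteroCL lowers .parallel(); update_row and update_knn_threaded are hypothetical names, and the input sizes and initial value 50 are assumptions. In the kernel from the issue, iterations of the i loop touch disjoint rows of knn_mat, while every iteration of the j loop reads and writes the same row, so the sketch distributes only i.

```python
from concurrent.futures import ThreadPoolExecutor


def update_row(dist_row, knn_row):
    """Sequential j loop for one row; each call touches only its own knn_row."""
    for d in dist_row:
        # Index of the largest current candidate (first one wins on ties).
        max_id = max(range(len(knn_row)), key=lambda k: knn_row[k])
        if d < knn_row[max_id]:
            knn_row[max_id] = d
    return knn_row


def update_knn_threaded(dist, knn_mat, workers=4):
    # One task per row i; rows share no state, so no locking is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_row, dist, knn_mat))


threaded = update_knn_threaded(
    [[7, 3, 9, 1, 4], [2, 8, 6, 5, 0]],
    [[50, 50, 50], [50, 50, 50]],
)
print(threaded)
```

In CPython the threads mainly illustrate how the work is divided; they are not expected to speed up this pure-Python loop.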
As I mentioned before, we need to support it for hardware synthesis. Shall we open another issue? If not, this will fall through the cracks again.
It's ignored in the HLS code generator. I am considering letting CodeGenC translate .parallel() to OpenMP pragmas.
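To make the OpenMP idea concrete, here is a toy emitter that prints a C loop and, when asked, puts the standard "#pragma omp parallel for" directive in front of it. This is only a sketch of the general technique and does not reflect the real CodeGenC; emit_for is a hypothetical helper, and the C function update_row named in the loop body is also made up for the example.

```python
def emit_for(var, extent, body_lines, parallel=False, indent=0):
    """Return C source lines for a counted loop, optionally tagged for OpenMP."""
    pad = " " * indent
    lines = []
    if parallel:
        # Standard OpenMP work-sharing directive for the loop that follows.
        lines.append(pad + "#pragma omp parallel for")
    lines.append(f"{pad}for (int {var} = 0; {var} < {extent}; ++{var}) {{")
    lines.extend(pad + "  " + line for line in body_lines)
    lines.append(pad + "}")
    return lines


body = ["update_row(dist[i], knn_mat[i]);"]  # hypothetical C call
c_source = "\n".join(emit_for("i", 10, body, parallel=True))
c_serial = "\n".join(emit_for("i", 10, body))
print(c_source)
```

A loop that is not marked parallel produces the same text without the pragma line, so existing sequential output would stay unchanged.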
And for HLS codegen, could we use .parallel() to perform kernel replication and exploit data-level parallelism?
@hecmay yes, we can at least use it for the OpenCL flow. I believe the Merlin compiler supports parallel execution as well.