Error when doing make run-clockwork on new max pooling example
dillonhuff opened this issue · 16 comments
@jeffsetter @thenextged I've gotten the CPU code compiling and running for my max-pooling example (https://github.com/StanfordAHA/Halide-to-Hardware/tree/maxpool_example/apps/hardware_benchmarks/apps/max_pool_2x2), but when I run make run-clockwork
I get the following error:
dhuff@kiwi:~/h2h2clockwork/Halide-to-Hardware/apps/hardware_benchmarks/apps/max_pool_2x2$ make run-clockwork
g++-7 -std=c++17 -I../../../../../clockwork -I../../../../../clockwork/include -I/home/dhuff/h2h2clockwork/Halide-to-Hardware/clockwork/barvinok-0.41/isl -fPIC -I/home/dhuff/h2h2clockwork/clockwork/barvinok-0.41/isl/ -c bin/clockwork_codegen.cpp -o bin/clockwork_codegen.o
In file included from bin/clockwork_codegen.cpp:2:0:
bin/maxpool_memory.cpp: In function ‘prog maxpool()’:
bin/maxpool_memory.cpp:11:27: error: ‘arg_0’ was not declared in this scope
int32_t &hw_output_s0_c = arg_0;
^~~~~
../../hw_support/hardware_targets.mk:138: recipe for target 'bin/clockwork_codegen.o' failed
make: *** [bin/clockwork_codegen.o] Error 1
Any idea what is going on here? Thanks!
Let me take a look (although I think Jeff might be of more help with this error)
Hmm. I'm getting a different error. Are bin/clockwork_testscript.h
and bin/clockwork_testscript.cpp
generated when you run-clockwork
?
I've seen several different errors depending on the order in which I call the make commands.
Yes I see bin/clockwork_testscript.h
and bin/clockwork_testscript.cpp
in bin
after I run make run-clockwork
.
The other error I sometimes get is:
[INFO] Module.compile(): clockwork_source_name = ./bin/maxpool_clockwork.cpp
creating file from name: maxpool
[INFO] Module.compile(): clockwork header file = ./bin/maxpool.h
creating file from name: maxpool
outputting clockwork target named maxpool
Error at ../../../../distrib/tools/GenGen.cpp:4:
Can't represent an integer with this many bits in C: (void *)
../../hw_support/hardware_targets.mk:132: recipe for target 'bin/maxpool_memory.cpp' failed
make: *** [bin/maxpool_memory.cpp] Aborted (core dumped)
make: *** Deleting file 'bin/maxpool_memory.cpp'
What error are you seeing?
Ah! that's the error I'm getting. I'm tracing it now
I just pushed a temporary fix to this maxpool_example
branch. can you pull this and try again?
@thenextged trying now.
@jeffsetter The _hls_target
ProducerConsumer is getting inserted at the wrong place in this app. The outer loop level is outside the _hls_target
PC node (which should never happen). I think the right thing to do is to make sure that the _hls_target
PC node is the first child of the output function's PC node
For now, I went back to using the name hw_output
as the accelerator boundary
@thenextged is there an issue with the way the app itself is written that I could fix? Or is the problem in code generation?
@thenextged just finished running and comparing max pool and they diff
the same. Thank you!
Great! I think the app's Halide code is fine. The codegen fix shouldn't be too bad (I'll let Jeff comment on that)
To adjust the loops, you can use hw_output.reorder(xi, yi, c, xo, yo)
This moves the loops so that the channel loop is within the accelerator (which includes loops between xi
and xo
.
Patch that evaluates correctly is:
index e2238b484..390c303db 100644
--- a/apps/hardware_benchmarks/apps/max_pool_2x2/maxpool_generator.cpp
+++ b/apps/hardware_benchmarks/apps/max_pool_2x2/maxpool_generator.cpp
@@ -77,6 +77,7 @@ public:
hw_output
.tile(x, y, xo, yo, xi, yi, imgSize, imgSize)
+ .reorder(xi, yi, c, xo, yo)
.hw_accelerate(xi, xo);
pooled.compute_at(hw_output, xo);
diff --git a/src/CodeGen_RDAI.cpp b/src/CodeGen_RDAI.cpp
index f102879e5..b7e658b66 100644
--- a/src/CodeGen_RDAI.cpp
+++ b/src/CodeGen_RDAI.cpp
@@ -163,7 +163,8 @@ void CodeGen_RDAI::visit(const Call *op) {
}
void CodeGen_RDAI::visit(const ProducerConsumer *op) {
- string target_prefix = "hw_output";
+ //string target_prefix = "hw_output";
+ string target_prefix = "_hls_target";
if(starts_with(op->name, target_prefix) && op->is_producer) {
Stmt hw_body = substitute_in_all_letstmts(op->body);
@jeffsetter both of these loop nests should be valid 1: (xi, yi, xo, yo, c) and 2: (xi, yi, c, xo, yo))
right? I think if we make sure that all For
nodes in-between hw_output
and _hls_target
PC nodes are moved inside the _hls_target
PC node, then all will be good. Otherwise, we're limiting ourselves in terms of schedules
Yea I think both loop nests should be allowed. If you want to have all of the loops done on the accelerator you can include hw_accelerate(xi, c)
with reorder(xi, yi, xo, yo, c)
.
However, I'm not sure why all of those loops need to be in the accelerator. Is there some assumption in that loops can't be outside the hls_target
Yeah the issue is I'm not supporting scalar variables as accelerator function parameters yet (which we will need in this case to pass the current value of c
if it is outside the accelerator). A better solution if for me to add support of that. But I might not get to it at least until after I get the FPGA flow to work
Or as an alternative, I can emit crops/slices and rewrite the IR to collapse the external dimensions. This way we only keep buffer arguments.
Any ideas about other/better alternatives?
Also, I want to make sure that the strategy I choose here is compatible with your target generator