Residual structure cannot be converted to HLS
TATynise opened this issue · 1 comment
Hi, I encountered the error "cycle-free graph violated: partition depends on itself" while running a custom network on FINN. I have tried adjusting the streamlining and convert_to_hls steps following the ResNet-50 example in finn-examples, but it still fails.
This is the residual part of the network:
Following "cnv_end2end_example", after streamlining the residual part looks as shown in the figure:
Following the "streamline nonlinear" step of the ResNet-50 finn-example, it looks as shown in the figure:
I then converted it to HLS, as shown in the figure:
When I finally run "parent_model = model.transform(CreateDataflowPartition())", it fails because the residual part was not converted successfully. I have tried many approaches but nothing works; I hope you can provide some guidance.
Thanks.
Hi @TATynise,
Thanks for your question!
Residual networks are indeed a bit tricky, since they require a streamlining process that is more involved than for linear networks. It looks like the streamlining process didn't 'fully streamline' the graph, meaning you have a few floating-point operators left in your network. In the final image you showed, you can see that the Mul and Add nodes (which are regular ONNX nodes) are mixed with the so-called fpgadataflow nodes (FMPadding_Batch, ConvolutionInputGenerator). The CreateDataflowPartition transform partitions your model into smaller sub-models, where each sub-model consists exclusively of either standard ONNX nodes or fpgadataflow-type nodes (i.e. nodes that will eventually run on the FPGA). Since your network is residual and contains several of these regular ONNX nodes mixed in with fpgadataflow-type nodes, the partitioning becomes more complicated and breaks along the way: the output of the dataflow partition flows through the ONNX nodes and back into the same partition, which is most likely what "partition depends on itself" is reporting.
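To make the failure mode concrete, here is a simplified pure-Python model of the acyclicity check behind that error. It is not FINN's actual implementation: it assumes, as a simplification, that every fpgadataflow node goes into one dataflow partition while each standard ONNX node stays on its own in the parent graph. The graphs are hypothetical and only loosely modelled on the residual block in the question; the HLS layer names (e.g. DuplicateStreams_Batch, AddStreams_Batch) are used illustratively and may differ across FINN versions.

```python
from collections import defaultdict

# Hypothetical residual block after convert_to_hls (names are illustrative).
# Each entry: node name -> (backend, producers feeding this node).
GRAPH = {
    "Thresholding_Batch_0": ("fpgadataflow", []),
    "FMPadding_Batch_0": ("fpgadataflow", ["Thresholding_Batch_0"]),
    "ConvolutionInputGenerator_0": ("fpgadataflow", ["FMPadding_Batch_0"]),
    "MatrixVectorActivation_0": ("fpgadataflow", ["ConvolutionInputGenerator_0"]),
    "Mul": ("onnx", ["MatrixVectorActivation_0"]),  # leftover float op
    "Add": ("onnx", ["Mul", "Thresholding_Batch_0"]),  # leftover residual add
    "Thresholding_Batch_1": ("fpgadataflow", ["Add"]),
}

# Same block, fully streamlined: every node is an fpgadataflow node.
STREAMLINED = {
    "Thresholding_Batch_0": ("fpgadataflow", []),
    "DuplicateStreams_Batch_0": ("fpgadataflow", ["Thresholding_Batch_0"]),
    "FMPadding_Batch_0": ("fpgadataflow", ["DuplicateStreams_Batch_0"]),
    "ConvolutionInputGenerator_0": ("fpgadataflow", ["FMPadding_Batch_0"]),
    "MatrixVectorActivation_0": ("fpgadataflow", ["ConvolutionInputGenerator_0"]),
    "AddStreams_Batch_0": ("fpgadataflow", ["MatrixVectorActivation_0", "DuplicateStreams_Batch_0"]),
    "Thresholding_Batch_1": ("fpgadataflow", ["AddStreams_Batch_0"]),
}


def partition_of(name, graph):
    """Simplified assignment: all fpgadataflow nodes share one partition,
    every other node stays on its own in the parent graph."""
    backend, _ = graph[name]
    return "dataflow_partition_0" if backend == "fpgadataflow" else name


def find_partition_cycle(graph):
    """Return a cycle in the partition-level graph as a list, or None."""
    edges = defaultdict(set)
    for node, (_, producers) in graph.items():
        dst = partition_of(node, graph)
        for producer in producers:
            src = partition_of(producer, graph)
            if src != dst:  # edges inside one partition are hidden by it
                edges[src].add(dst)

    state = {}  # partition -> "visiting" or "done"
    path = []

    def dfs(u):
        state[u] = "visiting"
        path.append(u)
        for v in sorted(edges[u]):
            if state.get(v) == "visiting":  # back edge: v is on the current path
                return path[path.index(v):] + [v]
            if v not in state:
                cycle = dfs(v)
                if cycle:
                    return cycle
        path.pop()
        state[u] = "done"
        return None

    for part in sorted({partition_of(n, graph) for n in graph}):
        if part not in state:
            cycle = dfs(part)
            if cycle:
                return cycle
    return None


print(find_partition_cycle(GRAPH))        # ['Add', 'dataflow_partition_0', 'Add']
print(find_partition_cycle(STREAMLINED))  # None
```

With the leftover Mul/Add, data leaves the dataflow partition and re-enters it, so the partition depends on itself; once those nodes are gone, no partition-level edges remain and there is nothing to cycle.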
To resolve this, I would first suggest revisiting the streamlining of your network, since I presume your goal is to run the full network on the FPGA rather than only part of it. One trick to make this easier is to add uniform quantizers at the end of both residual lanes in your custom network (before exporting it with Brevitas). In the third image you showed, this would result in a MultiThreshold node at the end of both lanes. These MultiThreshold nodes are essentially what allows us to streamline away floating-point operators, by moving those operators around and absorbing them into the MultiThreshold thresholds. By then calling transforms such as AbsorbAddIntoMultiThreshold and AbsorbMulIntoMultiThreshold, those floating-point operators will be absorbed into the thresholds of the subsequent MultiThreshold node.
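The absorption trick rests on a simple identity, sketched below in plain Python for a single channel. This is a minimal illustration, not FINN's MultiThreshold implementation: it assumes MultiThreshold outputs the number of thresholds the input reaches (x >= t), models an unsigned uniform quantizer with round-half-up, and ignores the output scale that maps integer levels back to real values. It shows (1) that a uniform quantizer can be expressed as a MultiThreshold with thresholds at (k - 0.5) * scale, and (2) that a preceding float y = a * x + b with a > 0 can be folded into the thresholds, since a * x + b >= t holds exactly when x >= (t - b) / a. All function names are hypothetical.

```python
import math


def multithreshold(x, thresholds):
    """Simplified MultiThreshold: count how many thresholds x reaches."""
    return sum(1 for t in thresholds if x >= t)


def uniform_quant(x, scale, bits):
    """Unsigned uniform quantizer: integer level, round half up, clamped."""
    level = math.floor(x / scale + 0.5)
    return min(max(level, 0), 2**bits - 1)


def quant_thresholds(scale, bits):
    """Thresholds at (k - 0.5) * scale for k = 1 .. 2**bits - 1."""
    return [(k - 0.5) * scale for k in range(1, 2**bits)]


def absorb_mul_add(thresholds, a, b):
    """Fold y = a * x + b (a > 0) into the thresholds: a*x + b >= t <=> x >= (t - b) / a."""
    if a <= 0:
        raise ValueError("this sketch only handles a positive scale a")
    return [(t - b) / a for t in thresholds]


T = quant_thresholds(scale=0.25, bits=2)
xs = [i / 16 for i in range(-8, 24)]

# 1) The uniform quantizer and the MultiThreshold agree on every test input.
assert all(multithreshold(x, T) == uniform_quant(x, 0.25, 2) for x in xs)

# 2) A float Mul (a) followed by an Add (b) in front of the MultiThreshold
#    can be removed by rewriting the thresholds instead.
a, b = 0.5, 0.25
T_fused = absorb_mul_add(T, a, b)
assert all(multithreshold(a * x + b, T) == multithreshold(x, T_fused) for x in xs)
print(T_fused)  # [-0.25, 0.25, 0.75]
```

For a negative scale the comparison direction flips, so a plain division of the thresholds no longer gives the same result; that is why the sketch rejects a <= 0.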
This would remove the floating-point operators you showed in the screenshots and bring you one step closer to full FPGA execution. Hope this helps, and please let us know if you run into any further issues!