Code Organization

Question

Closed this issue 4 years ago · 5 comments

Thanks for your quick reply and suggestions.

For the floorplanning part, I'm not sure which parts you think are inappropriately coupled? Currently the floorplan class takes in a dataflow graph and returns a mapping from the slots to the vertices assigned to this slot. I think we could have multiple classes, each for a different floorplanning algorithm?
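A minimal sketch of the "one class per floorplanning algorithm" idea, assuming a shared interface that takes a dataflow graph and returns a slot-to-vertices mapping. The names `Floorplanner`, `RoundRobinFloorplanner` and the slot labels are hypothetical and not taken from the AutoBridge code base; the toy algorithm only illustrates the interface.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Floorplanner(ABC):
    """Hypothetical common interface: dataflow graph in, slot -> vertices out."""

    @abstractmethod
    def floorplan(self, graph, slots):
        """Return a dict mapping each used slot name to the vertices placed in it."""


class RoundRobinFloorplanner(Floorplanner):
    """Toy algorithm that only shows the interface; it ignores edges and resources."""

    def floorplan(self, graph, slots):
        mapping = defaultdict(list)
        for i, vertex in enumerate(sorted(graph)):
            mapping[slots[i % len(slots)]].append(vertex)
        return dict(mapping)


# Dataflow graph as adjacency lists: vertex name -> downstream vertex names.
graph = {"load": ["compute"], "compute": ["store"], "store": []}
slots = ["SLOT_X0Y0", "SLOT_X0Y1"]
result = RoundRobinFloorplanner().floorplan(graph, slots)
```

Another algorithm would be a second subclass with the same `floorplan` signature, so callers would not need to know which one is in use.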

BTW, previously you mentioned that you formulate the problem as a different, non-linear model. I'm not sure this is a good idea. With an ILP model, we already run into a scalability issue, which I'm working on right now. If some slots are completely unavailable, as in the F1 case, we could set their available resources to 0 to handle the situation. So I do not see a solid motivation for making the model more complex. Maybe you could explain more?
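The zero-resource trick can be illustrated without a solver. In the sketch below, brute-force enumeration stands in for the ILP, and the capacity check is the same inequality an ILP would carry (total demand in a slot must not exceed its capacity). The vertex names, slot names and numbers are made up.

```python
from itertools import product


def feasible_assignments(demand, capacity):
    """Enumerate vertex -> slot assignments that respect per-slot capacity."""
    vertices = sorted(demand)
    slot_names = sorted(capacity)
    for choice in product(slot_names, repeat=len(vertices)):
        used = {s: 0 for s in slot_names}
        for v, s in zip(vertices, choice):
            used[s] += demand[v]
        # Same constraint an ILP would state: demand placed in a slot <= capacity.
        if all(used[s] <= capacity[s] for s in slot_names):
            yield dict(zip(vertices, choice))


demand = {"a": 30, "b": 30, "c": 30}                # hypothetical demand per vertex
capacity = {"slot0": 60, "slot1": 60, "slot2": 0}   # slot2 is completely unavailable

solutions = list(feasible_assignments(demand, capacity))
```

Because `slot2` has capacity 0, any vertex placed there violates the constraint, so no feasible solution uses it and the model itself needs no special case.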

I agree the user interface is a mess right now, though I haven't had the bandwidth for it recently. Ideally, users should only provide the target device and their source code and nothing else. Then we will parse the mapping from the source code to the RTL. On the other hand, if they want to hack on the tool (e.g. add an additional board or specify extra floorplanning constraints), they are supposed to be somewhat familiar with the CAD flow, and I believe clock regions are relatively basic concepts? Let me know if you disagree.

My plan is to leave the user interface aside for a while (3 months, before my next deadline), and if users (rare for now) run into problems I could provide direct help.

Best,
Licheng

Hi @Licheng-Guo ,

About the HLSParser package, it is nice to see that the tapa frontend has been added. It is quite a good beginning.

About the Device package, my idea is to leave the concrete device information to plain-text files, e.g. yaml or csv. We may need further discussion on how to represent the 'branches' in resource availability induced by different combinations of instantiated IPs, e.g. DDR, NoC infrastructure, etc. The options that come to mind at this early stage:

- Record each resource situation separately, as you've done in the old code base. This may become a problem when the number of combinations is very large.

- Let users pick the correct 'branch' in the resource file. (Comment support then becomes a must for the file format.)
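As an illustration of the second option, here is a sketch of a CSV resource file with one row per (slot, branch) pair and `#` comment lines. The column names, branch names and numbers are invented for the example; Python's `csv` module has no comment support of its own, so comment lines are filtered out before parsing.

```python
import csv
import io

# Hypothetical resource file: one row per (slot, branch).
RESOURCE_CSV = """\
# Resources left in each slot, per IP configuration ("branch").
slot,branch,lut,bram
X0Y0,no_ddr,100000,300
X0Y0,with_ddr,90000,280
X0Y1,no_ddr,100000,300
X0Y1,with_ddr,100000,300
"""


def load_resources(text, branch):
    """Return {slot: {resource: amount}} for the branch the user picked."""
    # Drop comment lines first, since csv.DictReader would treat them as data.
    lines = [ln for ln in io.StringIO(text) if not ln.lstrip().startswith("#")]
    table = {}
    for row in csv.DictReader(lines):
        if row["branch"] == branch:
            table[row["slot"]] = {"lut": int(row["lut"]), "bram": int(row["bram"])}
    return table


resources = load_resources(RESOURCE_CSV, branch="with_ddr")
```

With this layout the number of rows grows with the number of branches, which is the scaling concern raised for the first option.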

About the Opt package, the components are still heavily coupled with the details of the proposed floorplanning algorithm. I'd like to start a new git branch to get them decoupled, as I've mentioned in the last few e-mails. Hopefully I will keep the interfaces consistent (one exception is the coordination of floorplanner and slots, as the organization of slots may vary a lot depending on the floorplanner implementation).

About the Flow package, the Floorplan (optional) section in def help seems confusing. It is a real problem for both of us to let users make partial assignments through a tidy interface without knowing the naming conventions of the floorplanning implementation.

This may be a problem if users want to connect vhls and autobridge directly into a single workflow. (Some manual coding is not too big a problem in practice at this stage, though.)

The naming convention (the CLOCKREGION stuff) is not obvious even after I have read the Device package. This can be a real problem, as users normally won't dive into the Opt package for the names (though the floorplanner itself and the constraint code are there).

Maybe either

- expose the naming somewhere more obvious, or

- let users bind to slot graphs and leave us to manage the name mappings. (This is not an option if the slot graph is generated dynamically during floorplanning, though. Otherwise several round trips between abstract coordinates and the slots of each iteration are needed.)
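A sketch of the second option, where users write constraints against abstract slot coordinates and the tool manages the name mapping. It assumes slots tile the device in a regular grid of clock regions; the helper name, the 4x4 geometry and the exact output string are illustrative assumptions, with only the `CLOCKREGION_X<i>Y<j>` region naming borrowed from Vivado.

```python
def slot_to_clock_regions(col, row, cols_per_slot, rows_per_slot):
    """Translate an abstract slot coordinate into a clock-region range string.

    Assumes each slot covers cols_per_slot x rows_per_slot clock regions and
    that slots are laid out in a regular grid starting at X0Y0.
    """
    x_lo = col * cols_per_slot
    y_lo = row * rows_per_slot
    x_hi = x_lo + cols_per_slot - 1
    y_hi = y_lo + rows_per_slot - 1
    return f"CLOCKREGION_X{x_lo}Y{y_lo}:CLOCKREGION_X{x_hi}Y{y_hi}"


# User-facing constraints mention abstract slot coordinates only.
user_constraints = {"kernel_a": (0, 0), "kernel_b": (1, 2)}
resolved = {
    name: slot_to_clock_regions(c, r, cols_per_slot=4, rows_per_slot=4)
    for name, (c, r) in user_constraints.items()
}
```

A fixed translation like this only works when the slot grid is known up front, which is exactly the caveat about dynamically generated slot graphs.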

By the way, discussing in a closed issue does not seem to be the right way to use GitHub. Maybe start a new issue ;-P

It is nice to be on a consistent track with you!

Best,
Jianwen

Answer 1 · 2021-03-16T06:45:01.000Z

Hi Licheng,

The user interface is not a high priority at present, so let's put it aside.

About the coupling, the dataflow graph contains vertical cuts and horizontal cuts, which may not fit other floorplanning algorithms. It would be better to first generate an algorithm-independent representation and then leave the floorplanner to add additional information (via new classes). <del>On the other hand, the pipeliner still consumes the cuts; it may be better to extend it to consume slot paths.</del>

For my model, there can be:

* Differently weighted connections through dies/slots
* NoCs introducing quite a different interconnection topology and distance metric among slots.

It comes as a surprise to me that you've run into a scalability problem. My plan was to test my model with AutoBridge as a framework. As my model consists of integer-valued parameters and binary-valued solutions, I am now considering using an SMT solver for my model (but still embedded in AutoBridge).

Besides, I think it is not ideal to keep splitting slots into equal parts when handling strongly heterogeneous situations. We may discuss this further.

Best,
Jianwen

Answer 2 · 2021-03-16T07:11:09.000Z

> About the coupling, the dataflow graph contains vertical cuts and horizontal cuts, which may not fit other floorplanning algorithms.

That is obsolete code; I will remove it. Right now I believe the dataflow graph is independent of the floorplanning algorithm.

> On the other hand, the pipeliner still consumes the cuts; it may be better to extend it to consume slot paths.

The distribution of pipeline registers should be handled in global routing?

> For my model, there can be:
>
> * Differently weighted connections through die/slots
> * NoCs introducing quite a different interconnection topology/distance-metrics among slots.

Do you have a motivating example for your proposal? For the NoC case, I think every segment of your NoC could be viewed as a vertex?

> It comes as a surprise to me that you've run into a scalability problem. My plan was to test my model with AutoBridge as a framework. As my model consists of integer-valued parameters and binary-valued solutions, I am now considering using an SMT solver for my model (but still embedded in AutoBridge).

Gurobi kind of conceals the scalability problem because it is too powerful, but many do not have access to the commercial tool.

> Besides, I think it is not ideal to keep splitting slots into equal parts when handling strongly heterogeneous situations.

Theoretically yes, but you may need actual examples to justify your motivation (if you are targeting a paper)

Answer 3 · 2021-03-16T08:15:55.000Z

To be frank, I am working on my undergraduate thesis. I really appreciate your advice on my proposal, as this is not part of AutoBridge (yet?).

> Gurobi kind of conceals the scalability problem because it is too powerful, but many do not have access to the commercial tool.

I am now considering using the z3 solver to handle my model (which only involves integers and binaries) in case gurobi is absent. However, there seems to be little existing work on using SMT for non-linear optimization. As I have not tested my model on real use cases, I am not sure what running time to expect in z3 (or even gurobi). I'd like to know how the size of the dataflow graph and the slots affect the running time with the current AutoBridge code base.

> > Besides, I think it is not ideal to keep splitting slots into equal parts when handling strongly heterogeneous situations.
>
> Theoretically yes, but you may need actual examples to justify your motivation (if you are targeting a paper)

For the time being I can only give a rough thought experiment that combines extremely unbalanced components. (If you feel this makes sense, I will look for real cases.)

- It seems that the present AutoBridge fits best when the components are about the same size and the exceptions are not much larger than that 'average' size, e.g. the processing elements and the IO boundaries in systolic arrays, as you have presented with AutoSA.
- Assume a large CPU connected to several small co-processors through FIFOs, which can still be viewed as one of the 'dataflow graphs' we are handling. Repeatedly splitting the grid may leave too few resources for the large core, or confine the small cores to too constrained an area, even at the first cut.
- The formulation of your iteratively-cut grid model requires that all the slots are cut. Continuing to cut the assigned slots can cause a resource shortage for the large core. In other words, even if we stick to a grid model, we do not have to make all the slots in the grid equal, but an imbalanced grid does not fit your present optimization model.
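The thought experiment can be put into numbers. The sketch below uses made-up resource units and a deliberately simplified check: after a given number of equal bisections, does each component still fit into a single slot? It is not how AutoBridge evaluates feasibility and it ignores packing several components into one slot.

```python
def fits_after_equal_cuts(total, cuts, demands):
    """Check whether every component fits in one slot after equal bisection.

    total: resource units on the device; cuts: number of bisection rounds;
    demands: resource units needed per component.
    """
    slot_capacity = total / (2 ** cuts)
    return {name: need <= slot_capacity for name, need in demands.items()}


# Made-up numbers: one large core plus four small co-processors.
demands = {"cpu": 60, "cop0": 5, "cop1": 5, "cop2": 5, "cop3": 5}
before = fits_after_equal_cuts(total=100, cuts=0, demands=demands)
after = fits_after_equal_cuts(total=100, cuts=1, demands=demands)
```

With these numbers the large core already fails at the first cut (60 units against a 50-unit slot), while an unequal 70/30 split would accommodate everything.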

The grid-based strategy handles the distance metric and routing in a Manhattan way, which is more sensitive to the endpoints than to the details of the path.

It works fast and well as long as the FPGA can be viewed as a discrete plane, which will be the case for the coming years.
One problem, without touching the topology, is that Manhattan-distance-based optimization cannot handle strongly varying weights on particular die crossings (i.e. anything beyond a vertical-versus-horizontal preference factor).
- I wonder whether this case deserves our concern. One such situation: a column-organized FPGA where fabric columns are separated by different column combinations of BRAMs, URAMs, etc.

My major motivation for introducing my model is to allow a flexible distance metric and topology among slots.

In the case of a hardened but coarse NoC, distant dies may be connected through NoC links while nearby dies can still be considered connected through pipelined fabric.
- Vertices can be regions surrounded by NoC access points split by die borders (I'm not sure about the terms :-P). Edges are NoC links or die-die paths.
- At the cost of enlarging the problem, routing can be handled simultaneously if we allow a slot to correspond to multiple mutually exclusive vertices, each for a different routing choice. This may grow the problem combinatorially though, which seems to be the most obvious obstacle if I want to make my model realistic.
- In this case, since the connection is as latency-insensitive as one through FIFOs, the weight of an NoC link can be similar to that of a pipelined path between adjacent dies, yielding a metric that cannot be obtained by measuring Manhattan distance on a grid.
- As my advisor has some interest in FPGAs with NoCs, this idea may be validated with a tool like VPR in the future, but this won't happen very soon.
I am also wondering what happens if the FPGA itself has an imbalanced grid-like geometry. My thoughts are still immature :-(
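To make the metric argument concrete, here is a sketch that compares the Manhattan distance between two slots with the shortest-path distance on a slot graph containing one NoC link. The four-die column, the unit weights and the position of the NoC link are all assumptions chosen for illustration.

```python
import heapq


def shortest_distances(edges, source):
    """Dijkstra over an undirected weighted slot graph given as (u, v, weight)."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Four dies stacked in one column; slots are named by (x, y) grid coordinates.
fabric = [((0, y), (0, y + 1), 1) for y in range(3)]  # die-to-die paths
noc = [((0, 0), (0, 3), 1)]                            # hypothetical NoC link

manhattan = abs(0 - 0) + abs(3 - 0)
graph_distance = shortest_distances(fabric + noc, (0, 0))[(0, 3)]
```

The bottom and top dies are 3 apart in Manhattan terms but only 1 apart on the graph, a relation that no uniform or axis-weighted grid distance reproduces.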

Thanks a lot.

Answer 4 · 2021-03-16T14:27:52.000Z

Hi Licheng,

It seems that my participation in this code base has reached an end. I went over the Opt package of the new code base, and I must apologize that I badly misunderstood the code before. The in-development code does provide a clean interface, as we have discussed.

I noticed (from your comments) that GlobalRouting.py assumes a grid with equal rectangular slots. For the current code base, I think this assumption can be loosened to the point that every slot only needs a unique 'UP', 'DOWN', 'LEFT' and 'RIGHT' neighbor (i.e. the slots are aligned), with no change to the code. The sizes of the slots do not matter, as we are routing in a Manhattan way.

I am also reconsidering the relationship among die-crossing-aware floorplanning, routing and floorplanning-aware pipelining. The routing part is not obvious in the AutoBridge paper. The complexity seems to be concealed by the unweighted Manhattan viewpoint.

I hope the new code base grows smoothly. Thanks for your time and patience so far!
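A sketch of why only neighbor uniqueness matters: the router below walks from one slot to another using nothing but each slot's 'UP', 'DOWN', 'LEFT' and 'RIGHT' links and its grid index. It is an independent illustration and not the code in GlobalRouting.py; the function name and data layout are assumptions.

```python
def manhattan_route(neighbors, position, src, dst):
    """Walk from src to dst using only each slot's directional neighbor links.

    neighbors: {slot: {direction: slot}}; position: {slot: (column, row)} index
    in the aligned grid. Slot sizes never appear, only adjacency.
    """
    path = [src]
    current = src
    dx, dy = position[dst]
    while current != dst:
        cx, cy = position[current]
        if cx != dx:
            step = "RIGHT" if dx > cx else "LEFT"  # fix the column first
        else:
            step = "UP" if dy > cy else "DOWN"     # then fix the row
        current = neighbors[current][step]
        path.append(current)
    return path


# 2x2 aligned grid; column 0 could be wider than column 1 without any change.
position = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
neighbors = {
    "A": {"RIGHT": "B", "UP": "C"},
    "B": {"LEFT": "A", "UP": "D"},
    "C": {"RIGHT": "D", "DOWN": "A"},
    "D": {"LEFT": "C", "DOWN": "B"},
}
route = manhattan_route(neighbors, position, "A", "D")
```

Since the walk only consults adjacency and grid indices, resizing a row or column leaves the route unchanged as long as the slots stay aligned.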

Best,
Jianwen

Answer 5 · 2021-03-17T05:03:28.000Z

You could develop your own floorplan algorithm for non-uniform slot sizes. The Slot, Dataflow Graph and Floorplan are all decoupled. Though you may want to first have some concrete cases rather than high-level intuitions to justify your motivation.

GlobalRouting is still under development. It should be similar to traditional grid routing.

For a hardened NoC like the one in Versal, using the hardened link may have a performance impact due to the limited width of the bus. Taking performance into consideration seems very complicated and is beyond my plan. But it is an interesting problem worth thinking about.