AxiMaster overloads cocotb scheduler when using large AXI-ID vectors
benbr8 opened this issue · 6 comments
AxiMaster schedules roughly 2**(id_length+1) coroutines upon its creation, which slows down simulation considerably.
At 17 bits of ID width:
INFO cocotb:simulator.py:289 # 412.00ns INFO cocotb.regression regression.py:574 in _log_sim_summary *************************************************************************************
INFO cocotb:simulator.py:289 # ** ERRORS : 0 **
INFO cocotb:simulator.py:289 # *************************************************************************************
INFO cocotb:simulator.py:289 # ** SIM TIME : 412.00 NS **
INFO cocotb:simulator.py:289 # ** REAL TIME : 36.39 S **
INFO cocotb:simulator.py:289 # ** SIM / REAL TIME : 11.32 NS/S **
INFO cocotb:simulator.py:289 # *************************************************************************************
INFO cocotb:simulator.py:289 #
INFO cocotb:simulator.py:289 # 412.00ns INFO cocotb.regression regression.py:259 in tear_down Shutting down...
INFO cocotb:simulator.py:289 # Simulation halt requested by foreign interface.
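To put the numbers above in perspective, a quick back-of-the-envelope check of the coroutine count (the factor of two is my assumption for illustration: one handler coroutine per possible ID for each of the read and write channels):

```python
# Rough coroutine count implied by the ~2**(id_length+1) figure above.
# Assumption (for illustration only): one handler coroutine per possible
# ID value, for reads and for writes.
id_width = 17
coroutines = 2 ** (id_width + 1)
print(coroutines)  # 262144
```

A quarter of a million coroutines created at init time would easily account for the startup and teardown cost seen in the log.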
- That's odd. I'm not sure why the scheduler would be doing anything for idle coroutines; I'll bring this up with the cocotb people. I could perhaps do some sort of lazy allocation of those resources, i.e. only spin up coroutines and allocate queues when IDs are active, and perhaps delete them when the IDs become inactive. Lazy allocation could make sense, but if a test sweeps over the whole tag space, you're right back where you started unless I also clean up inactive IDs, and that cleanup is extra overhead if the problem is really in the cocotb scheduler.
- How do you get that call graph? That would be super useful for tuning the interface extensions.
Oh, I know what I can do - make the mapping dynamic, with lazy allocation. That way, it will only spin up coroutines up to the number of unique active IDs at any one time, independent of the number of possible IDs. In that case, I won't clean up the queues and coroutines, just remove them from the mapping so they can be reused for a different ID.
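The dynamic mapping described above could be sketched roughly as follows (a minimal illustration, not the actual cocotbext-axi code; the class and method names here are hypothetical):

```python
from collections import deque

class LazyIdMap:
    """Allocate per-ID state (here, just a queue) only when an ID first
    becomes active, and recycle it for a different ID later instead of
    pre-allocating an entry for every possible ID value."""

    def __init__(self):
        self.active = {}   # ID -> queue, only for currently active IDs
        self.free = []     # recycled queues, detached from any ID

    def get(self, axi_id):
        # Reuse a recycled queue if one exists, else create a new one;
        # total allocations track peak concurrent IDs, not 2**id_width.
        if axi_id not in self.active:
            self.active[axi_id] = self.free.pop() if self.free else deque()
        return self.active[axi_id]

    def release(self, axi_id):
        # Detach the queue from this ID and keep it around for reuse.
        q = self.active.pop(axi_id)
        q.clear()
        self.free.append(q)
```

With this scheme, even a testbench that sweeps the entire 17-bit tag space only ever holds as many queues as there are simultaneously outstanding IDs.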
Oh, is this only a problem during init? After init, the simulation is fast?
Alright, try again with the git version and let me know if the performance is any different.
The most visible speed impact is at initialization and again at teardown, but I would also expect that having a large number of dormant coroutines doesn't make the scheduler any faster ;).
The fix you provided works out great, thank you for the fast action!
The call graph is created with https://github.com/jrfonseca/gprof2dot
I just released v0.1.12 with this change