Global problem::Shape disallows multiple runs with different workloads
gilbertmike opened this issue · 7 comments
The problem shape is currently a static global object. This is okay when there is only one workload. However, when multiple workloads need to be generated and evaluated, for example when automatically generating workloads from fused kernels, the old workload shape is still there.
I can think of two solutions:
- a workaround is to reinitialize
problem::Shape
inproblem::ParseWorkload
- as suggested in the comments within the code, make
problem::Shape
not a static global object.
The second approach is cleaner, but I need to read the code more to see how difficult that would be.
Agree that second approach is better. The solution is to pass the object through to anyone that needs it. It's simple in theory but may need some slogging through interfaces to create all the plumbing. We'll have to do it sooner or later, so this may be a good time.
I can take on this issue, if you want.
That would be great. Let me know after you finish your investigation and have thoughts on a plan of action.
There are a lot more problem::GetShape
call sites than I expected. I think currently, the best plan is to incrementally change the interface instead of doing it in one go. I think we can:
- Make shape a member of the
Workload
class. - For now, allow only one object of
Workload
type to be alive at any point in the program. This can easily be done with a static member counter. This will also allow us to report useful error when anotherWorkload
object is constructed before the destruction of the existing one. The current behavior is to just silently run with incorrect results. - This way, we can also properly reinitialize the shape on construction of a
Workload
. - Slowly rewrite interfaces to pass around reference to
Workload
where needed.
We still have to work out how to change the old interfaces, but we can solve the silent error and shape reinitialization part of the problem. Let me know what you think.
That incremental step makes sense to me. Please go ahead.
I'm continuing this issue for workload-propagation
branch issue reporting.
There are currently two issues:
- There are still a lot of global
GetShape
calls as default values for a lot of methods. These don't cause an immediate crash and actually behave properly as long as only the latest created workload is used. However, they should be removed in my opinion (a bunch of these default values are not used) or bound to the propagated workload instead of globalGetShape
. - Sometimes, a parsing issue occurs when giving
CompoundConfig
a YAML string. I traced the bug all the way to yaml-cpp inside a converter from string to integers. Strangely, the string has a proper value (e.g., a string "1" failed to convert to integer 1). I suspect a string encoding issue. Testing with console Python does not reproduce issue. Only occurs when using PyTimeloop within a Jupyter Notebook. - Instead of spending too much time on the previous issue, I would like to add methods to construct
CompoundConfig
so we can bind that directly to Python to eliminate the load-dump-load pattern PyTimeloop is using. I thought the YAML trick should just work but it's causing me a lot of tricky bugs. I also think this is a better solution fundamentally.