Minor Change in Runner to allow for parallel computing with tensorflow models and Pool
ML-IEE opened this issue · 2 comments
Is your feature request related to a problem? Please describe.
Parallel execution of the episode statistics has not been possible with TensorFlow models/agents, due to issues with multiprocessing.Pool.
Now I have found a solution that might fix this problem without much change to the Grid2Op code base. Currently, pooling in the Runner (runner.py) works as follows:
```python
with Pool(nb_process) as p:
    tmp = p.starmap(_aux_one_process_parrallel, lists)
```
Describe the solution you'd like
Now I would propose the following change:
```python
from multiprocessing import get_start_method, get_context

if get_start_method() == 'spawn':
    with get_context("spawn").Pool(nb_process) as p:
        tmp = p.starmap(_aux_one_process_parrallel, lists)
else:
    with Pool(nb_process) as p:
        tmp = p.starmap(_aux_one_process_parrallel, lists)
```
This will not change the normal execution of the multiprocessing.Pool approach. However, if you specify the start method in your own custom code as follows:
```python
def main():
    """
    This is your own custom code.
    """
    from multiprocessing import set_start_method
    set_start_method("spawn")
    # Now we call our own methods for scoring the agents
```
Then TensorFlow models can be executed in parallel. This speeds up evaluation drastically and makes things a lot easier.
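As an end-to-end sketch of the spawn-safe pattern described above (the worker `score_episode` and its arguments are made up for illustration; a real worker would build the environment and agent inside the spawned process):

```python
from multiprocessing import set_start_method, get_context


def score_episode(seed, max_steps):
    """Stand-in for one evaluation episode; a real worker would
    create the env/agent here, inside the freshly spawned process."""
    # Deterministic dummy "score" so the sketch stays self-contained.
    return seed * max_steps


if __name__ == "__main__":
    # Must be called before any pool is created, and only once.
    set_start_method("spawn", force=True)
    jobs = [(seed, 10) for seed in range(4)]
    with get_context("spawn").Pool(2) as p:
        scores = p.starmap(score_episode, jobs)
    print(scores)  # [0, 10, 20, 30]
```

Note that with "spawn" the worker function must be defined at module level (picklable), and the pool creation must sit behind an `if __name__ == "__main__":` guard, otherwise each child re-executes the pooling code on import.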
One note for other researchers:
We found that read_from_local_dir should be used as well, otherwise your worker processes sometimes run into memory problems:
```python
import grid2op

env_name = "l2rpn_case14_sandbox"  # or any other name
env = grid2op.make(env_name, ...)
env.generate_classes()
env = grid2op.make(env_name,
                   experimental_read_from_local_dir=True)
```
Hello,
Thanks for the suggestion. Do not hesitate to write a PR when you have time :-)
I will (hopefully) be able to review and add it in the next grid2op release.
Done and live in release 1.10.3