Non-distributed training
GP413413 opened this issue · 4 comments
Hello! I found that the default setting for this project is distributed training using the Ray framework. I would like to know whether GRF_MARL supports non-distributed training (on a single server only) and where to modify the settings. Thank you for your help!
Hi, you can modify the config file for customized rollout and training. For example, `${rollout_manager.num_workers}` sets the number of CPU cores used for environment rollout (one env instance per CPU by default), and `${training_manager.num_trainers}` equals the number of GPUs used for training.
Also feel free to look at a ray-free demo: `light_malib/scripts/run_train.py`.
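For reference, a minimal sketch of how those two keys might look in the YAML config; the actual layout of `ippo.yaml` may differ, and the values below are only illustrative for a minimal single-server run:

```yaml
# Hypothetical excerpt -- key paths follow the comment above;
# the real ippo.yaml may nest or name sections differently.
rollout_manager:
  num_workers: 1     # CPU cores used for env rollout (one env instance per CPU by default)
training_manager:
  num_trainers: 1    # should equal the number of GPUs used for training
```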
Thank you for your advice! Do you mean that training with `light_malib/scripts/run_train.py` is an alternative to `light_malib/main_pbt.py` that does not use the Ray framework? I have tried to run an experiment with:
python3 light_malib/scripts/run_train.py --config expr_configs/cooperative_MARL_benchmark/full_game/11_vs_11_hard/ippo.yaml
But I got the following results immediately:
Where should I modify the config file?
Yes, this is a way to debug without any distributed execution.
The config file is located at `expr_configs/cooperative_MARL_benchmark/full_game/11_vs_11_hard/ippo.yaml`. You can try modifying `${rollout_manager.num_workers}` and `${training_manager.num_trainers}` under distributed execution mode (e.g. running `main_pbt.py`).
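To illustrate scaling those same keys for distributed execution on a single server, here is a hedged sketch; the numbers are placeholders for a machine with, say, 32 CPU cores and 2 GPUs, not recommendations:

```yaml
# Hypothetical values for one server with 32 CPU cores and 2 GPUs.
rollout_manager:
  num_workers: 32    # one rollout env per CPU core
training_manager:
  num_trainers: 2    # one trainer per GPU
```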
Ok, thank you for your advice. I will give it a try.