Non-distributed training
GP413413 opened this issue · 4 comments
Hello! I found that the default setting for this project is distributed training using the Ray framework. I would like to know whether GRF_MARL supports non-distributed training (on a single server only) and where to modify the settings. Thank you for your help!
Hi, you can modify the config file for customized rollout and training. For example, `${rollout_manager.num_workers}` sets the number of CPU cores used for environment rollout (one env instance per CPU by default), and `${training_manager.num_trainers}` equals the number of GPUs used for training.
Also feel free to look at a ray-free demo: `light_malib/scripts/run_train.py`.
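For reference, a minimal sketch of how those two keys might look in the YAML config; the actual layout of `ippo.yaml` may differ, and the values below are only illustrative for a minimal single-server run:

```yaml
# Hypothetical excerpt -- key paths follow the comment above;
# the real ippo.yaml may nest or name sections differently.
rollout_manager:
  num_workers: 1     # CPU cores used for env rollout (one env instance per CPU by default)
training_manager:
  num_trainers: 1    # should equal the number of GPUs used for training
```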
Thank you for your advice! Do you mean that training with `light_malib/scripts/run_train.py` is an alternative to `light_malib/main_pbt.py` that does not use the Ray framework? I have tried to run an experiment with:
python3 light_malib/scripts/run_train.py --config expr_configs/cooperative_MARL_benchmark/full_game/11_vs_11_hard/ippo.yaml
But I got the following results immediately:
Where should I modify the config file?
Yes, this is a way to debug without any distributed execution.
The config file is located at `expr_configs/cooperative_MARL_benchmark/full_game/11_vs_11_hard/ippo.yaml`. You can try modifying `${rollout_manager.num_workers}` and `${training_manager.num_trainers}` under distributed execution mode (e.g. running `main_pbt.py`).
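To illustrate scaling those same keys for distributed execution on a single server, here is a hedged sketch; the numbers are placeholders for a machine with, say, 32 CPU cores and 2 GPUs, not recommendations:

```yaml
# Hypothetical values for one server with 32 CPU cores and 2 GPUs.
rollout_manager:
  num_workers: 32    # one rollout env per CPU core
training_manager:
  num_trainers: 2    # one trainer per GPU
```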
Ok, thank you for your advice. I will give it a try.