AllegroKukaLSTM PBT training effect difference

Question

Opened this issue 10 months ago · 0 comments

DannyChen1994 commented 10 months ago

I followed the documentation. Starting one training session in a PBT experiment looks something like this:
$ python -m isaacgymenvs.train seed=-1 train.params.config.max_frames=10000000000 headless=True pbt=pbt_default pbt.workspace=workspace_allegro_kuka pbt.interval_steps=20000000 pbt.start_after=100000000 pbt.initial_delay=200000000 pbt.replace_fraction_worst=0.3 pbt/mutation=allegro_kuka_mutation task=AllegroKukaLSTM task/env=reorientation pbt.num_policies=8 pbt.policy_idx=0

From the documentation I understand that pbt.policy_idx=0 starts agent #0. For the full PBT experiment we have to start agents 0 .. pbt.num_policies-1. This can be done manually by executing 8 command lines with pbt.policy_idx=[0 .. 7], taking care of GPU placement in a multi-GPU system by setting CUDA_VISIBLE_DEVICES for each agent.
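
To make the manual launch concrete, here is a minimal sketch of how the 8 command lines could be generated. It only builds and prints the commands and does not start any training. The helper name build_launch_plan and the round-robin GPU assignment are my own assumptions for illustration, not something taken from the isaacgymenvs documentation; the command-line arguments are copied from the command above.

```python
import shlex

# Arguments shared by every agent, copied from the command shown above.
BASE_ARGS = [
    "python", "-m", "isaacgymenvs.train",
    "seed=-1",
    "train.params.config.max_frames=10000000000",
    "headless=True",
    "pbt=pbt_default",
    "pbt.workspace=workspace_allegro_kuka",
    "pbt.interval_steps=20000000",
    "pbt.start_after=100000000",
    "pbt.initial_delay=200000000",
    "pbt.replace_fraction_worst=0.3",
    "pbt/mutation=allegro_kuka_mutation",
    "task=AllegroKukaLSTM",
    "task/env=reorientation",
]


def build_launch_plan(num_policies=8, num_gpus=1):
    """Hypothetical helper: return one (env, command) pair per agent.

    GPUs are assigned round-robin (agent i -> GPU i % num_gpus); this
    placement policy is an assumption made for the sketch.
    """
    plan = []
    for idx in range(num_policies):
        env = {"CUDA_VISIBLE_DEVICES": str(idx % num_gpus)}
        cmd = BASE_ARGS + [
            f"pbt.num_policies={num_policies}",
            f"pbt.policy_idx={idx}",
        ]
        plan.append((env, cmd))
    return plan


# Print each agent's command line in a copy-pasteable form.
for env, cmd in build_launch_plan(num_policies=8, num_gpus=1):
    prefix = " ".join(f"{k}={v}" for k, v in env.items())
    print(prefix, shlex.join(cmd))
```

Each printed line can be pasted into its own terminal; on a single-GPU machine every agent gets CUDA_VISIBLE_DEVICES=0.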

My training platform is a single 3090 with 24 GB of video memory, which can run two training processes at the same time, for example pbt.policy_idx=0 and pbt.policy_idx=1 in two command-line windows. For the PBT algorithm, will the final result be different if I train the agents one at a time in 8 separate runs, pbt.policy_idx=[0 .. 7], compared with training them two at a time in 4 runs, pbt.policy_idx=[0 1], [2 3], [4 5], [6 7]? Which approach gives the better result?