Denys88/rl_games

rl_games won't load trained model when the command line argument `--checkpoint=` is given

Closed this issue · 3 comments

Problem

I am using rl-games with IsaacGym to train my RL agent. However, when I tried to use the --checkpoint= command line argument to resume training, I found that training always restarts from the very beginning. I use rl-games as follows:

runner = Runner(algo_observer)
runner.load(cfg_train)
runner.reset()
runner.run(args)

and resume my training with the command:

$ python ./rlg_train.py --task=[my_task_name] --checkpoint=[absolute path of trained model]

My Solution

I took a look at the source code and found that the method Runner.run_train(self) contains a duplicated load_config() call:

else:
    self.reset()
    self.load_config(self.default_config)  # <-- the duplicated call

This line causes the --checkpoint command line argument to be overridden by the settings in the config file.
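
The override can be reproduced in isolation. The following is a minimal sketch, not the rl_games source: `ToyRunner`, its `apply_args` method, the config keys, and the path `/tmp/model.pth` are hypothetical stand-ins that only mimic the order of calls described above.

```python
class ToyRunner:
    """Hypothetical stand-in for Runner; only the call order matters here."""

    def __init__(self):
        self.load_check_point = False
        self.load_path = None
        self.default_config = None

    def load_config(self, config):
        # Every call resets the checkpoint settings to the config file values.
        self.load_check_point = config.get('load_checkpoint', False)
        self.load_path = config.get('load_path')

    def load(self, config):
        self.default_config = config
        self.load_config(config)

    def apply_args(self, args):
        # The command line value is applied after the initial load.
        if args.get('checkpoint'):
            self.load_check_point = True
            self.load_path = args['checkpoint']

    def run_train(self):
        # The second load_config() call discards what apply_args() set.
        self.load_config(self.default_config)
        return self.load_check_point, self.load_path


runner = ToyRunner()
runner.load({'load_checkpoint': False, 'load_path': None})
runner.apply_args({'checkpoint': '/tmp/model.pth'})  # hypothetical path
result = runner.run_train()
print(result)  # (False, None): the command line checkpoint was lost
```

Dropping the second `load_config()` call in `run_train()` of this toy class is enough for the command line value to survive.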

I think this call should be deleted, and the following check should be added to the Runner.run(self, args) function:

if 'checkpoint' in args and args['checkpoint'] is not None:
    if len(args['checkpoint']) > 0:
        self.load_check_point = True  # enable loading from the given path
        self.load_path = args['checkpoint']

so that I can use the command line argument to resume training without modifying my config file.
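
The check proposed above can also be written as a small standalone helper, which makes the three cases (missing key, `None`, empty string) easy to test. This is only a sketch: `checkpoint_from_args` is a hypothetical name, not a function in rl_games, and `args` is assumed to be a plain dict as in the snippet above.

```python
def checkpoint_from_args(args):
    """Return (load_check_point, load_path) for a dict of parsed arguments.

    Hypothetical helper; it mirrors the nested check proposed in the issue.
    """
    path = args.get('checkpoint')
    if path:  # False for a missing key, None, or an empty string
        return True, path
    return False, None


# '/tmp/model.pth' is a placeholder path used only for illustration.
print(checkpoint_from_args({'checkpoint': '/tmp/model.pth'}))  # (True, '/tmp/model.pth')
print(checkpoint_from_args({'checkpoint': ''}))                # (False, None)
print(checkpoint_from_args({}))                                # (False, None)
```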

Could you please take a look and check if I've gotten it right? Thanks a lot!

Hi @chaojie-fu.
What do you think if I just remove this option from the config and keep only the command line one?
It looks like updating the config is not convenient anyway.

Removing this option from the config file should be fine. For me, the command line option is more convenient for testing.

#117 fixed here. I removed it from the YAML file parser, so only --checkpoint works now.