Error when loading agent weights

Question

Closed this issue 2 years ago · 5 comments

Hey,

first of all, thank you for the great work! I came across your repo through IsaacGymEnvs and have been training some Trifinger agents.
However, when trying to load the trained weights I'm getting the following error:

RuntimeError: Error(s) in loading state_dict for Network: Unexpected key(s) in state_dict: "value_mean_std.running_mean", "value_mean_std.running_var", "value_mean_std.count"

I'm running from basically the same repository and did not change any parameters in the config.
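An "Unexpected key(s)" error means the checkpoint contains entries that the network being loaded does not define. A minimal sketch for diagnosing this is to diff the two key sets. The helper name `diff_state_dict_keys` and the `actor.weight` key are hypothetical; only the three `value_mean_std.*` keys come from the error message above.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Return (unexpected, missing) key lists, sorted for stable output."""
    checkpoint_keys = set(checkpoint_keys)
    model_keys = set(model_keys)
    # Keys stored in the checkpoint that the model does not define.
    unexpected = sorted(checkpoint_keys - model_keys)
    # Keys the model defines that the checkpoint does not provide.
    missing = sorted(model_keys - checkpoint_keys)
    return unexpected, missing


# "actor.weight" is a made-up placeholder; the value_mean_std.* keys are
# the ones reported in the error message.
checkpoint_keys = [
    "actor.weight",
    "value_mean_std.running_mean",
    "value_mean_std.running_var",
    "value_mean_std.count",
]
model_keys = ["actor.weight"]

unexpected, missing = diff_state_dict_keys(checkpoint_keys, model_keys)
print("unexpected:", unexpected)
print("missing:", missing)
```

With a real PyTorch model, the model side would typically come from `model.state_dict().keys()`. How the rl-games checkpoint file is laid out is not shown in this thread, so the checkpoint side is left as a plain list here.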

Answer 1 · 2022-02-19T06:04:31.000Z

Hi @BY571,

Thank you for using rl-games and IsaacGymEnvs. What version of rl-games did you use? Also + @ArthurAllshire

Answer 2 · 2022-02-20T03:28:57.000Z

Can you try installing rl-games 1.1.3?

Answer 3 · 2022-02-21T09:02:56.000Z

Thanks for the help! Indeed, a different rl-games version had been installed on my remote machine. Fixing the version solved the problem.
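Since the mismatch came from a different version being installed on another machine, a quick check before loading weights might help. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the installed distribution is named `rl-games`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_version(package, expected):
    """Describe whether the installed version matches the expected one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found != expected:
        return f"{package} {found} is installed, expected {expected}"
    return f"{package} {found} matches"


# The distribution name "rl-games" is an assumption; 1.1.3 is the version
# suggested earlier in this thread.
print(check_version("rl-games", "1.1.3"))
```

Running this on both the training machine and the remote machine would show whether they agree.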

Answer 4 · 2022-02-22T11:53:53.000Z

The versioning also seems to cause some trouble when testing. When I run with 1.3.1, the Trifinger robot does not move at all. However, when I load the same weights with version 1.1.3, the robot works and moves around, lifting the cube. I haven't yet found out what causes this.

Answer 5 · 2022-02-22T23:53:04.000Z

@BY571 I've found the issue.
IsaacGym never passed the checkpoint to the command line, and I removed this option from the config because it added some mess.
The quickest solution is to fix train.py in isaacgym:

    runner.run({
        'train': not cfg.test,
        'play': cfg.test,
        'checkpoint': cfg.checkpoint
    })
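The fix above forwards `cfg.checkpoint` to `runner.run`. As a rough, standalone illustration of how that argument dictionary is assembled, the sketch below builds it from a stand-in config object. The helper `build_run_args`, the `SimpleNamespace` config, and the checkpoint path are hypothetical and not part of IsaacGymEnvs or rl-games.

```python
from types import SimpleNamespace


def build_run_args(cfg):
    """Build the dict handed to runner.run from a config object."""
    return {
        "train": not cfg.test,
        "play": cfg.test,
        # Fall back to None if the config has no checkpoint field at all
        # (an assumption made for this sketch).
        "checkpoint": getattr(cfg, "checkpoint", None),
    }


# Stand-in for the real config; the path is a made-up example.
cfg = SimpleNamespace(test=True, checkpoint="runs/example/nn/example.pth")
run_args = build_run_args(cfg)
print(run_args)
```

In test mode, `play` is true and `train` is false, and the checkpoint path is passed along.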

Also, you cannot load weights from 1.1.3 into 1.3.1 right now. If you need this and cannot retrain the robot, I can make a simple script for you.