tristandeleu/pytorch-maml-rl

If I want to use the meta-parameters to adapt to a new task, what should I do?

Opened this issue · 2 comments

I wrote a new environment (navigation on a deterministic map):
(1) I run "python train.py --config xxxx" and get config.json and policy.th.
(2) I run "python test.py --config xxxx" and get results.npz.
But the rewards in results.npz are still very low.
What should I do to use policy.th to adapt quickly to a new task?
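
(For context, the "fast adaptation" MAML performs at test time is just a few inner-loop policy-gradient steps starting from the meta-parameters. Below is a minimal, self-contained sketch of that idea, not the repo's actual API: the policy architecture, the rollout function, and the returns are toy placeholders, and the real policy.th must be loaded into the same architecture train.py used.)

import torch
import torch.nn as nn

# Hypothetical stand-in policy; replace with the architecture from config.json.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
# policy.load_state_dict(torch.load('policy.th'))  # uncomment with the real model

def sample_batch(policy, n=20, horizon=10):
    """Toy rollout generator standing in for real environment interaction."""
    obs = torch.randn(n * horizon, 4)
    dist = torch.distributions.Categorical(logits=policy(obs))
    actions = dist.sample()
    returns = torch.randn(n * horizon)  # placeholder returns from the new task
    return obs, actions, returns

inner_lr, num_steps = 0.1, 1  # typical MAML inner-loop settings
for _ in range(num_steps):
    obs, actions, returns = sample_batch(policy)
    log_probs = torch.distributions.Categorical(logits=policy(obs)).log_prob(actions)
    loss = -(log_probs * returns).mean()  # vanilla REINFORCE surrogate
    grads = torch.autograd.grad(loss, policy.parameters())
    with torch.no_grad():
        for p, g in zip(policy.parameters(), grads):
            p -= inner_lr * g  # one gradient step = the "fast adaptation"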

You should pass --policy policy.th to test.py to use your trained policy.
It's surprising that you didn't get an error when running test.py without --policy, since it is a required argument.
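
For example, with the files produced by train.py in step (1) (paths are illustrative):

python test.py --config config.json --policy policy.th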

I get it. I ran test.py with policy.th, but the valid_return rewards are equal to or even lower than train_return.
Maybe our environment is not suitable. Thanks.
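
(For readers hitting the same symptom: the train returns are the meta-policy's returns before the inner-loop update, and the valid returns are the returns after adapting to each sampled task, so adaptation is only helping when the latter is clearly higher. A minimal sketch for inspecting the output, assuming results.npz stores arrays under names like those in the comment above; the exact keys may differ in your version of the repo.)

import numpy as np

results = np.load('results.npz')
print(results.files)  # list the arrays actually stored in the file
train = results['train_returns']  # returns BEFORE the inner-loop update
valid = results['valid_returns']  # returns AFTER adapting to each task
print('pre-adaptation :', train.mean())
print('post-adaptation:', valid.mean())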