If I want to use the meta-parameters to adapt to a new task, what should I do?
GeorgeDUT commented
I wrote a new environment (navigation on a deterministic map):
(1) I run `python train.py --config xxxx` and get config.json and policy.th.
(2) I run `python test.py --config xxxx` and get results.npz.
But the rewards in results.npz are still very low.
What should I do to use policy.th to adapt quickly to a new task?
tristandeleu commented
You should use `--policy policy.th` in test.py to use your trained policy. It's surprising that you didn't get an error when running test.py without `--policy`, since it is a required argument.
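Putting the two steps together, the full sequence would look something like this. This is only a sketch: the config path is a placeholder (kept as `xxxx` from the messages above), and the only flags assumed are the ones mentioned in this thread (`--config` for train.py, `--config` and `--policy` for test.py).

```shell
# Step 1: meta-train; this produces config.json and policy.th
python train.py --config xxxx

# Step 2: meta-test with the trained policy; this produces results.npz.
# --policy must point at the weights saved by train.py, otherwise
# test.py has no trained parameters to adapt from.
python test.py --config xxxx --policy policy.th
```

With `--policy policy.th` supplied, the `valid_return` entries in results.npz reflect performance after gradient-based adaptation from the meta-trained parameters, which is what the comparison against `train_return` is meant to measure.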
GeorgeDUT commented
I get it. I ran test.py with policy.th, but the valid_return rewards are equal to or even lower than train_return.
Maybe our environment is not suitable. Thanks.