Khrylx/PyTorch-RL

A question about PPO implementation


Hi Dr. Yuan,

I successfully applied D3QN to a robotics navigation task that uses image observations. However, when I use your PPO implementation on the same task, it does not seem to work with hyper-parameters similar to those I used for D3QN (learning rate, gamma, network structure, etc.).
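For concreteness, here is a minimal sketch of the kind of configuration I am comparing. The values below are illustrative placeholders rather than my exact settings: the first group contains the knobs I carried over from D3QN, and the second group contains PPO-specific knobs that I have not really tuned yet.

```python
# Placeholder values for illustration only, not my actual configuration.
shared_hparams = {
    "learning_rate": 3e-4,        # same order of magnitude as my D3QN run
    "gamma": 0.99,                # discount factor reused from D3QN
    "hidden_sizes": (256, 256),   # same network width as the D3QN backbone
}

ppo_only_hparams = {
    "clip_epsilon": 0.2,          # PPO surrogate clipping range
    "gae_lambda": 0.95,           # GAE advantage smoothing
    "optim_epochs": 10,           # gradient epochs per collected batch
    "minibatch_size": 64,         # minibatch size within each epoch
}

print({**shared_hparams, **ppo_only_hparams})
```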

Your implementation is great and has shown impressive performance on several other tasks, and for this navigation task the environment settings are identical to the ones I used with D3QN. Do you have any suggestions as to why PPO fails on this particular task? Which hyper-parameters and other factors should I pay attention to?

Sorry for the trouble, and thank you for your time!