jsikyoon/dreamer-torch

Help with implementing the latest dreamerv2

sai-prasanna opened this issue · 2 comments

I have implemented dreamerv2's current tensorflow code here. Your code helped a lot for the parts of tf that can't be translated one-to-one. I tried to keep the torch code as faithful to the tensorflow implementation as possible. The model starts training, but after 20k steps the return gradually drops instead of going up. I tried the cheetah and cartpole_swingup environments for up to 100k steps on a few seeds. I also compared all the curves with the tensorflow results, and most of them look similar. The world model, actor, and critic losses go down, and the grad norms are not crazy. I will try to attach some logs soon.

Hi, I'm new to the implementation of dreamerv2, and I'm thinking about using your code for further research. Does the problem of the agent's return decreasing after 20k steps still exist?

I fixed it. It was a trivial but hard-to-detect bug: torch.Tensor didn't preserve the dtype of the numpy array, so the inputs were not normalized while stepping through the policy. It should work now. Please test with a few environments; I checked with cartpole.
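
For anyone hitting the same thing, here is a minimal sketch of how this kind of dtype bug can arise. It is not the actual code from the repository: the `preprocess` function and its uint8 check are hypothetical, and they assume that normalization is gated on the input dtype. The legacy `torch.Tensor(...)` constructor converts a numpy array to the default float dtype, while `torch.as_tensor(...)` keeps the array's dtype.

```python
import numpy as np
import torch


def preprocess(obs):
    """Hypothetical normalization: scale uint8 images to [-0.5, 0.5].

    Inputs that are not uint8 are returned unchanged.
    """
    if obs.dtype == torch.uint8:
        return obs.float() / 255.0 - 0.5
    return obs


# A fake 2x2 image observation, as an environment might return it.
frame = np.full((2, 2), 255, dtype=np.uint8)

# Legacy constructor: the result has the default dtype (float32), not uint8.
legacy = torch.Tensor(frame)
# as_tensor keeps the numpy dtype (uint8).
kept = torch.as_tensor(frame)

out_legacy = preprocess(legacy)  # dtype check fails, values stay at 255.0
out_kept = preprocess(kept)      # dtype check passes, values become 0.5

print(legacy.dtype, out_legacy.max().item())
print(kept.dtype, out_kept.max().item())
```

In this sketch the unnormalized path raises no error, which is why such a bug is easy to miss: the policy simply receives inputs on the wrong scale.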