Loss is NaN from the start with the default config
Junfeng-Huang opened this issue · 2 comments
Junfeng-Huang commented
Hello, when I run CartPole with the default configuration, the loss is NaN from the very beginning. It looks like a gradient problem somewhere in the code. Is this a complete copy of the code, or are there bugs that haven't been fixed yet?
DHDev0 commented
So I rebuilt the Docker image and checked; it looks like some commits didn't sync. The NaN comes from the loss function, which is missing the transform that clamps the vanishingly small probability of an action occurring. There were supposed to be only two loss functions, general and game, since they only use the Kullback–Leibler divergence (no cross entropy).
I will push a commit with my version.
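For readers hitting the same NaN, here is a minimal sketch of the kind of clamp described above. It is not the repository's code: the function name `kl_divergence` and the floor `EPS = 1e-8` are assumptions for illustration, and it uses plain Python rather than the project's tensor library. The idea is to keep the predicted probability away from zero before taking its logarithm, so the loss stays finite.

```python
import math

# Hypothetical clamp floor; the thread does not state the value actually used.
EPS = 1e-8


def kl_divergence(target, predicted, eps=EPS):
    """KL(target || predicted) for two discrete distributions.

    Predicted probabilities are clamped to at least `eps` so that
    log(predicted) never receives zero.
    """
    total = 0.0
    for p, q in zip(target, predicted):
        if p <= 0.0:
            # A zero-probability target term contributes nothing (0 * log 0 := 0).
            continue
        q = max(q, eps)  # the clamp: avoids log(0) on the predicted side
        total += p * (math.log(p) - math.log(q))
    return total


# Identical distributions give zero divergence.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])

# A predicted probability of exactly 0 still yields a finite loss.
degenerate = kl_divergence([1.0, 0.0], [0.0, 1.0])
```

Without the clamp, the second call would take the logarithm of zero, which is the sort of operation that can turn a loss into NaN or infinity from the first step.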
Junfeng-Huang commented
Thanks for your reply.