DHDev0/Stochastic-muzero

Loss is NaN from the beginning with the default config

Junfeng-Huang opened this issue · 2 comments

Hello, with the default configuration, running CartPole gives a NaN loss from the very beginning. It looks like a gradient problem somewhere in the code. Is this the complete code, or are there bugs that haven't been fixed yet?

So I rebuilt the Docker image and checked, and it looks like some commits didn't sync. The NaN comes from the loss function, which is missing the transform that clamps vanishingly small action probabilities. There were supposed to be only two loss functions, general and game, since they only use the Kullback–Leibler divergence (no cross-entropy).

I will push a commit with my version.
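
For readers hitting the same NaN, here is a minimal sketch of the kind of clamping described above. It is not the code from the repository: the names `EPS`, `clamp_probs`, and `kl_divergence` and the floor value `1e-8` are assumptions made for illustration, and it uses plain Python rather than the project's tensor library. The idea is to floor every probability at a small positive value before taking logarithms, so `log(0)` never appears in the Kullback–Leibler divergence.

```python
import math

# Hypothetical floor for probabilities; the value used in the repository may differ.
EPS = 1e-8


def clamp_probs(probs, eps=EPS):
    """Floor each probability at eps, cap it at 1, then renormalize to sum to 1."""
    clamped = [min(max(p, eps), 1.0) for p in probs]
    total = sum(clamped)
    return [p / total for p in clamped]


def kl_divergence(target, predicted, eps=EPS):
    """KL(target || predicted), with both distributions clamped first.

    Without clamping, a zero entry in either distribution leads to log(0),
    which is undefined and typically shows up as inf or NaN in a tensor library.
    """
    t = clamp_probs(target, eps)
    q = clamp_probs(predicted, eps)
    return sum(ti * (math.log(ti) - math.log(qi)) for ti, qi in zip(t, q))


# Worst case: the target puts all mass on an action the prediction rules out.
target = [1.0, 0.0]
predicted = [0.0, 1.0]
loss = kl_divergence(target, predicted)
print(math.isfinite(loss), loss)
```

With the clamp in place the loss is large but finite, so gradients stay defined. A smaller `eps` penalizes confident mistakes more heavily, and a larger one softens them.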

Thanks for your reply.