itsMyrto/CarRacing-v2-gymnasium

Mismatch in ratios calculation


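In the PPO update, the probability ratios are computed as exp(logits - log probs), which subtracts the stored log-probabilities from the actor's raw logits. This does not prevent the agent from learning, but the two terms should be the same kind of quantity. Either keep logits on both sides, or add a softmax layer (in practice a log-softmax) to the actor network so that both terms are log-probabilities.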
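Here is a minimal sketch of the difference. It assumes a discrete action space with a categorical policy over raw actor logits, and it is not taken from the repository's code. The helper names (`log_softmax`, `ppo_ratio`, `mismatched_ratio`) are hypothetical, and the sketch uses plain Python so it runs without a deep learning framework. In PyTorch, `torch.distributions.Categorical(logits=...).log_prob(action)` is a common way to get the new log-probability.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def ppo_ratio(new_logits, old_log_prob, action):
    """Ratio pi_new(a|s) / pi_old(a|s), with both sides as log-probabilities.

    new_logits: raw actor outputs for the current policy (hypothetical input).
    old_log_prob: log pi_old(a|s) stored when the rollout was collected.
    """
    new_log_prob = log_softmax(new_logits)[action]
    return math.exp(new_log_prob - old_log_prob)


def mismatched_ratio(new_logits, old_log_prob, action):
    """The pattern described in the issue: a raw logit minus a log-probability."""
    return math.exp(new_logits[action] - old_log_prob)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    old = log_softmax(logits)[0]
    # With an unchanged policy, the true ratio is exactly 1.
    print(ppo_ratio(logits, old, 0))
    # The mismatched version is inflated by exp(logsumexp(logits)).
    print(mismatched_ratio(logits, old, 0))
```

When the policy has not changed, the correct ratio is 1. The mismatched version is off by a factor of exp(logsumexp(logits)), and that factor varies from state to state. Because PPO clips the ratio around 1, this offset can distort the clipping even when learning still happens.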