
PPO Does Not Converge on Pendulum-v0

I have been trying to train an agent for Pendulum-v0 with PPO, but I have not been able to get it to converge (i.e., the pendulum does not stay upright). The parameters I used were:

python main.py \
    --env-name "Pendulum-v0" \
    --algo ppo \
    --use-gae \
    --lr 4e-4 \
    --clip-param 0.2 \
    --value-loss-coef 0.5 \
    --num-steps 128 \
    --num-mini-batch 32 \
    --log-interval 1 \
    --use-linear-lr-decay \
    --entropy-coef 0
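
For context, here is a rough sketch of how many samples each PPO mini-batch would contain under these flags. It assumes the rollout of `num_steps * num_processes` transitions is split evenly into `num_mini_batch` chunks. The helper `minibatch_size` is hypothetical, and the process counts are only illustrative, since `--num-processes` is not set in the command above.

```python
def minibatch_size(num_steps: int, num_processes: int, num_mini_batch: int) -> int:
    """Samples per mini-batch, assuming the rollout is split evenly (an assumption)."""
    batch_size = num_steps * num_processes  # transitions collected per update
    if batch_size < num_mini_batch:
        raise ValueError("rollout is smaller than the number of mini-batches")
    return batch_size // num_mini_batch


if __name__ == "__main__":
    # num_steps=128 and num_mini_batch=32 come from the command above;
    # the process counts are illustrative guesses, not known values.
    for num_processes in (1, 4, 16):
        size = minibatch_size(128, num_processes, 32)
        print(f"num_processes={num_processes}: {size} samples per mini-batch")
```

With few parallel environments, each mini-batch holds only a handful of samples, which may make the gradient estimates noisy.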

I'm not sure whether I made a mistake or chose unsuitable hyperparameters. Could anyone help? Thanks!