- Implement DDPG (Deep Deterministic Policy Gradient); a minimal sketch of the update step follows the results table.
Game | Epochs | Training Time | Model Parameters |
---|---|---|---|
MountainCarContinuous-v0 | 1000 | 30 min | 299,032 (total) |
Pendulum-v0 | 1000 | 30 min | 299,536 (total) |
3DBall | will be updated | will be updated | will be updated |
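
As a reference for how these results were produced, here is a minimal sketch of a DDPG update step. It assumes a PyTorch-style actor/critic setup with soft target updates; the network sizes, tanh-bounded actions, Adam optimizers, and the actor learning rate are illustrative assumptions (only the 0.0001 critic learning rate comes from the notes below), so the actual models in this repository may differ.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and hyper-parameters; the real models may differ.
STATE_DIM, ACTION_DIM = 3, 1   # e.g. Pendulum-v0: 3-dim state, 1-dim action
GAMMA, TAU = 0.99, 0.005

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor, target_actor = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic, target_critic = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())

# Critic learning rate 0.0001, per the note below that lowering it from 0.001
# stabilized training; the actor learning rate here is an assumption.
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

def ddpg_update(state, action, reward, next_state, done):
    """One DDPG step on a sampled batch (all arguments are 2-D float tensors)."""
    # Critic: regress Q(s, a) toward r + gamma * (1 - done) * Q'(s', mu'(s')).
    with torch.no_grad():
        next_action = torch.tanh(target_actor(next_state))
        target_q = reward + GAMMA * (1 - done) * target_critic(
            torch.cat([next_state, next_action], dim=-1))
    q = critic(torch.cat([state, action], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: maximize Q(s, mu(s)) by minimizing its negative.
    actor_loss = -critic(torch.cat([state, torch.tanh(actor(state))], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft (Polyak) update of both target networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - TAU).add_(TAU * p.data)
```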
- Solve the problem where, once training runs past 200 epochs, the action converges in the wrong direction.
- More games still need to be tested.
- Parser
- Fixed a save error and notation.
- Added an argparse-based command-line parser.
- Changed the sampling method in replaybuffer.py (a minimal sketch follows this list).
- Added new test results.
- Pendulum-v0 is now being tested.
- Once training runs past 200 epochs, all models (both train and test) diverge.
- Adjusting the batch size, learning rate, activation function, model size, and noise size did not resolve this.
- The model did not converge at all.
- Changed the critic's learning rate from 0.001 to 0.0001 (after trying several values).
- This shows that the model can be trained well by tuning the learning rate; the idea of handling changes to the model parameters carefully comes from TRPO and PPO.
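
Since the sampling change in replaybuffer.py is mentioned above, here is a minimal sketch of a uniform-sampling replay buffer. It is not the actual replaybuffer.py: the capacity, deque storage, and NumPy batching are assumptions; only the batch size of 128 matches the default `--b` option.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Minimal uniform-sampling buffer (a sketch, not the repo's implementation)."""

    def __init__(self, capacity=100_000):
        # A bounded deque drops the oldest transitions automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=128):
        # Uniform sampling without replacement, matching the default batch size (--b 128).
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```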
python main.py
- To change the hyper-parameters, check `python main.py --help`.
Options:
- '--epochs', type=int, default=100, help='number of epochs (default: 100)'
- '--e', type=str, default='MountainCarContinuous-v0', help='environment name (default: MountainCarContinuous-v0)'
- '--t', type=bool, default=True, help='True if training, False if testing (default: True)'
- '--r', type=bool, default=False, help='render the game environment (default: False)'
- '--b', type=int, default=128, help='training batch size (default: 128)'
- '--v', type=bool, default=False, help='verbose mode (default: False)'
- '--sp', type=int, default=100, help='save point, epochs // sp (default: 100)'

Currently commented out:
- '--d', type=bool, default=False, help='train and test alternately (default: False)'
- '--n', type=bool, default=True, help='reward normalization (default: True)'
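
For reference, here is a sketch of how these options could be declared with argparse. It mirrors the list above but is not necessarily identical to main.py; in particular, the str2bool helper is an assumption, added because argparse's type=bool would treat any non-empty string (including "False") as True.

```python
import argparse

def str2bool(v):
    # argparse's type=bool treats any non-empty string as True, so parse explicitly.
    return str(v).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser(description='DDPG')
parser.add_argument('--epochs', type=int, default=100, help='number of epochs (default: 100)')
parser.add_argument('--e', type=str, default='MountainCarContinuous-v0', help='environment name')
parser.add_argument('--t', type=str2bool, default=True, help='True if training, False if testing')
parser.add_argument('--r', type=str2bool, default=False, help='render the game environment')
parser.add_argument('--b', type=int, default=128, help='training batch size')
parser.add_argument('--v', type=str2bool, default=False, help='verbose mode')
parser.add_argument('--sp', type=int, default=100, help='save point, epochs // sp')
args = parser.parse_args()
```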