ddpg-mountain-car-continuous

The DDPG algorithm, implemented in PyTorch.


Continuous control with deep reinforcement learning


  • Implements DDPG (Deep Deterministic Policy Gradient); a sketch of the core update follows below.
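
The core update is standard DDPG. Below is a minimal sketch of one training step, assuming hypothetical actor/critic nn.Modules and their target copies; the gamma and tau values are common defaults, and this is the textbook form, not necessarily this repo's exact code.

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    # One DDPG training step over a sampled minibatch of transitions.
    state, action, reward, next_state, done = batch

    # Critic update: regress Q(s, a) toward the bootstrapped TD target,
    # computed with the frozen target networks.
    with torch.no_grad():
        next_q = target_critic(next_state, target_actor(next_state))
        target_q = reward + gamma * (1.0 - done) * next_q
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: deterministic policy gradient, i.e. ascend the
    # critic's value of the actor's own actions.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft (Polyak) update of the target networks.
    with torch.no_grad():
        for net, target in ((actor, target_actor), (critic, target_critic)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.mul_(1.0 - tau).add_(tau * p)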

Experiments

Game                       Epochs           Training Time    Model Parameters
MountainCarContinuous-v0   1000             30 min           299,032 (total)
Pendulum-v0                1000             30 min           299,536 (total)
3DBall                     will be updated  will be updated  will be updated
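
The "Model Parameters (total)" column can be reproduced by summing over the networks' trainable tensors; a generic PyTorch helper (the function name is illustrative, not from this repo):

import torch.nn as nn

def count_parameters(*modules: nn.Module) -> int:
    # Total trainable parameters across the given modules,
    # e.g. count_parameters(actor, critic) for the totals above.
    return sum(p.numel() for m in modules
               for p in m.parameters() if p.requires_grad)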

Todo

  • Solve the problem that when training runs past 200 epochs, the action converges in the wrong direction.
  • Test more environments.
  • Add an argument parser.

Update (2019.08.27)

  1. Fixed a save error and notation.
  2. Added an argument parser.

Update (2019.08.30)

  1. Changed the sampling method in replaybuffer.py (see the sketch below).
  2. Added new test results.
  3. Pendulum-v0 is now being tested.
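
The new sampling code itself isn't reproduced in this README. For reference, a minimal replay buffer with uniform random sampling, the usual choice for DDPG, might look like this (an assumption about the approach, not the repo's exact replaybuffer.py):

import random
from collections import deque
import numpy as np

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Bounded buffer: the oldest transitions drop off automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=128):
        # Uniform random sampling breaks the temporal correlation
        # of consecutive environment steps.
        batch = random.sample(self.buffer, batch_size)
        state, action, reward, next_state, done = map(np.stack, zip(*batch))
        return state, action, reward, next_state, done

    def __len__(self):
        return len(self.buffer)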

Plot

MountainCarContinuous-v0

2019.08.27

[plot: training/test reward curves]

  • Past roughly 200 epochs, both the train and test models diverge.
    • Adjusting the batch size, learning rate, activation function, model size, and noise scale (see the exploration-noise sketch below) did not fix it.
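
The noise scale mentioned above refers to the exploration noise added to actions. DDPG commonly uses an Ornstein-Uhlenbeck process for temporally correlated noise; a minimal sketch with typical default parameters (not necessarily the values used in this repo):

import numpy as np

class OUNoise:
    # Ornstein-Uhlenbeck process: temporally correlated noise
    # added to the deterministic policy's actions for exploration.
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.ones(action_dim) * mu

    def reset(self):
        self.state = np.ones_like(self.state) * self.mu

    def sample(self):
        dx = (self.theta * (self.mu - self.state)
              + self.sigma * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state
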
2019.08.30

[plot: training/test reward curves]

  • It does not converge at all.
    • I tried an almost identical model written by someone else; the architecture looks the same to me, and that one converged, but mine did not.
2019.08.30

[plot: training/test reward curves]

  • I changed the Critic's learning rate from 0.001 to 0.0001 (after trying several values); see the optimizer sketch below.
    • This shows that the model can be trained well by adjusting the learning rate. The idea came from TRPO and PPO, where changes to the model parameters are handled carefully.
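In PyTorch, giving the Critic a smaller step size than the Actor is just a matter of using separate optimizers. A sketch using the rates described above (only the Critic change is stated in this README; the Actor's 0.001 is an assumed value, and actor/critic are the modules from the earlier sketch):

import torch.optim as optim

# Separate optimizers let the Critic take smaller steps than the Actor.
actor_opt = optim.Adam(actor.parameters(), lr=1e-3)    # assumed, not stated above
critic_opt = optim.Adam(critic.parameters(), lr=1e-4)  # lowered from 1e-3 as described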

Run

python main.py
  • If you want to change hyper-parameters, check "python main.py --help".

Options:

  • '--epochs', type=int, default=100, help='number of epochs, (default: 100)'
  • '--e', type=str, default='MountainCarContinuous-v0', help='environment name, (default: MountainCarContinuous-v0)'
  • '--t', type=bool, default=True, help='True if training, False if test. (default: True)'
  • '--r', type=bool, default=False, help='rendering the game environment. (default: False)'
  • '--b', type=int, default=128, help='train batch size. (default: 128)'
  • '--v', type=bool, default=False, help='verbose mode. (default: False)'
  • '--sp', type=int, default=100, help='save point. epochs // sp. (default: 100)'
  • (currently commented out) '--d', type=bool, default=False, help='train and test alternately. (default: False)'
  • (currently commented out) '--n', type=bool, default=True, help='reward normalization. (default: True)'
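
For reference, a sketch of the matching argparse setup (mirroring the list above, not necessarily main.py verbatim). One caveat worth noting: argparse's type=bool does not parse the string 'False' as False, since bool() of any non-empty string is True, so action='store_true' is usually the safer pattern for flags.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=100,
                    help='number of epochs, (default: 100)')
parser.add_argument('--e', type=str, default='MountainCarContinuous-v0',
                    help='environment name, (default: MountainCarContinuous-v0)')
parser.add_argument('--t', type=bool, default=True,
                    help='True if training, False if test. (default: True)')
parser.add_argument('--r', type=bool, default=False,
                    help='rendering the game environment. (default: False)')
parser.add_argument('--b', type=int, default=128,
                    help='train batch size. (default: 128)')
parser.add_argument('--v', type=bool, default=False,
                    help='verbose mode. (default: False)')
parser.add_argument('--sp', type=int, default=100,
                    help='save point. epochs // sp. (default: 100)')
args = parser.parse_args()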

Reference

  • Lillicrap et al., "Continuous control with deep reinforcement learning", arXiv:1509.02971

Version