python ddpg_model.py
- Apply fixed q target method to the model, that is, for both actor and critic model, set up two networks -- eval-net updates every step and target-net updates less times. This will fixed some trainging problems.
- Tranfer the network from keras to pytorch.
- Better actor and critic network
- LSTM to deal with time serie data
- Maybe some parallel computation, A3C (Asynchronous Advantage Actor-Critic) or maybe someting more than it.