This is banrinochoujou (万里の長城, "the Great Wall of China").
This is a PyTorch implementation of several reinforcement learning algorithms.
I am currently tuning the DDPG algorithm on the Swimmer environment. The newest code is hosted in the Megvii Inc. repository: https://github.com/megvii-rl/pytorch-gym
I recently ran some simple experiments on Bayesian methods in Q-learning. The main ideas are borrowed from the following three papers:
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
- Weight Uncertainty in Neural Networks
- An Empirical Evaluation of Thompson Sampling
I have tried the variational inference approach and the dropout approach on the CartPole, Acrobot and nChain environments. The results of the different algorithms are shown in the following figures and tables. A more detailed report is available at https://github.com/hzxsnczpku/banrinochoujou/blob/master/doc/hw.pdf.
N | 20 | 30 | 50 | 80 | 100 |
---|---|---|---|---|---|
Bayesian TS | 20.0 | 14.05 | 50.0 | 80.0 | 80.15 |
Bayesian Dropout | 19.05 | 30.00 | 45.10 | 76.05 | 80.20 |
DQN no noise | 14.30 | 6.00 | 15.50 | 4.00 | 60.00 |
DQN ε-greedy | 9.00 | 9.05 | 10.35 | 24.25 | 40.25 |
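
The dropout approach can be read as Thompson sampling: keeping dropout active during action selection makes each forward pass behave like one draw from an approximate posterior over Q-functions, and the agent then acts greedily on that draw. The block below is only a minimal sketch of that idea, not the code used for the results above. It assumes a toy linear Q-function and uses plain Python lists instead of PyTorch tensors so that it stays self-contained; the names `sample_q_values`, `thompson_action` and `keep_prob` are hypothetical.

```python
import random


def sample_q_values(weights, features, keep_prob, rng):
    """One stochastic forward pass of a toy linear Q-function.

    Each input feature is kept with probability `keep_prob` and rescaled
    by 1 / keep_prob (inverted dropout), so every call yields a different
    Q-value estimate. `weights` holds one row of coefficients per action.
    """
    dropped = [
        f / keep_prob if rng.random() < keep_prob else 0.0
        for f in features
    ]
    return [sum(w * x for w, x in zip(row, dropped)) for row in weights]


def thompson_action(weights, features, keep_prob=0.8, rng=None):
    """Draw one dropout sample of the Q-values and act greedily on it."""
    if rng is None:
        rng = random.Random()
    q_values = sample_q_values(weights, features, keep_prob, rng)
    # Ties are broken in favour of the lowest action index.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `keep_prob=1.0` the sketch reduces to plain greedy selection; lower values make the sampled Q-values, and therefore the chosen actions, more varied, which is where the exploration comes from.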
- Deep Q Learning
- Double Deep Q Learning
- Deep Q Learning with Prioritized Replay Memory
- Asynchronous Advantage Actor-Critic
- Trust Region Policy Optimization
- Proximal Policy Optimization
  - clipped surrogate loss
  - adapted surrogate loss
- Evolution Strategy
- Deep Deterministic Policy Gradient
- Discrete
- DiagGaussian
- DiagBeta
- Q Learning with a Dueling Structure
- CEM
- ACKTR
- Distributional Q Learning
- Feudal Network
- Gaussian
- Dirichlet
- VIME
- ICM
For example, run the following command to train a TRPO agent in the MuJoCo HalfCheetah-v1 environment:
python main.py --env HalfCheetah-v1 --agent TRPO_Agent --use_mujoco_setting True --save_every 300
For a more detailed overview of the parameters, run:
python main.py -h
- I have changed the structure of the code, so the instructions above no longer work. Updated instructions will be provided soon.
Below are some experimental results achieved by my baselines:
Under construction...
- tabulate 0.7.7
- scipy 0.19.0
- pytorch 0.2.0
- pyparsing 2.1.4
- OpenAI Gym
Under Construction...