万里の長城 (The Great Wall)

This is the Great Wall!

Here is a PyTorch implementation of several reinforcement learning algorithms.

News: I am currently running MuJoCo experiments with the DDPG algorithm.

I am currently tuning DDPG on the Swimmer environment. The latest code is hosted in the Megvii Inc. repository: https://github.com/megvii-rl/pytorch-gym

News: My implementation of Bayesian methods in Q-Learning

I have recently run some simple experiments with Bayesian methods in Q-Learning. The main ideas are borrowed from the following three papers:

I have tried the variational inference approach and the dropout approach on the CartPole, Acrobot and nChain environments. The results of the different algorithms are shown in the following figures and tables. A more detailed report is available at https://github.com/hzxsnczpku/banrinochoujou/blob/master/doc/hw.pdf.

nChain

N (chain length)	20	30	50	80	100
Bayesian TS	20.0	14.05	50.0	80.0	80.15
Bayesian Dropout	19.05	30.00	45.10	76.05	80.20
DQN no noise	14.30	6.00	15.50	4.00	60.00
DQN ε-greedy	9.00	9.05	10.35	24.25	40.25
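Both Bayesian approaches above replace ε-greedy exploration with acting greedily on a *sampled* Q-function, in the spirit of Thompson sampling. The NumPy sketch below illustrates the idea under stated assumptions: a hypothetical two-layer Q-network, a mean-field diagonal Gaussian posterior over its weights for the variational inference approach, and MC dropout (dropout kept on at action time) for the dropout approach. The function names, network shape and dropout rate are illustrative choices, not the code in this repository.

```python
import numpy as np

def q_values(state, w1, w2, dropout_mask=None):
    """Hypothetical two-layer Q-network: state -> ReLU hidden -> one Q-value per action."""
    h = np.maximum(0.0, state @ w1)
    if dropout_mask is not None:
        h = h * dropout_mask
    return h @ w2

def sample_weights_vi(mu, log_sigma, rng):
    """Reparameterized draw w = mu + sigma * eps from a diagonal Gaussian q(w)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def act_vi(state, params, rng):
    """Variational approach: sample one network from q(w), then act greedily on it."""
    w1 = sample_weights_vi(params["mu1"], params["log_sigma1"], rng)
    w2 = sample_weights_vi(params["mu2"], params["log_sigma2"], rng)
    return int(np.argmax(q_values(state, w1, w2)))

def act_dropout(state, w1, w2, rng, p_drop=0.5):
    """Dropout approach: draw a random dropout mask at action time and act greedily."""
    keep = 1.0 - p_drop
    # Inverted dropout: surviving units are rescaled by 1 / keep.
    mask = rng.binomial(1, keep, size=w1.shape[1]) / keep
    return int(np.argmax(q_values(state, w1, w2, mask)))

# Usage with assumed CartPole-like sizes: 4 state dims, 2 actions.
rng = np.random.default_rng(0)
state_dim, hidden, n_actions = 4, 16, 2
params = {
    "mu1": rng.standard_normal((state_dim, hidden)) * 0.1,
    "log_sigma1": np.full((state_dim, hidden), -3.0),
    "mu2": rng.standard_normal((hidden, n_actions)) * 0.1,
    "log_sigma2": np.full((hidden, n_actions), -3.0),
}
s = rng.standard_normal(state_dim)
a_vi = act_vi(s, params, rng)
a_dropout = act_dropout(s, params["mu1"], params["mu2"], rng)
```

Because a fresh network is sampled at every decision, the agent explores most where the posterior is still wide; as the posterior (or the effective dropout noise) concentrates, both rules approach plain greedy action selection.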

Classical Control

Basic Agents & Modules

Algorithms

Deep Q Learning
Double Deep Q Learning
Deep Q Learning with Prioritized Replay Memory
Asynchronous Advantage Actor-Critic
Trust Region Policy Optimization
Proximal Policy Optimization
- clipped surrogate loss
- adaptive KL-penalty surrogate loss
Evolution Strategies
Deep Deterministic Policy Gradient
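As an illustration of two of the listed algorithms, the NumPy sketch below shows the Double DQN bootstrap target and the PPO clipped surrogate loss. These follow the standard formulations from the respective papers; the function names, argument layout and default hyperparameters (`gamma=0.99`, `clip_eps=0.2`) are assumptions for the example, not this repository's API.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online net selects a', the target net evaluates it.

    q_online_next and q_target_next have shape (batch, n_actions) and hold
    Q-values for the next states; dones is 1.0 where the episode ended.
    """
    best_actions = np.argmax(q_online_next, axis=1)
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # No bootstrapping past a terminal state.
    return rewards + gamma * (1.0 - dones) * next_values

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so that it can be minimized."""
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the smaller of the two surrogate terms.
    return -np.mean(np.minimum(unclipped, clipped))
```

Plain DQN would take `q_target_next.max(axis=1)` instead; decoupling action selection from evaluation is what reduces the overestimation bias. In the PPO loss, the `min` keeps the clipping from ever making the objective more optimistic than the unclipped ratio.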

Distributions

Discrete
DiagGaussian
DiagBeta
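The listed distributions are used as policy output heads. The NumPy sketch below shows what a DiagGaussian head typically provides: sampling, log-probability, entropy and the KL divergence that TRPO and adaptive-KL PPO need. The class interface (`sample`, `log_prob`, `entropy`, `kl`) is a hypothetical one chosen for illustration and may not match the repository's class.

```python
import numpy as np

class DiagGaussian:
    """Gaussian policy distribution with a diagonal covariance matrix (hypothetical interface)."""

    def __init__(self, mean, log_std):
        self.mean = np.asarray(mean, dtype=float)
        self.log_std = np.asarray(log_std, dtype=float)
        self.std = np.exp(self.log_std)

    def sample(self, rng):
        """Draw an action as mean + std * standard normal noise."""
        return self.mean + self.std * rng.standard_normal(self.mean.shape)

    def log_prob(self, x):
        """Sum of independent per-dimension Gaussian log-densities."""
        z = (np.asarray(x, dtype=float) - self.mean) / self.std
        return np.sum(-0.5 * z**2 - self.log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)

    def entropy(self):
        """Differential entropy, summed over action dimensions."""
        return np.sum(self.log_std + 0.5 * np.log(2.0 * np.pi * np.e), axis=-1)

    def kl(self, other):
        """KL(self || other) between two diagonal Gaussians."""
        var_ratio = (self.std / other.std) ** 2
        mean_term = ((self.mean - other.mean) / other.std) ** 2
        return np.sum(0.5 * (var_ratio + mean_term - 1.0) - np.log(self.std / other.std), axis=-1)
```

Parameterizing the standard deviation through `log_std` keeps it positive without any constraint on the network output.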

TO BE IMPLEMENTED

Algorithms

Q Learning with a Dueling Architecture
CEM (Cross-Entropy Method)
ACKTR
Distributional Q Learning
Feudal Network

Distributions

Gaussian
Dirichlet

Modules

VIME
ICM

How to Play

For example, run the following command to train a TRPO agent in the MuJoCo HalfCheetah-v1 environment:

python main.py --env HalfCheetah-v1 --agent TRPO_Agent --use_mujoco_setting True --save_every 300

For a more detailed overview of the parameters, run:

python main.py -h

I have changed the structure of the code, so the instructions above no longer work; updated instructions will be provided soon.

Experiment Results

Below are some experimental results achieved by my baselines:

MuJoCo Benchmark

Atari Benchmark

Under construction...

Dependencies

tabulate 0.7.7
scipy 0.19.0
pytorch 0.2.0
pyparsing 2.1.4
openai gym

References

Under Construction...