/banrinochoujou

This is 万里の長城 (the Great Wall).

Primary language: Python. License: MIT.

万里の長城


This repository is a PyTorch implementation of several reinforcement learning algorithms.

News: I am currently running MuJoCo experiments with the DDPG algorithm.

I am currently tuning DDPG on the Swimmer environment. The latest code is in the Megvii Inc. repository: https://github.com/megvii-rl/pytorch-gym

News: My implementation of Bayesian methods in Q-Learning

I recently ran some simple experiments on Bayesian methods in Q-Learning. The main ideas are borrowed from the following three papers:

I have tried the variational inference approach and the dropout approach on the CartPole, Acrobot and nChain environments. The results of the different algorithms are shown in the following figures and tables. A more detailed report is available at https://github.com/hzxsnczpku/banrinochoujou/blob/master/doc/hw.pdf

nChain

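| Algorithm | N = 20 | N = 30 | N = 50 | N = 80 | N = 100 |
|---|---|---|---|---|---|
| Bayesian TS | 20.0 | 14.05 | 50.0 | 80.0 | 80.15 |
| Bayesian Dropout | 19.05 | 30.00 | 45.10 | 76.05 | 80.20 |
| DQN no noise | 14.30 | 6.00 | 15.50 | 4.00 | 60.00 |
| DQN ε-greedy | 9.00 | 9.05 | 10.35 | 24.25 | 40.25 |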
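
The rows of the nChain table differ mainly in how the agent explores. As a rough illustration only, the sketch below contrasts ε-greedy action selection with sampling-based selection, assuming "TS" stands for Thompson sampling. It is not the code used for these experiments: the function names are hypothetical, and the independent Gaussian posterior over each action's Q-value (given as a mean and a standard deviation) is an assumption made to keep the example self-contained.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def thompson_sample(q_means, q_stds, rng=random):
    """Draw one Q-value per action from its (assumed Gaussian) posterior,
    then act greedily with respect to the drawn values."""
    sampled = [rng.gauss(m, s) for m, s in zip(q_means, q_stds)]
    return max(range(len(sampled)), key=lambda a: sampled[a])


# Hypothetical estimates for a two-action chain state.
q_means = [1.0, 1.2]
q_stds = [0.5, 0.1]
action_eps = epsilon_greedy(q_means, epsilon=0.1)
action_ts = thompson_sample(q_means, q_stds)
```

With ε-greedy the amount of exploration is fixed by `epsilon`, whereas in the sampling-based variant an action is tried more often when its estimate is uncertain (large standard deviation) and less often once the estimate becomes confident.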

Classical Control

Basic Agents & Modules

Algorithms

Distributions

  • Discrete
  • DiagGaussian
  • DiagBeta
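
As a point of reference for the names above, a "DiagGaussian" distribution is usually a Gaussian with a diagonal covariance matrix, so the log-probability of an action is a sum of independent per-dimension terms. The following is a minimal plain-Python sketch of that formula, not the repository's implementation; the function name `diag_gaussian_log_prob` and the parametrization by mean and log standard deviation are assumptions.

```python
import math


def diag_gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under a Gaussian with diagonal covariance.

    Each dimension contributes
        -0.5 * ((a - m) / std)**2 - log(std) - 0.5 * log(2 * pi).
    """
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        z = (a - m) / std
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


# Hypothetical 2-dimensional action evaluated at the mean of a unit-variance Gaussian.
log_p = diag_gaussian_log_prob([0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Parametrizing by the log standard deviation keeps the standard deviation positive without any constraint on the learned parameter, which is a common choice for continuous-control policies.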

TO BE IMPLEMENTED

Algorithms

  • Q-Learning with a Dueling Structure
  • CEM
  • ACKTR
  • Distributional Q Learning
  • Feudal Network

Distributions

  • Gaussian
  • Dirichlet

Modules

  • VIME
  • ICM

How to Play

For example, run the following command to train a TRPO agent in the MuJoCo HalfCheetah-v1 environment:

python main.py --env HalfCheetah-v1 --agent TRPO_Agent --use_mujoco_setting True --save_every 300

To get a more detailed overview of the parameters, run the following command:

python main.py -h
  • I have changed the structure of the code, so the above instructions no longer work; updated instructions will be given soon.
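
For orientation, the flags in the example command could be declared with argparse roughly as follows. This is a hypothetical reconstruction from the command line shown above, not the repository's actual main.py: the defaults, types, help strings and the `str2bool` helper are all assumptions. One detail worth noting is that `--use_mujoco_setting True` passes the string "True", and a plain `type=bool` would also treat the string "False" as true, so the sketch parses the string explicitly.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings (hypothetical helper, not from the repository)."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Declare the four flags that appear in the example command (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Train an RL agent (illustrative sketch).")
    parser.add_argument("--env", default="HalfCheetah-v1", help="Gym environment id")
    parser.add_argument("--agent", default="TRPO_Agent", help="agent class name")
    parser.add_argument("--use_mujoco_setting", type=str2bool, default=False,
                        help="use the MuJoCo hyperparameter preset")
    parser.add_argument("--save_every", type=int, default=300,
                        help="checkpoint interval")
    return parser


# Parse the arguments from the example command instead of sys.argv.
args = build_parser().parse_args(
    ["--env", "HalfCheetah-v1", "--agent", "TRPO_Agent",
     "--use_mujoco_setting", "True", "--save_every", "300"]
)
```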

Experiment Results

Below are some experimental results achieved by my baselines:

MuJoCo Benchmark




Atari Benchmark

Under construction...

Dependencies

  • tabulate 0.7.7
  • scipy 0.19.0
  • PyTorch 0.2.0
  • pyparsing 2.1.4
  • OpenAI Gym

References

Under Construction...