Project_RL

Implementation of the Deep Deterministic Policy Gradient (DDPG) and Maximum A Posteriori Policy Optimization (MPO) Reinforcement Learning Algorithms for continuous control on OpenAI gym environments.

Prerequisites

Using the algorithms requires Python 3 (>= 3.6.5).
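
A quick way to verify the interpreter version before running the examples below (a plain shell check, nothing specific to this project):

python3 --version   # should report 3.6.5 or newer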

Usage Examples

Note

The algorithms are intended for continuous gym environments!

DDPG

  • Usage with provided noises (Ornstein-Uhlenbeck, Adaptive Parameter)

    If you want to use DDPG with our pre-implemented noises, you can do so via the main_ddpg.py file.

    • Training on the Qube-v0 environment (the id can be replaced with any environment id, e.g. Levitation-v0) with default hyperparameters and Ornstein-Uhlenbeck noise, saving the model as furuta_model.pt and the log under furuta_log (by default the model is saved as ddpg_model.pt and the log in an automatically generated directory). Saving/logging can be disabled with --no-save and --no-log.

      python3 path/to/main_ddpg.py --env Qube-v0 --save_path furuta_model.pt --log_name furuta_log
    • Loading a saved model and evaluating it in the Qube-v0 environment. The number of episodes and their length for evaluation can be adapted with --eval_episodes and --eval_ep_length.

      python3 path/to/main_ddpg.py --env Qube-v0 --no-train --eval --load furuta_model.pt
    • Training and evaluation are executed sequentially, so if you want to evaluate a just-trained model, this will work:

      python3 path/to/main_ddpg.py --env Qube-v0 --train --eval --save_path furuta_model.pt --log_name furuta_log
    • Loading a model and setting the train flag will continue training on the loaded model:

      python3 path/to/main_ddpg.py --env Qube-v0 --train --eval --save_path furuta_model.pt --log_name furuta_log --load furuta_model.pt
    • Adapting hyperparameters is quite intuitive:

      python3 path/to/main_ddpg.py --env Qube-v0 --gamma 0.5 --tau 0.1 --batch_size 1024
    • For more information, e.g. about adaptable hyperparameters, use:

      python3 path/to/main_ddpg.py --help
  • Usage with self-defined noise

    To use a self-defined noise you will need to write a script yourself.

    • Make sure the script is in the same directory as the ddpg package (only necessary if you didn't install it via PyPI).

    • The noise should extend the Noise class in noise.py (i.e. contain a reset and an iteration function); a minimal custom-noise sketch is shown after the examples below.

    • The following example covers the functionality of the previous examples:

      import gym    
      import quanser_robots
      
      from ddpg import DDPG
      from ddpg import OrnsteinUhlenbeck
      
      # create environment and noise
      env = gym.make('Qube-v0')
      action_shape = env.action_space.shape[0] 
      noise = OrnsteinUhlenbeck(action_shape)
      
      # setup a DDPG model w.r.t. the environment and self defined noise
      model = DDPG(env, noise, save_path="furuta_model.pt", log_name="furuta_log")
      
      # load a previously saved model
      model.load_model("furuta_model.pt")
      # train the model
      model.train()
      # evaluate the model; returns the mean reward over all episodes
      mean_reward = model.eval(episodes=100, episode_length=500)
      print(mean_reward)

      # always close the environment when finished
      env.close()
    • Setting hyperparameters would look something like this:

      model = DDPG(env, noise, gamma=0.5, tau=0.1, learning_rate=1e-2)
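    • A self-defined noise could then be sketched like this (illustrative only: the GaussianNoise class and its sigma parameter are made up for this example, and the exact import path and constructor of the Noise base class may differ from what is shown):

      import gym
      import quanser_robots
      import numpy as np

      from ddpg import DDPG
      from ddpg.noise import Noise  # base class from noise.py; adjust the import path if needed

      class GaussianNoise(Noise):
          # simple uncorrelated Gaussian exploration noise (illustrative assumption)
          def __init__(self, action_shape, sigma=0.1):
              # depending on the Noise base class, a super().__init__() call may be needed here
              self.action_shape = action_shape
              self.sigma = sigma

          def reset(self):
              # nothing to reset for uncorrelated noise
              pass

          def iteration(self):
              # draw a fresh noise sample for the current step
              return self.sigma * np.random.randn(self.action_shape)

      # plug the custom noise into DDPG just like the provided noises
      env = gym.make('Qube-v0')
      noise = GaussianNoise(env.action_space.shape[0])
      model = DDPG(env, noise, save_path="furuta_model.pt", log_name="furuta_log")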
  • Using a model as a controller

    If you want to use a trained model as a simple controller, just call the model, passing an observation:

    ctrl = DDPG(env, noise)
    ctrl.load_model('furuta_model.pt')

    obs = env.reset()
    done = False
    while not done:
        env.render()
        act = ctrl(obs)
        obs, rwd, done, info = env.step(act)

    # always close the environment when finished
    env.close()

MPO

Using MPO is analogous to DDPG:

  • Use main_mpo.py instead of main_ddpg.py

  • For information on the parameters you can set: python3 main_mpo.py --help

  • Since no noise is needed, writing your own script is not necessary, but for completeness here is a small example snippet:

    import gym
    import quanser_robots
    from mpo import MPO
    
    env = gym.make('Qube-v0')
    model = MPO(env, save_path="furuta_model.pt", log_name="furuta_log")
    # continues like DDPG example ...
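  • Assuming MPO exposes the same train/eval interface as the DDPG example above (usage is stated to be analogous, so take this as a sketch rather than the exact API), the snippet could continue like this:

    model.train()
    mean_reward = model.eval(episodes=100, episode_length=500)
    print(mean_reward)

    # always close the environment when finished
    env.close()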

Logging

By default, logging is enabled and the logs are saved in the runs/ directory. The name of a log can be set via the log_name parameter (--log_name argument). Inspecting the logs works with:

tensorboard --logdir=/PATH/TO/runs

This starts a local server, which can be accessed in the browser. Connecting to the server should result in something like this:

[TensorBoard screenshot]

Open Source Infos

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the GNU GPLv3 License - see the LICENSE file for details.