
This is the codebase that I personally use as the starting point for any reinforcement learning project, with the purpose of fast experimentation and analysis.



Logo

Base Reinforcement Learning

A Reinforcement Learning project starter, designed for fast extension, experimentation, and analysis.
Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Quick Start
  4. Usage
  5. Roadmap
  6. Contributing
  7. License
  8. Contact

About The Project


There are many great Reinforcement Learning frameworks on GitHub. However, it is usually challenging to figure out how they can be extended for the various purposes that come up in research. Keeping this in mind, this project has been developed with the following goals:

  • highly readable code and documentation that make it easier to understand and extend the project for various purposes
  • enables fast experimentation: it shouldn't take you more than 5 minutes to submit an experiment idea, no matter how big it is!
  • enables fast analysis: all the information necessary for analysing or debugging the experiments should be readily available to you!
  • helps me improve my software engineering skills and understand reinforcement learning algorithms to a greater extent. As Richard Feynman said: "What I cannot create, I do not understand."

All of this contributes to a single idea: you shouldn't spend too much time writing duplicated code; instead, you should be focused on generating ideas and evaluating them as fast as possible.

Getting Started


This section walks you through the requirements and installation of this library.

Prerequisites

All the prerequisites of this library are outlined in the setup.cfg file.

Installation

To set up the project, follow these steps:

  1. Clone the project from GitHub:
    git clone https://github.com/erfanMhi/base_reinforcement_learning
  2. Navigate to the root directory:
    cd base_reinforcement_learning
  3. Use pip to install all the dependencies (we recommend setting up a new virtual environment for this repo):
    pip install .
  4. To make sure that the installation is complete and the library works properly, run the tests using
    pytest test

Quick Start


You can run a set of DQN experiments on the CartPole environment by running:

python main.py --config-file experiments/data/configs/experiments_v0/online/cart_pole/dqn/sweep.yaml --verbose debug --workers 4

This experiment tunes the batch_size and memory_size of the replay buffer specified in the config file and returns the most performant parameters. It speeds up the experiments by using 4 parallel processes. The most performant parameters are stored in the experiments/data/results/experiments_v0/online/cart_pole/dqn/sweep directory.
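To give a rough sense of what the sweep does, the sketch below shows how candidate replay-buffer settings could be evaluated across 4 worker processes. It is only an illustration under assumed names: run_experiment and the hand-written candidate list are hypothetical placeholders, not part of this library's API.

from multiprocessing import Pool

def run_experiment(params):
    # Hypothetical placeholder: in the real project this would train a DQN
    # agent with the given parameters and return a performance measure,
    # e.g. the area under the learning curve.
    return 0.0

# Hand-written stand-in for the parameter combinations a sweep config would generate.
candidate_params = [
    {"batch_size": 16, "memory_size": 2500},
    {"batch_size": 32, "memory_size": 2500},
    {"batch_size": 16, "memory_size": 5000},
    {"batch_size": 32, "memory_size": 5000},
]

if __name__ == "__main__":
    # --workers 4 roughly corresponds to a pool of 4 processes, each
    # evaluating candidate settings independently.
    with Pool(processes=4) as pool:
        scores = pool.map(run_experiment, candidate_params)
    best = candidate_params[scores.index(max(scores))]
    print("best parameters:", best)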

You can then easily analyse the experiments in TensorBoard by running the following command:

tensorboard --logdir experiments/data/results/experiments_v0/online/cart_pole/dqn/sweep

Doing so enables you to quickly analyse many parameters (a short logging sketch follows this list), including but not limited to:

  1. The learning curve of each algorithm.
  2. How the distribution of weights in each layer has changed over time.
  3. All the details of the neural architecture and the flow of information through it.
  4. Comparison of the performance of the algorithms based on different parameters.
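For reference, the four kinds of summaries above correspond to standard TensorBoard calls. The snippet below is only an illustrative sketch using torch.utils.tensorboard with a placeholder model, log directory, and values; it is not the project's actual logging code.

import torch
from torch.utils.tensorboard import SummaryWriter

# Placeholder model and log directory, for illustration only.
model = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
writer = SummaryWriter(log_dir="runs/demo")

step = 0
writer.add_scalar("return/episode", 195.0, step)              # 1. learning curve
for name, param in model.named_parameters():
    writer.add_histogram(f"weights/{name}", param, step)      # 2. weight distributions over time
writer.add_graph(model, torch.zeros(1, 4))                    # 3. network architecture
writer.add_hparams({"batch_size": 16, "memory_size": 2500},   # 4. hyper-parameter comparison
                   {"hparam/auc": 180.0})
writer.close()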

Usage


To run experiments with this project, you only need to call main.py with the proper arguments:

python main.py [-h] --config-file CONFIG_FILE [--gpu] --verbose VERBOSE [--workers WORKERS] [--run RUN]

The main file for running experiments

optional arguments:
  -h, --help            show this help message and exit
  --config-file CONFIG_FILE
                        Expects a config file describing the fixed and sweeping parameters
  --gpu                 Use GPU: if not specified, use CPU (Multi-GPU is not supported in this version)
  --verbose VERBOSE     Logging level: info or debug
  --workers WORKERS     Number of workers used to run the experiments. -1 means that it will be determined automatically
  --run RUN             Number of times that each algorithm needs to be evaluated
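For context, the interface above maps onto a fairly standard argparse parser. The sketch below is illustrative only; defaults and validation details are assumptions and may differ from the actual main.py.

import argparse

# Illustrative parser matching the documented interface; defaults are assumptions.
parser = argparse.ArgumentParser(description="The main file for running experiments")
parser.add_argument("--config-file", required=True,
                    help="Config file describing the fixed and sweeping parameters")
parser.add_argument("--gpu", action="store_true",
                    help="Use GPU; if not specified, use CPU")
parser.add_argument("--verbose", required=True, choices=["info", "debug"],
                    help="Logging level: info or debug")
parser.add_argument("--workers", type=int, default=1,
                    help="Number of worker processes; -1 determines it automatically")
parser.add_argument("--run", type=int, default=1,
                    help="Number of times each algorithm is evaluated")
args = parser.parse_args()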

The --gpu, --run, and --workers arguments don't require much explanation. I am going to thoroughly introduce the function of the remaining arguments.

Config-file

Let's break down the --config-file argument first. --config-file requires you to specify the relative or absolute path of a config file. In principle, this config file can be in any data-interchange format. Currently, only YAML files are supported, but adding other formats like JSON is trivial. An example of one of these config files is provided below:

config_class: DQNConfig

meta-params:
    log_dir: 'experiments/data/results/experiments_v0/online/cart_pole/dqn/best'
    algo_class: OnlineAlgo
    agent_class: DQNAgent
    env_class: CartPoleEnv

algo-params:

    discount: 0.99

    exploration:
        name: epsilon-greedy
        epsilon: 0.1

    model:
        name: fully-connected
        hidden_layers: 
                  grid-search: # searches through different number of layers and layer sizes for the fully-connected layer
                          - [16, 16, 16]
                          - [32, 32]
                          - [64, 64]
        activation: relu

    target_net:
        name: discrete
        update_frequency: 32

    optimizer: 
        name: adam
        lr:
          uniform-search: [0.0001, 0.001, 8] # searches over 8 random values between 0.0001 and 0.001 
    loss:
        name: mse

    # replay buffer parameters
    memory_size: 2500
    batch_size: 16

    # training parameters
    update_per_step: 1
    max_steps: 100000

    # logging parameters
    log_interval: 1000
    returns_queue_size: 100 # used to generate the learning curve and area under the curve of the reinforcement learning technique

In this file you specify the environment, the agent, and the algorithm you want to use to model the interaction between the agent and the environment, along with their parameters. To tune the parameters, you can use keywords such as uniform-search and grid-search, which specify the search space of the Tuner class. Currently, the Tuner class only supports grid search and uniform (random) search; however, it can be extended to support many more operations. This config file gives you control over almost all of the parameters of your algorithm.
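To make the two keywords concrete, the sketch below shows one way grid-search and uniform-search entries could be expanded into candidate values and combined into configurations. The function names are illustrative, not the actual Tuner API.

import random
from itertools import product

def expand_grid_search(values):
    # grid-search: try every listed value.
    return list(values)

def expand_uniform_search(low, high, num_samples):
    # uniform-search: draw num_samples values uniformly from [low, high].
    return [random.uniform(low, high) for _ in range(num_samples)]

hidden_layers = expand_grid_search([[16, 16, 16], [32, 32], [64, 64]])
learning_rates = expand_uniform_search(0.0001, 0.001, 8)

# The cross-product of the swept parameters yields 3 * 8 = 24 candidate configurations.
candidates = [{"hidden_layers": h, "lr": lr} for h, lr in product(hidden_layers, learning_rates)]
print(len(candidates))  # 24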

Verbose

Experiments can be run in two verbosity modes: info and debug. In the former, the process will only record the logs required to analyse the performance of the algorithm, such as the learning curve and the area under the curve. In the latter, all sorts of values that can help us debug the algorithm are logged as well, such as the histograms of weights in the different layers of the networks, the loss value at each step, the graph of the neural network (to help us find architectural bugs), etc.
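As a rough illustration, the flag could gate logging roughly as follows; the helper below is hypothetical and not the project's actual code.

import logging

def configure_verbosity(verbose: str) -> bool:
    # Hypothetical helper: maps the --verbose flag to a logging level and
    # returns whether debug-only diagnostics should be recorded.
    logging.basicConfig(level=logging.DEBUG if verbose == "debug" else logging.INFO)
    return verbose == "debug"

debug_mode = configure_verbosity("debug")
logging.info("info mode: learning curve and area under the curve are recorded")
if debug_mode:
    logging.debug("debug mode: weight histograms, per-step losses, and the network graph are also recorded")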

(back to top)

Roadmap


  • Implementing and testing initial version of the code
  • Hardware Capability
    • Multi-CPU execution
    • Multi-GPU execution
  • Add value-based algorithms:
    • DQN implemented and tested
    • DDQN
  • Refactoring the code
    • Factory methods for Optimizers, Loss functions, Networks
    • Factory method for environments (requires slight changes to the configuration system)
  • Add a run aggregator to enable TensorBoard to aggregate the results of multiple runs
  • Add RL algorithms for prediction:
    • TD(0) with General Value Functions (GVF)
  • Implement OfflineAlgorithm class
  • Implement Policy Gradient Algorithms
    • Vanilla Policy Gradient
    • Vanilla Actor-Critic
    • PPO
  • Reconfiguring the config files using a dependency injection approach (most likely using Hydra)

See the open issues for a full list of proposed features (and known issues).

(back to top)

License

Distributed under the MIT License. See LICENSE.rst for more information.

(back to top)

Contact

Erfan Miahi - mhi.erfan1@gmail.com

Project Link: https://github.com/erfanMhi/base_reinforcement_learning

(back to top)