AINE-DRL

AINE-DRL is a deep reinforcement learning (DRL) baseline framework. AINE means "Agent IN Environment". If you want to know how to use, see AINE-DRL Documentation.

| Implementation | Experiments | Setup |

We always welcome your contributions! Please feel free to open an issue or pull request.

Implementation

AINE-DRL provides the following:

  • deep reinforcement learning agents
  • compatibility with OpenAI Gym
  • compatibility with Unity ML-Agents
  • inference (rendering, GIF, picture)
  • model save/load
  • YAML configuration

If you're using AINE-DRL for the first time, please read Getting Started.

Agent

AINE-DRL provides deep reinforcement learning (DRL) agents. If you want to use them, it's helpful to read Agent docs.

| Agent | Source Code |
| --- | --- |
| REINFORCE | reinforce |
| A2C | a2c |
| Double DQN | dqn |
| PPO | ppo |
| Recurrent PPO | ppo |
| PPO RND | ppo |
| Recurrent PPO RND | ppo |

TODO

  • DDPG
  • Prioritized Experience Replay
  • SAC
  • Intrinsic Curiosity Module (ICM)
  • Random Network Distillation (RND)

Experiments

You can see our experiments (source code and results) in experiments. Some recent experiments are shown below.

BipedalWalker-v3 with PPO

Train agents in OpenAI Gym BipedalWalker-v3, which is a continuous action space task.
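
BipedalWalker-v3 has a Box (real-valued) action space, so the agent must output continuous actions. You can check the spaces with plain OpenAI Gym (this snippet is independent of AINE-DRL):

import gym

env = gym.make("BipedalWalker-v3")
print(env.observation_space)  # 24-dimensional continuous observation
print(env.action_space)       # 4 continuous actions in [-1, 1]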

Fig 1. BipedalWalker-v3 inference (cumulative reward - PPO: 248):

To train the agent, enter the following command:

python experiments/bipedal_walker_v3/run.py

Detailed options:

Usage:
    experiments/bipedal_walker_v3/run.py [options]

Options:
    -i --inference                Whether to run inference [default: False].
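
For example, to run inference with a trained agent:

python experiments/bipedal_walker_v3/run.py --inference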

If paging file error happens, see Paging File Error.

CartPole-v1 with No Velocity

Compare Recurrent PPO (using LSTM) and Naive PPO in OpenAI Gym CartPole-v1 with No Velocity, which is a Partially Observable Markov Decision Process (POMDP) setting. Specifically, we remove "cart velocity" and "pole velocity at tip" from the observation space. This experiment shows that a memory capability is required in a POMDP setting.
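
The experiment's own environment setup lives in experiments/cartpole_v1_no_velocity; the sketch below only illustrates how the two velocity entries could be dropped with a standard gym.ObservationWrapper (the wrapper name and implementation are illustrative, not AINE-DRL code):

import gym
import numpy as np

class NoVelocityWrapper(gym.ObservationWrapper):
    """Keep only cart position and pole angle from CartPole-v1 observations."""

    def __init__(self, env):
        super().__init__(env)
        # CartPole-v1 observations are [cart position, cart velocity, pole angle, pole velocity at tip];
        # indices 0 and 2 keep the positions and drop both velocity terms.
        low = env.observation_space.low[[0, 2]]
        high = env.observation_space.high[[0, 2]]
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return obs[[0, 2]].astype(np.float32)

env = NoVelocityWrapper(gym.make("CartPole-v1"))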

Fig 2. CartPole-v1 with No Velocity inference (cumulative reward - Recurrent PPO: 500, Naive PPO: 41):

| Recurrent PPO | Naive PPO |

Fig 3. CartPole-v1 with No Velocity cumulative reward (black: Recurrent PPO, cyan: Naive PPO):

To train the Recurrent PPO agent, enter the following command:

python experiments/cartpole_v1_no_velocity/run.py

Detailed options:

Usage:
    experiments/cartpole_v1_no_velocity/run.py [options]

Options:
    -a --agent=<AGENT_NAME>       Agent name (recurrent_ppo, naive_ppo) [default: recurrent_ppo].
    -i --inference                Whether to run inference [default: False].
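
For example, to train the Naive PPO agent instead:

python experiments/cartpole_v1_no_velocity/run.py --agent=naive_ppo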

Setup

Follow the instructions below.

Installation

This installation guide is simple. If you have a problem or want to see details, refer to Installation docs.

First, install Python 3.9.

If you want to use NVIDIA CUDA, install PyTorch with CUDA support manually:

pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
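
You can verify that PyTorch detects your GPU (plain PyTorch, nothing AINE-DRL specific):

python -c "import torch; print(torch.cuda.is_available())"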

Now, install the AINE-DRL package by entering the command below:

pip install aine-drl
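
To verify the installation, try importing the package (assuming the module name is aine_drl, matching the package name):

python -c "import aine_drl"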

Run

Run a sample script in the samples directory. Enter the following command:

python samples/<FILE_NAME>

Example:

python samples/cartpole_v1_ppo.py

See details in Getting Started docs.

Paging File Error

When you use too many workers (e.g., more than 8), the error "The paging file is too small for this operation to complete." may occur because too many parallel environments are running in multiple threads. If it happens, you can mitigate it with the following commands (Windows):

pip install pefile
python fixNvPe.py --input=C:\<Anaconda3 Path>\envs\aine-drl\Lib\site-packages\torch\lib\*.dll

<Anaconda3 Path> is the directory where your Anaconda3 is installed.

Reference: cobryan05/fixNvPe.py (GitHub)