Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

rl_gym_examples

[Demo GIFs: Cliff Walking, CartPole, Bipedal Walker, Inverted Pendulum]

This repository contains examples of common Reinforcement Learning (RL) algorithms, implemented in Python and run in OpenAI Gymnasium environments.

This repo records my implementations of RL algorithms as I learn them, and I hope it helps others learn and understand these algorithms better.

✨Features

  • Documentation for each algorithm: every folder has a README.md file that introduces the algorithm
  • Examples in OpenAI Gymnasium environments
  • Detailed comments

🚀Getting Started

📦Choose Version

Choose the version you want to use:

  • [Simple Implementation]: The simplest implementation of each algorithm, showing the core logic of the algorithm.

💻Prepare the Environment & Install Dependencies

  1. First, install the dependencies using either conda or pip:

    • conda (recommended)

    Create a new conda environment from the yml file:

    conda env create -f rl_gym_examples.yml

    • pip

    You can also install the dependencies using pip (though this is not recommended):

    pip install -r requirements.txt

    The examples use Python 3.8.

  2. Then run the examples from their corresponding folders, for example:

    cd dp
    python gym_cliff_walking.py
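
For orientation, the `dp` folder covers Dynamic Programming. The sketch below is not the repository's `gym_cliff_walking.py`; it is a minimal illustration of value iteration on a hypothetical 4-state corridor. The names `build_model`, `q_values`, and `value_iteration` are assumptions made for this example, and the transition table `P[state][action] = [(prob, next_state, reward, done), ...]` is written by hand rather than read from a Gymnasium environment.

```python
# Hypothetical 4-state corridor (not a Gymnasium environment):
# states 0..3, state 3 is the terminal goal; actions 0 = left, 1 = right.
# Every move costs -1, mirroring the per-step penalty in cliff walking.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3


def build_model():
    """Return P[state][action] = [(prob, next_state, reward, done), ...]."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a in range(N_ACTIONS):
            if s == GOAL:
                P[s][a] = [(1.0, s, 0.0, True)]  # absorbing terminal state
            else:
                nxt = max(s - 1, 0) if a == 0 else s + 1
                P[s][a] = [(1.0, nxt, -1.0, nxt == GOAL)]
    return P


def q_values(P, V, s, gamma):
    """One-step lookahead: expected return of each action in state s."""
    return [
        sum(p * (r + (0.0 if done else gamma * V[s2]))
            for p, s2, r, done in P[s][a])
        for a in range(N_ACTIONS)
    ]


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Sweep the Bellman optimality backup until values change by < theta."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(q_values(P, V, s, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(N_STATES):
        q = q_values(P, V, s, gamma)
        policy.append(q.index(max(q)))
    return V, policy


V, policy = value_iteration(build_model())
print([round(v, 3) for v in V], policy)
```

Because the model `P` is fully known, no environment interaction is needed, which is what makes this approach model-based. In this toy corridor the greedy policy moves right in every non-terminal state.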

💡Tips

The PyTorch build listed in the dependencies is the CPU version. To use a GPU, install the GPU version by following the instructions on the PyTorch website.
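
To check which build is active after installation, a small helper along these lines may be useful. This is a sketch only: `pick_device` is a hypothetical name that does not come from this repository, and it falls back to `"cpu"` when PyTorch is not installed at all.

```python
import importlib.util


def pick_device():
    """Return "cuda" if an installed PyTorch build can see a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

With the CPU-only build from the dependency files, this should report `cpu` even on a machine that has a GPU.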

📚Supported Algorithms

RL Algorithm Development Path

| Algorithm | Observation Space | Action Space | Model-based or Model-free | On-policy or Off-policy |
| --- | --- | --- | --- | --- |
| Dynamic Programming (Policy Iteration or Value Iteration) | Discrete | Discrete | Model-based | N/A |
| Sarsa | Discrete | Discrete | Model-free | On-policy |
| Q-learning | Discrete | Discrete | Model-free | Off-policy |
| DQN | Continuous | Discrete | Model-free | Off-policy |
| REINFORCE | Continuous | Discrete/Continuous | Model-free | On-policy |
| Actor-Critic | Continuous | Discrete/Continuous | Model-free | On-policy |
| TRPO/PPO | Continuous | Discrete/Continuous | Model-free | On-policy |
| DDPG | Continuous | Continuous | Model-free | Off-policy |
| SAC | Continuous | Continuous | Model-free | Off-policy |
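
The on-policy versus off-policy distinction can be made concrete with the two tabular TD methods listed above. The sketch below is an illustration under stated assumptions, not code from this repository: the function names, the dictionary-based Q-table, and the values of `alpha` and `gamma` are all chosen for the example. Sarsa bootstraps from the action the agent actually takes next, while Q-learning bootstraps from the greedy action regardless of what is taken.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s2, a2, done, alpha=0.5, gamma=0.9):
    """On-policy: the target uses a2, the action actually chosen in s2."""
    target = r if done else r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s2, actions, done, alpha=0.5, gamma=0.9):
    """Off-policy: the target uses the greedy action in s2."""
    target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Same transition for both methods: in state 1, action 1 looks much better
# than action 0, but the (exploring) agent picks action 0 next.
def fresh_q():
    Q = defaultdict(float)
    Q[(1, 0)] = 1.0
    Q[(1, 1)] = 5.0
    return Q


Q_sarsa, Q_qlearn = fresh_q(), fresh_q()
sarsa_update(Q_sarsa, s=0, a=0, r=-1.0, s2=1, a2=0, done=False)
q_learning_update(Q_qlearn, s=0, a=0, r=-1.0, s2=1, actions=[0, 1], done=False)
print(Q_sarsa[(0, 0)], Q_qlearn[(0, 0)])
```

Sarsa's estimate reflects the exploratory action that was taken, so it ends up lower than the Q-learning estimate, which assumes greedy behavior from the next state onward.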

📁File Structure

  • 'dp': Dynamic Programming
  • 'td': Temporal Difference (TD) learning
  • 'dqn': Deep Q Network (DQN)
  • 'reinforce': REINFORCE algorithm (also known as Vanilla Policy Gradient)
  • 'actor_critic': Actor-Critic algorithm
  • 'ppo': Proximal Policy Optimization (PPO) algorithm
  • 'ddpg': Deep Deterministic Policy Gradient (DDPG) algorithm
  • 'sac': Soft Actor-Critic (SAC) algorithm

📝References