/reinforcement-learning-work

A workspace where I implement various reinforcement learning algorithms and methodologies

Primary Language: Python


-----------------------------------------------------


➤ 📝 About The Project


This repository is my personal collection and demonstration of various deep reinforcement learning (DRL) algorithms, showcasing my grasp and application of advanced concepts in the field. Each algorithm's directory provides richly commented code, designed to demonstrate not only the technical implementation but also my understanding of the strategic underpinnings of each algorithm.

-----------------------------------------------------

➤ 💾 Key Project File Description

DQN (Deep Q-Networks)

  • The DQN directory implements the DQN algorithm. DQN extends Q-learning by using deep neural networks to approximate the Q-value function. The code includes the network architecture, experience replay, and the epsilon-greedy strategy for action selection. It is primarily based on the paper Playing Atari with Deep Reinforcement Learning by Mnih et al. (2013).
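
  As a quick illustration of the two mechanisms mentioned above (a minimal sketch, not the repository's actual code; the class and function names are my own), here is how a replay buffer and epsilon-greedy action selection typically look:

      import random
      from collections import deque

      class ReplayBuffer:
          """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""
          def __init__(self, capacity):
              self.buffer = deque(maxlen=capacity)  # old transitions are evicted

          def push(self, transition):
              self.buffer.append(transition)

          def sample(self, batch_size):
              # Uniform sampling breaks the temporal correlation between transitions
              return random.sample(self.buffer, batch_size)

          def __len__(self):
              return len(self.buffer)

      def epsilon_greedy(q_values, epsilon):
          """With probability epsilon explore randomly, otherwise act greedily."""
          if random.random() < epsilon:
              return random.randrange(len(q_values))
          return max(range(len(q_values)), key=lambda a: q_values[a])

  In the full algorithm, `q_values` would come from a forward pass of the Q-network, and sampled batches feed the temporal-difference update against a target network.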

DDPG (Deep Deterministic Policy Gradient)

  • The DDPG folder contains the implementation of DDPG, a policy gradient algorithm that uses a deterministic policy and operates over continuous action spaces. The folder manages network updates, policy learning, and the Ornstein-Uhlenbeck process for action exploration. The foundational paper is Continuous control with deep reinforcement learning by Lillicrap et al. (2016).
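
  The Ornstein-Uhlenbeck process mentioned above generates temporally correlated noise that is added to the deterministic policy's actions for exploration. A minimal NumPy sketch (parameter defaults follow common DDPG practice, not necessarily this repository's settings):

      import numpy as np

      class OUNoise:
          """Ornstein-Uhlenbeck process: mean-reverting, temporally correlated noise."""
          def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
              self.mu = mu * np.ones(size)
              self.theta, self.sigma, self.dt = theta, sigma, dt
              self.rng = np.random.default_rng(seed)
              self.reset()

          def reset(self):
              # Restart the process at the long-run mean (e.g. per episode)
              self.state = self.mu.copy()

          def sample(self):
              # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
              dx = (self.theta * (self.mu - self.state) * self.dt
                    + self.sigma * np.sqrt(self.dt)
                    * self.rng.standard_normal(self.mu.shape))
              self.state = self.state + dx
              return self.state

  Because consecutive samples are correlated, the resulting exploration is smoother than independent Gaussian noise, which suits physical control tasks.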

TD3 (Twin Delayed DDPG)

  • The TD3 folder implements TD3, which improves on DDPG by addressing Q-value overestimation through clipped double Q-learning, delayed policy updates, and target policy smoothing. It is based on the paper Addressing Function Approximation Error in Actor-Critic Methods by Fujimoto et al. (2018).

PPO (Proximal Policy Optimization)

  • The PPO folder facilitates the implementation of PPO, which optimizes policy learning by maintaining a balance between exploration and exploitation using a clipped surrogate objective. The algorithm is detailed in the paper Proximal Policy Optimization Algorithms by Schulman et al. (2017).
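
  The clipped surrogate objective is small enough to show inline. This NumPy sketch (my own function name, not the repository's) computes the objective for a batch of transitions; a training loop would negate it to obtain a loss to minimize:

      import numpy as np

      def ppo_clip_objective(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
          """Clipped surrogate objective from Schulman et al. (2017)."""
          # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space
          ratio = np.exp(log_probs_new - log_probs_old)
          unclipped = ratio * advantages
          # Clipping removes the incentive to move the ratio outside [1-eps, 1+eps]
          clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
          return np.minimum(unclipped, clipped).mean()

  Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound, which is what keeps each policy update conservative.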

MADDPG (Multi-Agent DDPG)

  • The MADDPG folder explores the MADDPG framework, designed for multi-agent environments. It extends DDPG by considering the actions of other agents in the environment, enhancing training stability and performance in cooperative or competitive scenarios. The key concepts are discussed in the paper Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments by Lowe et al. (2017).
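
  The core structural idea, centralized training with decentralized execution, can be sketched in a few lines (illustrative helper functions of my own, not the repository's API): each critic is trained on all agents' observations and actions, while each actor acts from its own observation only.

      import numpy as np

      def decentralized_act(actor_fns, observations):
          """Execution: each agent's actor sees only its OWN observation."""
          return [actor(obs) for actor, obs in zip(actor_fns, observations)]

      def centralized_critic_input(observations, actions):
          """Training: each agent's critic is fed ALL agents' observations and
          actions, making the environment stationary from its point of view."""
          parts = [np.ravel(x) for x in list(observations) + list(actions)]
          return np.concatenate(parts)

  In the full algorithm, the concatenated vector would feed each agent's Q-network during the critic update, while `decentralized_act` is all that is needed at execution time.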

MAPPO (Multi-Agent PPO)

  • The MAPPO folder implements MAPPO, adapting the robust single-agent PPO algorithm for multi-agent settings. The implementation includes adaptations for centralized training with decentralized execution, suitable for complex multi-agent scenarios. The approach is based on findings discussed in the paper The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games by Yu et al. (2022).

A3C (Asynchronous Advantage Actor-Critic)

  • COMING SOON

-----------------------------------------------------

➤ 🚀 Dependencies

All algorithms are modelled using the PyTorch framework only.

Python Badge PyTorch Badge NumPy Badge

-----------------------------------------------------

➤ 🔨 Usage

The easiest way to get started with the deep reinforcement learning algorithms in this repository is to set up a local development environment. Follow these steps to install and run the implementations:

Clone the repository:

   git clone https://github.com/i1Cps/reinforcement-learning-work.git
   cd reinforcement-learning-work

Create a virtual environment (optional but recommended):

    python3 -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`

Install the required dependencies:

    pip3 install -r requirements.txt

Run a specific algorithm (example with PPO):

    cd algorithms/ppo
    python3 main.py

Plot the results:

    cd data
    python3 plot.py

View the generated plots in:

algorithms/<specific-algorithm>/data/plots

-----------------------------------------------------

➤ ☕ Buy me a coffee

Whether you use this project, have learned something from it, or just like it, please consider supporting it by buying me a coffee, so I can dedicate more time to open-source projects like this (҂⌣̀_⌣́)

Buy Me A Coffee

-----------------------------------------------------

➤ 📜 Credits

Theo Moore-Calters

GitHub Badge LinkedIn Badge

Special Thanks to:

Phil

-----------------------------------------------------

➤ License

Licensed under MIT.