
MADDPG KERAS Implementation

Implementation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in Keras with very simple customization. Link to the paper: https://arxiv.org/pdf/1706.02275.pdf

The previous version of the code is available in the v0.1 branch.

Table of Contents

  • Project Description
  • Features
  • Installation
  • Usage
  • Code Structure
  • Possible Enhancements
  • How to Contribute
  • Support
  • License

Project Description

  • This is an implementation of the MADDPG algorithm in TensorFlow Keras and is easy to understand
  • OpenAI's MADDPG implementation is written in TensorFlow v1, which makes it difficult to follow for those accustomed to TensorFlow v2 and Keras
  • This implementation is built on the DDPG implementation on the Keras website; have a look at that DDPG implementation as well
  • This repository is a good starting point for those looking to customize a MADDPG implementation

Features

  • This implementation has been successfully tested on a competitive environment: the 2-pursuer, 1-evader problem

  • This implementation works for any number (n) of agents, which can be decided by the user

  • To work with this implementation, the user only needs to create a new env.py file defining the environment

  • Here is the reward curve generated by this implementation for the 2-pursuer, 1-evader environment after training for 3000 episodes (see the maddpg reward-curve figure)

  • Also check the small animation (maddpg-keras.mp4) generated by the trained model (using this implementation) for the 2-pursuer, 1-evader environment
  • The implementation is very well documented and, given its simplicity, easy to understand
  • The author can be contacted directly by email (prshukla.edu@gmail.com) or LinkedIn in case of issues
  • Please note that GPU training is currently not supported; please contact the author if GPU support for training is needed
  • It takes around 20 hours to train 3 agents in the 2-pursuer, 1-evader environment for 3000 episodes (100 steps per episode) on a single i5-113G7 processor

Installation

For installation, run the following commands in a terminal:

git clone https://github.com/pr-shukla/maddpg-keras.git
cd maddpg-keras
pip install -r requirements.txt

Usage

  1. To train on the same 2-pursuer, 1-evader competitive environment, run the following command from the root folder:
python3 train.py
  2. You can create a custom environment in env.py and then repeat step 1. The env.py file should define the following class and reward function (a minimal example environment is sketched after this list):
class Environment:
    def __init__(self):
        pass
    def initial_obs(self):
        '''
        Define initial observation state of your environment
        '''
    def step(self, action):
        '''
        Execute step and calculate new observation state
        '''

def reward(state):
    '''
    Calculate reward given new state
    '''
  3. You may want to change the values of parameters like STD_DEV, GAMMA, and TAU in config.py for a custom environment
  4. To quickly see the results of the author's previous training, run predict.py (trained models are saved in the saved_models folder):
python3 predict.py
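
As promised in step 2, here is a minimal, hypothetical env.py sketch for a single agent moving toward a fixed goal in one dimension. All of the details below (the goal, the 1-D state, the reward shape) are assumptions chosen only to show the expected structure of the file; the real observation and action shapes depend on your problem and on the number of agents, so check env/env.py in the repository for the exact conventions.

import numpy as np

class Environment:
    def __init__(self):
        # Hypothetical 1-D world: one agent tries to reach a fixed goal position
        self.goal = 5.0
        self.position = 0.0

    def initial_obs(self):
        # Reset the agent and return the initial observation state
        self.position = 0.0
        return np.array([self.position, self.goal], dtype=np.float32)

    def step(self, action):
        # Move the agent by the (scalar) action and return the new observation state
        self.position += float(action)
        return np.array([self.position, self.goal], dtype=np.float32)

def reward(state):
    # Negative distance to the goal: the closer the agent, the higher the reward
    position, goal = state
    return -abs(goal - position)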

Code Structure

  • The code contains three directories: maddpg (code for the MADDPG implementation), env (training and prediction environment code), and saved_models (pretrained models)
  • train.py: Main training code
  • config.py: Defines training parameters like NUM_EPISODES and NUM_STEPS
  • predict.py: Code for testing the trained model on the prediction environment
  • maddpg/buffer.py: 1. Calculates gradients and updates the critic and actor models 2. Maintains the buffer of experience
  • maddpg/model.py: Creates the neural network models for the actor and critic
  • maddpg/noise.py: Creates random noise which is added to the predicted action for more exploration
  • env/env.py: The training environment is defined here
  • env/env_predict.py: The prediction/testing environment is defined here
  • Please refer to the MADDPG algorithm (the maddpg_algo figure) while going through the code
  • Gradient calculation steps are extensively documented in the Buffer.learn() method in buffer.py; a simplified sketch of the update pattern follows below
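
For orientation while reading Buffer.learn(), the core MADDPG update works like this: each agent's critic is trained on the joint observations and joint actions of all agents, and each actor is trained by differentiating its own critic's value with respect to that agent's action. The snippet below is a simplified, hypothetical sketch of that pattern in TensorFlow 2; the names (actors, critics, target_actors, target_critics, the sampled batch tensors, and the optimizer arguments) are assumptions for illustration and do not match the exact variables in buffer.py.

import tensorflow as tf

GAMMA = 0.95  # discount factor; see config.py for the value actually used

def update_agent(i, actors, critics, target_actors, target_critics,
                 obs, actions, rewards, next_obs, actor_opt, critic_opt):
    # obs, actions, next_obs: lists with one [batch, dim] tensor per agent
    # TD target: reward plus discounted value of the next joint state/action
    target_actions = [ta(o) for ta, o in zip(target_actors, next_obs)]
    y = rewards[i] + GAMMA * target_critics[i](
        [tf.concat(next_obs, axis=1), tf.concat(target_actions, axis=1)])

    # Critic update: regress Q(joint obs, joint actions) toward the target y
    with tf.GradientTape() as tape:
        q = critics[i]([tf.concat(obs, axis=1), tf.concat(actions, axis=1)])
        critic_loss = tf.reduce_mean(tf.square(y - q))
    grads = tape.gradient(critic_loss, critics[i].trainable_variables)
    critic_opt.apply_gradients(zip(grads, critics[i].trainable_variables))

    # Actor update: maximize the critic's value of agent i's own action,
    # keeping the other agents' sampled actions fixed
    with tf.GradientTape() as tape:
        joint_actions = list(actions)
        joint_actions[i] = actors[i](obs[i])
        q = critics[i]([tf.concat(obs, axis=1), tf.concat(joint_actions, axis=1)])
        actor_loss = -tf.reduce_mean(q)
    grads = tape.gradient(actor_loss, actors[i].trainable_variables)
    actor_opt.apply_gradients(zip(grads, actors[i].trainable_variables))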

Possible Enhancements

Updates as of Dec 10, 2023

  • Training on GPU is not supported; contributions are welcome to add this enhancement
  • The implementation has been tested with TensorFlow versions 2.3 and 2.8; more recent versions may not work
  • Currently the time complexity of training is O(batch size); please look at the implementation in buffer.py for more details (a short illustration follows after this list)
  • The implementation works for agents performing single-dimensional actions only, not multi-dimensional actions
  • Actions are unnecessarily calculated in the Buffer.learn() method in buffer.py; search @bug in buffer.py for more details
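
To illustrate the O(batch size) point above: if Q-values and gradients are computed by looping over individual transitions in Python, the training cost grows linearly with the batch size inside that loop, whereas feeding the whole batch through the model in a single call lets TensorFlow vectorize the work. The comparison below is a generic, hypothetical illustration of the two patterns, not a copy of the code in buffer.py.

import tensorflow as tf

# Per-sample loop: one forward pass per transition, so Python-level cost
# grows linearly with the batch size
def q_values_looped(critic, state_batch, action_batch):
    return tf.stack([critic([s[None, :], a[None, :]])
                     for s, a in zip(state_batch, action_batch)])

# Batched: one forward pass for the whole batch, vectorized inside TensorFlow
def q_values_batched(critic, state_batch, action_batch):
    return critic([state_batch, action_batch])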

How to Contribute

  • To contribute, simply create an issue and raise a pull request.

Support

  • For any support related to the implementation, you can either raise an issue or directly send an email to prshukla.edu@gmail.com

License

Licensed under the MIT License.