This repository is related to https://www.theschool.ai/courses/move-37-course/ and https://github.com/colinskow/move37

move37

Homework_Assignment_Week2

This solution is based on:

How to run

cd Homework_Assignment_Week2
python DeterministicFrozenLake.py
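
The contents of DeterministicFrozenLake.py are not reproduced here. As a rough illustration of how a deterministic FrozenLake can be solved, here is a minimal value iteration sketch. It assumes the standard 4x4 Gym map, a reward of 1 for reaching the goal, and a discount of 0.9. All names below are hypothetical and may differ from what the script actually does.

```python
# Sketch only: value iteration on a deterministic 4x4 FrozenLake grid.
# The map, rewards, and discount are assumptions, not taken from the script.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]  # S=start, F=frozen, H=hole, G=goal
N = 4
GAMMA = 0.9
ACTIONS = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}


def step(state, action):
    """Deterministic transition: return (next_state, reward, done)."""
    r, c = divmod(state, N)
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr, 0), N - 1)  # moving into a wall leaves the agent in place
    nc = min(max(c + dc, 0), N - 1)
    cell = GRID[nr][nc]
    return nr * N + nc, float(cell == "G"), cell in "HG"


def is_terminal(state):
    r, c = divmod(state, N)
    return GRID[r][c] in "HG"


def q_value(values, state, action):
    """One-step lookahead value of taking `action` in `state`."""
    nxt, reward, done = step(state, action)
    return reward + (0.0 if done else GAMMA * values[nxt])


def value_iteration(tol=1e-8):
    """Sweep all states until the largest value change drops below tol."""
    values = [0.0] * (N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            if is_terminal(s):
                continue
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick the action with the highest one-step lookahead value."""
    return {
        s: max(ACTIONS, key=lambda a: q_value(values, s, a))
        for s in range(N * N)
        if not is_terminal(s)
    }


def rollout(policy, max_steps=100):
    """Follow the policy from the start state; return (path, final_reward)."""
    state, path = 0, [0]
    for _ in range(max_steps):
        state, reward, done = step(state, policy[state])
        path.append(state)
        if done:
            return path, reward
    return path, 0.0


if __name__ == "__main__":
    vals = value_iteration()
    path, reward = rollout(greedy_policy(vals))
    print("path:", path, "reward:", reward)
```

Because the transitions are deterministic, the greedy policy derived from the converged values follows a shortest hole-free route to the goal.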

Homework_Assignment_Week3

This solution is based on:

How to run

cd Homework_Assignment_Week3
python blackjack.py

python CliffWalking.py

Midterm Assignment: Make a Bipedal Robot Walk (Week5)

Task

https://www.theschool.ai/courses/move-37-course/lessons/midterm-assignment-make-a-bipedal-robot-walk/

"The midterm is to make a bipedal humanoid robot walk in a simulation.
You can use OpenAI Gym for the environment.
https://github.com/search?q=bipedal+gym
This link shows some potential solutions that you can use to help you when you build your own.

We’re looking for good documentation, readable code, and bonus points for using reinforcement learning in a novel way for this challenge."

Solution

This assignment has two solutions:

  • DQN
  • A2C

DQN

This solution is based on:

How to run

cd Homework_Assignment_Week5/dqn/
python dqn.py

A2C

This solution is based on:

Chapter 14 & Chapter 15:

  • Continuous Action Space, The Actor-Critic (A2C) method
  • Trust Regions – TRPO, PPO, and ACKTR

How to run

cd Homework_Assignment_Week5/a2c/
python 01_train_a2c.py --name bipedal --cuda
python 02_play.py --model saves/a2c-bipedal/<<your data file>>.dat --save 45
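
For readers new to A2C with continuous actions, the core quantity is the log-probability of an action under a Gaussian policy, weighted by the advantage. The sketch below is a plain-Python illustration of that idea, not the code in 01_train_a2c.py. The function names are hypothetical, and the four-dimensional action in the usage lines is an assumption based on the BipedalWalker environment.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    total = 0.0
    for a, mu, sigma in zip(action, mean, std):
        total += (
            -((a - mu) ** 2) / (2.0 * sigma ** 2)
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
        )
    return total


def actor_loss(action, mean, std, reward_to_go, value_estimate):
    """Policy-gradient loss for one step: -log pi(a|s) * advantage."""
    advantage = reward_to_go - value_estimate  # critic's value acts as a baseline
    return -gaussian_log_prob(action, mean, std) * advantage


if __name__ == "__main__":
    # Hypothetical 4-dimensional action, one torque per joint.
    action = [0.1, -0.2, 0.0, 0.3]
    mean = [0.0, 0.0, 0.0, 0.0]
    std = [0.5, 0.5, 0.5, 0.5]
    print(actor_loss(action, mean, std, reward_to_go=1.2, value_estimate=0.7))
```

In a real training loop the mean and standard deviation would come from the actor network and the loss would be minimized with automatic differentiation.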

Homework_Assignment_Week6

This solution is based on:

Homework_Assignment_Week7

This solution is based on:

How to run

cd Homework_Assignment_Week7/FlappyBird/
python flappybird.py

How to run

cd Homework_Assignment_Week7/NeuroEvolution-Flappy-Bird/
python flappy.py

Homework_Assignment_Week8

Solution 1: REINFORCE

This solution is based on:

How to run

cd Homework_Assignment_Week8/Deep_reinforcement_learning_Course/
jupyter notebook Lunar\ Lander\ REINFORCE\ Monte\ Carlo\ Policy\ Gradients.ipynb

Solution 2: REINFORCE

This solution is based on:

How to run

cd Homework_Assignment_Week8/OpenAI_Gym_AI
python OpenAI.py -m train
python OpenAI.py -m test
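
Both solutions use REINFORCE, a Monte Carlo policy gradient method that weights each action's log-probability by the discounted return that followed it. The sketch below shows that return computation in plain Python. The function names, the discount value, and the normalization step are assumptions for illustration and are not taken from either solution.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance to reduce gradient variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    print(normalize(discounted_returns(episode_rewards, gamma=0.5)))
```

The normalized returns would then multiply the negative log-probabilities of the actions taken to form the loss for one episode.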

Homework_Assignment_Week9

Re-implement A2C in TensorFlow

This solution is based on:

Final Project (Multi Agent Research Project) (Week10)

Task

"Reproduce the Deep Deterministic Policy Gradients algorithm for a multi-agent particle environment.
The algorithm should learn how to get both agents to ‘tag’ each other."
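
Two pieces of DDPG that are easy to get wrong when reproducing it are the bootstrapped critic target and the soft update of the target networks. The sketch below shows both on plain Python lists. The function names and the default values of gamma and tau are assumptions for illustration, not values taken from the solutions in this repository.

```python
def td_target(reward, next_q, done, gamma=0.95):
    """Critic target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, source_params, tau=0.01):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [
        tau * source + (1.0 - tau) * target
        for target, source in zip(target_params, source_params)
    ]


if __name__ == "__main__":
    print(td_target(reward=1.0, next_q=2.0, done=False))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small tau keeps the target networks changing slowly, which stabilizes the critic's regression targets.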

Suggested readings

Solution - TensorFlow

This solution is based on:

How to run

  • To install, cd into the directory (Homework_Assignment_Week10/tensorflow1_multiagent) and type pip install -e .

  • Known dependencies: Python (3.5.4), OpenAI gym (0.9.5), numpy (1.13.1)

  • Additional installation instructions: https://github.com/openai/multiagent-particle-envs

cd Homework_Assignment_Week10/tensorflow1_multiagent

python3 ddpg_tag.py --env simple_tag_guided --experiment_prefix ./results/ddpg_1v1/

python3 ddpg_tag.py --env simple_tag_guided_1v2 --experiment_prefix ./results/ddpg_1v2/

python3 ddpg_tag.py --env simple_tag_guided_2v1 --experiment_prefix ./results/ddpg_2v1/

python3 ddpg_tag.py --env simple_tag_guided_2v2 --experiment_prefix ./results/ddpg_2v2/

python3 maddpg_tag.py --env simple_tag_guided --experiment_prefix ./results/maddpg_1v1/

python3 maddpg_tag.py --env simple_tag_guided_1v2 --experiment_prefix ./results/maddpg_1v2/

python3 maddpg_tag.py --env simple_tag_guided_2v1 --experiment_prefix ./results/maddpg_2v1/

python3 maddpg_tag.py --env simple_tag_guided_2v2 --experiment_prefix ./results/maddpg_2v2/

Add the --render parameter to any of these commands to render the process.
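
The eight commands above follow one pattern: two scripts crossed with four scenarios. As a convenience, the sketch below builds the same command lines programmatically. The helper name build_commands is hypothetical. The script names, environment names, and flags are copied from the commands above.

```python
# Sketch only: regenerate the run commands listed above.
ALGORITHMS = [("ddpg", "ddpg_tag.py"), ("maddpg", "maddpg_tag.py")]
SCENARIOS = [
    ("1v1", "simple_tag_guided"),
    ("1v2", "simple_tag_guided_1v2"),
    ("2v1", "simple_tag_guided_2v1"),
    ("2v2", "simple_tag_guided_2v2"),
]


def build_commands(render=False):
    """Return one argument list per (algorithm, scenario) pair."""
    commands = []
    for algo, script in ALGORITHMS:
        for tag, env in SCENARIOS:
            cmd = [
                "python3", script,
                "--env", env,
                "--experiment_prefix", "./results/{}_{}/".format(algo, tag),
            ]
            if render:
                cmd.append("--render")
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) from inside Homework_Assignment_Week10/tensorflow1_multiagent to run the experiments one after another.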

Other solutions

Please use the solution above for testing.
The following are additional MADDPG solutions for the particle environment, written in TensorFlow and PyTorch.

Tensorflow 2

Homework_Assignment_Week10/tensorflow2_maddpg

This solution is based on:

PyTorch 1

Homework_Assignment_Week10/pytorch1_maddpg

This solution is based on:

PyTorch 2

Homework_Assignment_Week10/pytorch2_maddpg

This solution is based on:

PyTorch 3

Homework_Assignment_Week10/pytorch3_maddpg_mpe

This solution is based on: