This repository is related to:
This solution is based on:
cd Homework_Assignment_Week2
python DeterministicFrozenLake.py
This solution is based on:
cd Homework_Assignment_Week3
python blackjack.py
python CliffWalking.py
"The midterm is to make a bipedal humanoid robot walk in a simulation.
You can use OpenAI Gym for the environment.
https://github.com/search?q=bipedal+gym
This link shows some potential solutions that you can use to help you when you build your own.
We’re looking for good documentation, readable code, and bonus points for using reinforcement learning in a novel way for this challenge."
This assignment has two solutions:
- DQN
- A2C
This solution is based on:
cd Homework_Assignment_Week5/dqn/
python dqn.py
This solution is based on:
- Deep Reinforcement Learning Hands-On by Maxim Lapan
https://www.packtpub.com/big-data-and-business-intelligence/deep-reinforcement-learning-hands
Chapter 14 & Chapter 15:
- Continuous Action Space, The Actor-Critic (A2C) method
- Trust Regions – TRPO, PPO, and ACKTR
cd Homework_Assignment_Week5/a2c/
python 01_train_a2c.py --name bipedal --cuda
python 02_play.py --model saves/a2c-bipedal/<<your data file>>.dat --save 45
This solution is based on:
- https://github.com/ikergarcia1996/NeuroEvolution-Flappy-Bird
- https://github.com/ikergarcia1996/NeuroEvolution-Flappy-Bird/blob/master/Jupyter%20Notebook/Flappy.ipynb
- https://www.youtube.com/watch?v=h2Uhla6nLDU&feature=youtu.be
- https://github.com/f-prime/FlappyBird/blob/master/flappybird.py
cd Homework_Assignment_Week7/FlappyBird/
python flappybird.py
cd Homework_Assignment_Week7/NeuroEvolution-Flappy-Bird/
python flappy.py
This solution is based on:
cd Homework_Assignment_Week8/Deep_reinforcement_learning_Course/
jupyter notebook Lunar\ Lander\ REINFORCE\ Monte\ Carlo\ Policy\ Gradients.ipynb
This solution is based on:
- https://leimao.github.io/article/REINFORCE-Policy-Gradient/
- https://github.com/leimao/OpenAI_Gym_AI/tree/master/LunarLander-v2/REINFORCE/2017-05-24-v1
cd Homework_Assignment_Week8/OpenAI_Gym_AI
python OpenAI.py -m train
python OpenAI.py -m test
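Both Week 8 solutions are built on REINFORCE (Monte Carlo policy gradients), which weights each action's log-probability by the discounted return that followed it. The sketch below shows only that return computation, in plain Python. The function name, the default gamma, and the optional normalization step are illustrative assumptions, not code taken from the notebook or from OpenAI.py.

```python
import math


def discounted_returns(rewards, gamma=0.99, normalize=False):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step of one episode.

    `rewards` is the list of per-step rewards from a single finished episode.
    The name, defaults, and normalization option are assumptions for this sketch.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running

    if normalize and len(returns) > 1:
        # Optional variance reduction: rescale to zero mean and unit std.
        mean = sum(returns) / len(returns)
        std = math.sqrt(sum((g - mean) ** 2 for g in returns) / len(returns))
        returns = [(g - mean) / (std + 1e-8) for g in returns]
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))
```

In a full training loop these returns would multiply the negative log-probabilities of the actions taken, and the sum would serve as the loss for one gradient step.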
This solution is based on:
"Reproduce the Deep Deterministic Policy Gradients algorithm for a multi-agent particle environment.
The algorithm should learn how to get both agents to ‘tag’ each other."
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- Continuous control with deep reinforcement learning
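One ingredient of DDPG described in "Continuous control with deep reinforcement learning" is the soft target update, where target-network weights slowly track the learned weights. The sketch below illustrates that update on plain Python lists. The function name, the dictionary layout, and the tau value are assumptions made for illustration, and this is not necessarily how ddpg_tag.py or maddpg_tag.py implement it.

```python
def soft_update(target_params, source_params, tau=0.01):
    """Return new target parameters moved a fraction `tau` toward the source.

    Both arguments map a parameter name to a flat list of floats
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    return {
        name: [
            (1.0 - tau) * t + tau * s
            for t, s in zip(target_params[name], source_params[name])
        ]
        for name in target_params
    }


if __name__ == "__main__":
    target = {"w": [0.0, 1.0]}
    source = {"w": [1.0, 1.0]}
    print(soft_update(target, source, tau=0.5))
```

A small tau keeps the targets used in the critic's loss nearly stationary between updates, which is the stated motivation for the technique.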
This solution is based on:
- https://github.com/rohan-sawhney/multi-agent-rl
- https://github.com/openai/multiagent-particle-envs
- https://github.com/camellyx/10707-deep-learning-project
- To install, cd into the directory (Homework_Assignment_Week10/tensorflow1_multiagent) and type pip install -e .
- Known dependencies: Python (3.5.4), OpenAI gym (0.9.5), numpy (1.13.1)
- Additional installation instructions: https://github.com/openai/multiagent-particle-envs
cd Homework_Assignment_Week10/tensorflow1_multiagent
python3 ddpg_tag.py --env simple_tag_guided --experiment_prefix ./results/ddpg_1v1/
python3 ddpg_tag.py --env simple_tag_guided_1v2 --experiment_prefix ./results/ddpg_1v2/
python3 ddpg_tag.py --env simple_tag_guided_2v1 --experiment_prefix ./results/ddpg_2v1/
python3 ddpg_tag.py --env simple_tag_guided_2v2 --experiment_prefix ./results/ddpg_2v2/
python3 maddpg_tag.py --env simple_tag_guided --experiment_prefix ./results/maddpg_1v1/
python3 maddpg_tag.py --env simple_tag_guided_1v2 --experiment_prefix ./results/maddpg_1v2/
python3 maddpg_tag.py --env simple_tag_guided_2v1 --experiment_prefix ./results/maddpg_2v1/
python3 maddpg_tag.py --env simple_tag_guided_2v2 --experiment_prefix ./results/maddpg_2v2/
Use the --render parameter to render the process.
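The eight commands above follow one pattern: an algorithm (ddpg or maddpg) crossed with a team layout (1v1, 1v2, 2v1, 2v2). As a convenience, the sketch below generates the same command lines from that pattern. The helper name build_commands and the idea of running them from a wrapper script are assumptions, while the script names, environment names, and flags are copied from the commands listed here.

```python
# Team layouts and their environment names, as used in the commands above.
ENVS = [
    ("1v1", "simple_tag_guided"),
    ("1v2", "simple_tag_guided_1v2"),
    ("2v1", "simple_tag_guided_2v1"),
    ("2v2", "simple_tag_guided_2v2"),
]
ALGOS = ["ddpg", "maddpg"]


def build_commands(render=False):
    """Return the experiment commands as argument lists (hypothetical helper)."""
    commands = []
    for algo in ALGOS:
        for label, env in ENVS:
            cmd = [
                "python3", "{}_tag.py".format(algo),
                "--env", env,
                "--experiment_prefix", "./results/{}_{}/".format(algo, label),
            ]
            if render:
                cmd.append("--render")
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        # Print only; pass `cmd` to subprocess.run(cmd, check=True) from
        # Homework_Assignment_Week10/tensorflow1_multiagent to launch it.
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them against the list above before starting any training runs.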
Please use the solution above for testing.
Here are additional MADDPG solutions for the particle environment, written in TensorFlow and PyTorch.
Homework_Assignment_Week10/tensorflow2_maddpg
This solution is based on:
Homework_Assignment_Week10/pytorch1_maddpg
This solution is based on:
Homework_Assignment_Week10/pytorch2_maddpg
This solution is based on:
Homework_Assignment_Week10/pytorch3_maddpg_mpe
This solution is based on: