/Unity_ML_Agent

This repository is for Reinforcement Learning algorithms with Unity ML Agent


Unity Machine Learning Agent

Introduction

This repository is for Unity Machine Learning Agents (ML-Agents) and Reinforcement Learning (RL).

Unity has released an awesome tool for building reinforcement learning environments! Unity Machine Learning

pygame environment repository

Some of the environments were made with purchased models, so it is hard to provide the Unity code for them. However, some simple environments were made with simple or free models. I will provide the Unity code for those environments.

Environment

Software

  • Windows10 (64bit), Ubuntu16.04
  • Python 3.6.5
  • Anaconda 4.2.0
  • Tensorflow-gpu 1.12.0
  • Unity version 2017.2.0f3 Personal
  • Unity ML-Agents: 0.8.1

Hardware

  • CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

  • GPU: GeForce GTX 1080Ti

  • Memory: 8GB

Unity ML Environments

Deep Q Learning Based High Level Driving Policy Determination

Vehicle_Simulator_StaticObs

This project was one of the projects of Machine Learning Camp Jeju 2017.

After the camp, the simulator was changed to use Unity ML-Agents.

Published papers using the environment

The repository link for this project is as follows.

Repository Link!!!


Vehicle Environment(Dynamic Obstacles)

Vehicle_Simulator_DynamicObs

The agent in this environment is a vehicle. The obstacles are 8 other kinds of vehicles. If the vehicle hits another vehicle, it receives a negative reward and the game restarts. If the vehicle hits start, it receives a positive reward and the game continues. The environment is specified as follows.

- State: Game view (80x80x1 grayscale image)
- Action: 3 actions (left, right, stay)
- Reward
	- Driving at the center of the lane: +1 (decreases linearly)
	- Colliding with another vehicle: -10
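
The reward terms above can be sketched in Python as follows. This is only an illustration, not the environment's actual implementation: the function name `lane_reward`, the `lane_half_width` parameter, and the exact linear form (falling from +1 at the lane center to 0 at the lane edge) are assumptions, since the list only states that the +1 reward decreases linearly.

```python
def lane_reward(offset_from_center, lane_half_width, collided):
    """Hypothetical reward for the dynamic-obstacle vehicle environment.

    offset_from_center: signed lateral distance from the lane center.
    lane_half_width: distance at which the centering reward reaches 0 (assumed).
    collided: True if the vehicle hit another vehicle on this step.
    """
    if collided:
        return -10.0  # collision penalty listed in the README
    # Assumed shape: +1 at the center, falling linearly to 0 at the lane edge.
    ratio = min(abs(offset_from_center) / lane_half_width, 1.0)
    return 1.0 - ratio
```

For example, under these assumptions a vehicle halfway between the lane center and the lane edge would receive a reward of 0.5.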

Demo video: youtube link

The papers referenced to implement the algorithm in the demo above are as follows.


Vehicle Environment(Static Obstacles)

Vehicle_Simulator_StaticObs

This environment won the ML-Agents Challenge!!! 👑

The agent in this environment is a vehicle. The obstacles are static tire barriers. If the vehicle hits an obstacle, it receives a negative reward and the game restarts. If the vehicle hits start, it receives a positive reward and the game continues. The environment is specified as follows.

Demo video: youtube link

The papers referenced to implement the algorithm in the demo above are as follows.


Breakout

FlappyBird

This is a Breakout environment, a popular environment for testing RL algorithms.

The red bar (agent) moves left and right to hit the ball. If the ball collides with a block, it breaks the block. In every episode, the ball is fired in a random direction.

The rules of Breakout are as follows.
- Visual Observation: 80x80x1 grayscale image  
- Actions: 3 actions (left, stay, right)

+1 Reward
- If ball breaks a block, the agent gets +1 reward 

-1 Reward
- If the agent misses the ball, the agent gets -1 reward

Terminal conditions
- If the agent misses the ball 
- If the agent breaks all the blocks 
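
The reward and terminal rules above can be summarized in a short Python sketch. It is not taken from the environment's code: the function name and its three arguments (`broke_block`, `missed_ball`, `blocks_remaining`) are hypothetical values that the environment would need to supply.

```python
def breakout_step_outcome(broke_block, missed_ball, blocks_remaining):
    """Return (reward, done) for one step, following the rules listed above.

    broke_block: True if the ball broke a block on this step (hypothetical flag).
    missed_ball: True if the agent missed the ball on this step (hypothetical flag).
    blocks_remaining: number of blocks left after this step (hypothetical counter).
    """
    reward = 0.0
    if broke_block:
        reward += 1.0  # +1 for breaking a block
    if missed_ball:
        reward -= 1.0  # -1 for missing the ball
    # The episode ends when the ball is missed or no blocks are left.
    done = missed_ball or blocks_remaining == 0
    return reward, done
```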

Pong

Pong

This is a simple and popular environment for testing deep reinforcement learning algorithms.

Two bars have to hit the ball to win the game. In my environment, the left bar is the agent and the right bar is the enemy. The enemy is invincible, so it can hit every ball. In every episode, the ball is fired in a random direction.

The rules of Pong are as follows.
- Visual Observation: 40x80x1 grayscale image 
- Vector Observation: 32 
- Actions: 3 actions (up, stay, down)

Reward
- Agent hits the ball: +0.5
- Agent misses the ball: -1

Terminal conditions
- Agent misses the ball
- After 1000 time steps, the episode ends
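
As a rough illustration of how the rewards and terminal conditions above fit together, here is a small Python sketch. The function name and the `hit_ball`, `missed_ball`, and `step_count` arguments are hypothetical and are not part of the environment's API.

```python
MAX_STEPS = 1000  # time limit listed in the terminal conditions


def pong_step_outcome(hit_ball, missed_ball, step_count):
    """Return (reward, done) for one Pong step.

    hit_ball / missed_ball: hypothetical flags for what happened on this step.
    step_count: number of steps taken so far in the episode, including this one.
    """
    if missed_ball:
        return -1.0, True  # missing the ball ends the episode
    reward = 0.5 if hit_ball else 0.0
    done = step_count >= MAX_STEPS  # time-limit termination
    return reward, done
```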

Predator Prey (MultiAgent)

PP

This is a popular environment for testing multi-agent deep reinforcement learning algorithms. The lions and the sheep are agents. The lions are predators, so they have to capture the sheep. The sheep is the prey, so it has to run away from the lions.

The number of lions can be changed from 1 to 6 using the following Python code.

PP

The rules of Predator Prey are as follows.
- Visual Observation: 80x80x3 image 
- Vector Observation: 3 -> x position, z position, role (0: Prey, 1: Predator) 
- Actions: 4 actions (up, down, left, right)

Reward (Predator)
- Every move: -0.01 
- Predator captures the prey: +1 

Reward (Prey)
- Every move: +0.01
- Predator captures the prey: -1

Terminal conditions
- Predator captures the prey 
- After 500 steps
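
The two reward schemes above mirror each other, which the following Python sketch tries to make explicit. It is an illustration under assumptions rather than the environment's code: the function name and arguments are hypothetical, and it assumes the per-move reward is not added on the step where the capture happens.

```python
MAX_STEPS = 500  # time limit listed in the terminal conditions


def predator_prey_rewards(captured, step_count):
    """Return (predator_reward, prey_reward, done) for one step.

    captured: True if a predator captured the prey on this step (hypothetical flag).
    step_count: number of steps taken so far in the episode, including this one.
    """
    if captured:
        # Assumption: the capture reward replaces the per-move reward.
        return 1.0, -1.0, True
    done = step_count >= MAX_STEPS
    # Per-move rewards: predators are pushed to finish quickly,
    # while the prey is rewarded for surviving.
    return -0.01, 0.01, done
```

With several lions, the predator reward in this sketch would apply to each predator agent separately.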

Hopper

PP

This is a popular environment for testing continuous-action deep reinforcement learning algorithms. The agent has to move to the right.

The rules of Hopper are as follows.
- Vector Observation: 19 * 4 (19 data * 4 stacks) 
	- foot: x distance/50, local position (x,y,z), velocity (x, y), angular velocity (z)
	- leg1: local position (x,y,z), velocity (x,y), angular velocity (z)
	- leg2: local position (x,y,z), velocity (x,y), angular velocity (z)
- Actions: 3 Continuous actions (-1 ~ 1) -> Torque(foot, leg1, leg2)
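
To make the 19 * 4 layout concrete, the sketch below stacks the four most recent 19-value observations into one 76-value vector. It only illustrates the layout: the stacking may already be done on the Unity side, and the class name, the oldest-first ordering, and the choice to fill the stack with copies of the first frame on reset are all assumptions.

```python
from collections import deque

OBS_SIZE = 19   # values per frame: 7 (foot) + 6 (leg1) + 6 (leg2)
NUM_STACK = 4   # number of stacked frames


class ObservationStack:
    """Keeps the latest NUM_STACK observations and returns them flattened."""

    def __init__(self):
        self.frames = deque(maxlen=NUM_STACK)

    def reset(self, first_obs):
        # Assumed convention: fill the stack with copies of the first frame.
        self.frames.clear()
        for _ in range(NUM_STACK):
            self.frames.append(list(first_obs))
        return self.flatten()

    def push(self, obs):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(list(obs))
        return self.flatten()

    def flatten(self):
        # Oldest frame first, newest frame last (assumed ordering).
        return [value for frame in self.frames for value in frame]
```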

Reward 
- y position of leg2 < 0.8: -1
- x position of foot > 50: +1
- (0.01 * foot x velocity) + (0.00001 * foot x distance) 

Terminal conditions
- y position of leg2 < 0.8
- x position of foot > 50
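
The Hopper reward and terminal rules can be sketched in Python as below. This is a hedged reading of the list above, not the environment's code: the function and argument names are hypothetical, "foot x distance" is interpreted as the foot's x position, and the shaping term is assumed to apply only on non-terminal steps.

```python
def hopper_step_outcome(foot_x, foot_x_velocity, leg2_y):
    """Return (reward, done) for one Hopper step.

    foot_x: x position of the foot (interpreted as the "foot x distance").
    foot_x_velocity: x velocity of the foot.
    leg2_y: y position of leg2.
    """
    if leg2_y < 0.8:
        return -1.0, True  # the hopper fell over
    if foot_x > 50:
        return 1.0, True   # the hopper reached the goal on the right
    # Assumed to apply only on non-terminal steps: small shaping reward
    # for moving to the right.
    reward = 0.01 * foot_x_velocity + 0.00001 * foot_x
    return reward, False
```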