A car agent learns to navigate complex traffic conditions using PPO.
The agent is supposed to learn to choose accelerations that bring it to its destination without causing traffic jams or collisions.
While navigating on the road, the car agent may encounter other cars.
In some situations, the acceleration chosen by the agent will cause a jam or a collision.
Because the traffic conditions can become very complex and the GAMA simulator has no built-in collision handling, I have to implement collision detection and jam detection myself.
The detection considers the 10 cars closest to the agent and calculates their distances.
These distance calculations are necessary for safe driving; Euclidean distance is used here.
First, the agent computes the relevant distances (the distance to the car behind and the distance to the car in front).
Then, after the agent chooses an acceleration, the detection checks whether that acceleration would cause a jam or a collision.
The unit of time is one simulation cycle.
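As a rough sketch of this step, the following ranks nearby cars by Euclidean distance and keeps the 10 closest. The car representation (a dict with a `pos` tuple) is hypothetical; the actual positions come from the GAMA model:

```python
import math

def closest_cars(agent_pos, other_cars, k=10):
    """Return the k cars closest to the agent, with their Euclidean distances.

    agent_pos:  (x, y) position of the agent.
    other_cars: list of dicts with at least a 'pos' key -- an assumed
                representation; the GAMA model stores this differently.
    """
    def dist(car):
        dx = car["pos"][0] - agent_pos[0]
        dy = car["pos"][1] - agent_pos[1]
        return math.hypot(dx, dy)

    ranked = sorted(other_cars, key=dist)
    return [(car, dist(car)) for car in ranked[:k]]
```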
If another car is in front of the agent on the same road, the chosen acceleration is checked for whether it would cause a collision with the front cars (there may be more than one front car).
If another car is behind the agent on the same road, the chosen acceleration is checked for whether it would cause a jam with the cars behind (there may be more than one).
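The per-cycle check could look like the sketch below. It assumes the other cars hold constant speed over one cycle and uses an assumed `safe_gap` spacing; the source does not give the exact thresholds:

```python
def causes_collision_or_jam(gap_front, gap_behind, v_agent, v_front, v_behind,
                            accel, dt=1.0, safe_gap=2.0):
    """Predict, one cycle ahead, whether a chosen acceleration is unsafe.

    A hypothetical sketch: other cars are assumed to keep constant speed
    over the cycle, and safe_gap is an assumed minimum spacing.
    gap_front / gap_behind may be None when no such car exists.
    """
    # Distance the agent travels in one cycle with the chosen acceleration.
    d_agent = v_agent * dt + 0.5 * accel * dt * dt

    collision = False
    if gap_front is not None:
        d_front = v_front * dt
        # Collision risk: the gap to the front car shrinks below safe_gap.
        collision = gap_front + d_front - d_agent < safe_gap

    jam = False
    if gap_behind is not None:
        d_behind = v_behind * dt
        # Jam risk: the car behind closes to within safe_gap of the agent.
        jam = gap_behind + d_agent - d_behind < safe_gap

    return collision, jam
```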
When the cars are on different roads, the calculation process is the same, but the conditions become much more complex.
Are the 10 closest cars on the same road as the agent?
If so, is each of them in front of or behind the agent?
These conditions are handled explicitly in the gaml file.
State representation: [real_speed/10, target_speed/10, elapsed_time_ratio, distance_to_goal/100, distance_front_car/10, distance_behind_car/10]
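A minimal sketch of assembling this state vector, using the scaling constants listed above (the field names follow the GAMA message):

```python
def build_state(real_speed, target_speed, elapsed_time_ratio,
                distance_to_goal, distance_front_car, distance_behind_car):
    """Assemble the normalized state vector fed to the network.

    The divisors (10 and 100) are the scaling constants from the state
    definition above.
    """
    return [
        real_speed / 10.0,
        target_speed / 10.0,
        elapsed_time_ratio,
        distance_to_goal / 100.0,
        distance_front_car / 10.0,
        distance_behind_car / 10.0,
    ]
```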
The network's output is an acceleration, constrained to [-5, 8] m/s^2 to stay closer to realistic driving.
Output: acceleration.
Action representation: [acceleration].
The car learns to control its acceleration subject to the restrictions shown below:
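The document does not say how the raw network output is mapped into [-5, 8] m/s^2; one common choice, shown here purely as an assumption, is tanh squashing followed by linear rescaling:

```python
import math

A_MIN, A_MAX = -5.0, 8.0  # acceleration bounds in m/s^2

def scale_action(raw):
    """Map a raw network output to the allowed acceleration range.

    Assumed mechanism: squash with tanh into (-1, 1), then rescale
    linearly to [A_MIN, A_MAX]. The source does not specify this step.
    """
    squashed = math.tanh(raw)
    return A_MIN + (squashed + 1.0) * 0.5 * (A_MAX - A_MIN)
```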
Reward shaping:
- r_t = r_terminal + r_danger + r_speed
- r_terminal: given on a crash or when the time expires; -0.013 if target_speed > real_speed, or -0.1 if target_speed < real_speed
- r_speed: depends on the instantaneous speed s_a relative to the target speed s_t:
  - if s_a <= s_t: 0.001 - 0.004 * ((target_speed - Instantaneous_speed) / target_speed)
  - if distance_front_car_before <= safe_interval or time_after_safe_interval > 0: 0.001 * (Instantaneous_speed / target_speed); time_after_safe_interval is extended while a front car stays within safe_interval
  - if s_a > s_t: 0.001 - 0.006 * ((Instantaneous_speed - target_speed) / target_speed)
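The r_speed rules above can be sketched as a single function; the branch ordering (the safe-interval case checked first) is my assumption, since the source does not state how the cases interact:

```python
def speed_reward(inst_speed, target_speed,
                 distance_front_car_before, safe_interval,
                 time_after_safe_interval):
    """Sketch of the r_speed term, following the rules listed above.

    Assumption: the safe-interval case takes priority over the
    below/above-target-speed cases.
    """
    # A front car is (or recently was) within the safe interval.
    if distance_front_car_before <= safe_interval or time_after_safe_interval > 0:
        return 0.001 * (inst_speed / target_speed)
    # Below or at target speed: penalize the shortfall.
    if inst_speed <= target_speed:
        return 0.001 - 0.004 * ((target_speed - inst_speed) / target_speed)
    # Above target speed: penalize the excess more strongly.
    return 0.001 - 0.006 * ((inst_speed - target_speed) / target_speed)
```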
In my experiment, I clearly want the agent to learn to keep its speed around the target speed.
The model with an LSTM trains noticeably better than models without one.
GAMA is a platform for running simulations.
I have a GAMA model named "PPO_Mixedinput_Navigation.gaml", which contains a car and some traffic lights. The model sends the data
[real_speed, target_speed, elapsed_time_ratio, distance_to_goal, reward, done, time_pass, over]
as a matrix to the Python environment, which computes the car's acceleration with A2C. Following the Markov Decision Process framework, the car in GAMA applies that acceleration and sends the latest data back to Python, over and over, until it reaches the destination.
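The GAMA-to-Python interaction loop described here can be sketched as follows; `env` and `agent` are hypothetical wrappers around the actual socket communication and the policy, introduced only for illustration:

```python
def run_episode(env, agent):
    """Sketch of the GAMA <-> Python loop described above.

    env.receive() is assumed to return the 8-element matrix sent by the
    GAMA model, env.send() to deliver the acceleration back, and
    agent.act() to map a state to an acceleration.
    """
    while True:
        (real_speed, target_speed, elapsed_time_ratio,
         distance_to_goal, reward, done, time_pass, over) = env.receive()
        if over:  # destination reached or episode terminated
            break
        state = [real_speed, target_speed, elapsed_time_ratio, distance_to_goal]
        accel = agent.act(state)
        env.send(accel)
```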