The default observation for each traffic signal agent is a vector:
obs = [phase_one_hot, min_green, lane_1_density, ..., lane_n_density, lane_1_queue, ..., lane_n_queue]
- phase_one_hot is a one-hot encoded vector indicating the current active green phase
- min_green is a binary variable indicating whether min_green seconds have already passed in the current phase
- lane_i_density is the number of vehicles in incoming lane i divided by the total capacity of the lane
- lane_i_queue is the number of queued (speed below 0.1 m/s) vehicles in incoming lane i divided by the total capacity of the lane
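As a rough illustration of how these pieces fit together, the sketch below assembles the vector from plain numbers. The function name `build_observation` and its parameters are hypothetical and not part of the library; clipping the ratios to [0, 1] is also an assumption.

```python
import numpy as np


def build_observation(phase_index, num_phases, min_green_passed,
                      lane_vehicle_counts, lane_queued_counts, lane_capacities):
    """Assemble [phase_one_hot, min_green, densities..., queues...] (illustrative only)."""
    # One-hot encoding of the currently active green phase.
    phase_one_hot = np.zeros(num_phases, dtype=np.float32)
    phase_one_hot[phase_index] = 1.0

    # Binary flag: have min_green seconds already passed in this phase?
    min_green = np.array([1.0 if min_green_passed else 0.0], dtype=np.float32)

    capacities = np.asarray(lane_capacities, dtype=np.float32)
    # Vehicles (or queued vehicles) per lane divided by lane capacity.
    # Clipping to [0, 1] is an assumption of this sketch.
    density = np.clip(np.asarray(lane_vehicle_counts, dtype=np.float32) / capacities, 0.0, 1.0)
    queue = np.clip(np.asarray(lane_queued_counts, dtype=np.float32) / capacities, 0.0, 1.0)

    return np.concatenate([phase_one_hot, min_green, density, queue])


obs = build_observation(
    phase_index=2, num_phases=4, min_green_passed=True,
    lane_vehicle_counts=[5, 10], lane_queued_counts=[2, 10], lane_capacities=[20, 20],
)
```

With 4 phases and 2 incoming lanes, the resulting vector has 4 + 1 + 2 + 2 = 9 entries.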
You can define your own observation by implementing a class that inherits from ObservationFunction and passing it to the environment constructor.
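The general shape of such a class might look like the sketch below. To keep it runnable without SUMO, it uses a stand-in base class (`ObservationFunctionStub`) and a fake signal object (`FakeSignal`); both are hypothetical, and the real ObservationFunction interface and the constructor argument used to pass the class should be checked in observations.py and the environment source.

```python
from abc import ABC, abstractmethod

import numpy as np


class ObservationFunctionStub(ABC):
    """Hypothetical stand-in for the library's ObservationFunction base class."""

    def __init__(self, ts):
        self.ts = ts  # the traffic signal this observation is computed for

    @abstractmethod
    def __call__(self):
        """Return the observation for the traffic signal."""


class QueueOnlyObservation(ObservationFunctionStub):
    """Custom observation that keeps only the per-lane queue ratios."""

    def __call__(self):
        # get_lanes_queue() is assumed here to return one ratio per incoming lane.
        return np.asarray(self.ts.get_lanes_queue(), dtype=np.float32)


class FakeSignal:
    """Hypothetical traffic signal used only to exercise the sketch."""

    def get_lanes_queue(self):
        return [0.1, 0.0, 0.5]


obs_fn = QueueOnlyObservation(FakeSignal())
queue_obs = obs_fn()
```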
Below are the predefined observation functions.
[r_turn_veh, s_turn_veh, l_turn_veh]
To achieve a generalized state representation, the state for each incoming road is composed of three elements, one per turning movement (right turn, straight, left turn). When a road has multiple lanes dedicated to the same movement, the element for that movement is the maximum number of vehicles present on any one of those lanes.
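One way to read this rule is sketched below. The helper `road_state` and its dictionary input are hypothetical, and using 0 for a movement that has no dedicated lane is an assumption of this sketch.

```python
def road_state(lane_counts_by_movement):
    """Return [r_turn_veh, s_turn_veh, l_turn_veh] for one incoming road.

    lane_counts_by_movement maps "right" / "straight" / "left" to a list of
    per-lane vehicle counts (hypothetical input format).
    """
    order = ("right", "straight", "left")
    # With several lanes for the same movement, keep the largest count;
    # a movement without any lane contributes 0 (assumption).
    return [max(lane_counts_by_movement.get(movement, []), default=0)
            for movement in order]


# One right-turn lane, two straight lanes, no left-turn lane.
state = road_state({"right": [3], "straight": [4, 7], "left": []})
```

Here the two straight lanes hold 4 and 7 vehicles, so the straight element is 7.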
In observations.py, we provide three general observation functions that differ in their lane sorting order:
DefaultObservationFunction
FullAttachObservationFunction
FullClockwiseObservationFunction
The action space is discrete. Every 'delta_time' seconds, each traffic signal agent can choose the next green phase configuration.
For the general traffic light control agent, the action space is set to |A| = 8 standard discrete actions. In different scenarios, some of these 8 actions may not be executable or may have different meanings.
E.g.1: In the standard 4-way single intersection there are 8 legal discrete actions, corresponding to the following green phase configurations:
E.g.2: In the standard 3-way single intersection, only 3 of the 8 discrete actions are legal, corresponding to the following green phase configurations:
Note that phase 7 and phase 8 have a different meaning here, while remaining as similar as possible to the original standard phases.
E.g.3: In a 2-way single intersection, 6 of the 8 discrete actions are illegal and are masked during action selection. The corresponding phase configurations are as follows:
Important: every time a phase change occurs, the next phase is preceded by a yellow phase lasting yellow_time seconds.
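The timing of one decision step can be sketched as follows. The helper `plan_step` is hypothetical, and the sketch assumes that the yellow phase is spent inside the delta_time window (so yellow_time must be shorter than delta_time); the actual scheduling in traffic_signal.py may differ.

```python
def plan_step(current_phase, next_phase, yellow_time, delta_time):
    """Return the (signal, duration in seconds) segments for one decision step."""
    if not 0 <= yellow_time < delta_time:
        raise ValueError("yellow_time must be non-negative and shorter than delta_time")
    if next_phase == current_phase:
        # No phase change: the current green simply continues.
        return [(f"green:{current_phase}", delta_time)]
    # Phase change: yellow first, then the newly chosen green phase.
    return [
        (f"yellow:{current_phase}", yellow_time),
        (f"green:{next_phase}", delta_time - yellow_time),
    ]


switch = plan_step(current_phase=0, next_phase=3, yellow_time=2, delta_time=5)
keep = plan_step(current_phase=1, next_phase=1, yellow_time=2, delta_time=5)
```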
You can customize green phase in traffic_signal.py:
0 | 1 | 3 | 3 | 2 | 3 | 3 | 3 |
3 | 3 | 0 | 1 | 3 | 2 | 3 | 3 |
0 | 1 | 3 | 3 | 3 | 3 | 2 | 3 |
3 | 3 | 0 | 1 | 3 | 3 | 3 | 2 |
Phase Key of a Single Road:
- 0: Left turn
- 1: Straight way
- 2: Left turn & straight way phase
- 3: Stop phase
Each row in the table gives the phase settings for one of the four incoming roads. Reading down a single column gives the complete phase of the traffic light signal (TLS) for one of the 8 actions.
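To make the table easier to read, the sketch below decodes one action into a per-road description. It assumes that each column corresponds to one of the 8 actions, indexed from 0; `PHASE_TABLE`, `PHASE_KEY`, and `describe_action` are hypothetical names, not identifiers from traffic_signal.py.

```python
# Rows: the four incoming roads. Columns: the 8 actions (assumed 0-indexed).
PHASE_TABLE = [
    [0, 1, 3, 3, 2, 3, 3, 3],
    [3, 3, 0, 1, 3, 2, 3, 3],
    [0, 1, 3, 3, 3, 3, 2, 3],
    [3, 3, 0, 1, 3, 3, 3, 2],
]

# Phase key of a single road, as listed above.
PHASE_KEY = {0: "left", 1: "straight", 2: "left+straight", 3: "stop"}


def describe_action(action):
    """Return what each of the four incoming roads is allowed to do under an action."""
    return [PHASE_KEY[row[action]] for row in PHASE_TABLE]


first = describe_action(0)  # roads 1 and 3 turn left, roads 2 and 4 stop
fifth = describe_action(4)  # only road 1 moves (left turn and straight)
```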
You can customize action masks in the observation_space method of observations.py, for example:
action_mask[0:4] = 0
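A mask like this is typically applied when the agent picks an action. The sketch below shows greedy selection over masked action values; it assumes that 1 marks a legal action and 0 a masked one, and `masked_greedy_action` is a hypothetical helper rather than part of the library.

```python
import numpy as np


def masked_greedy_action(q_values, action_mask):
    """Pick the highest-valued action among those the mask allows."""
    q = np.asarray(q_values, dtype=np.float64)
    mask = np.asarray(action_mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be legal")
    # Masked actions get -inf so they can never win the argmax.
    masked_q = np.where(mask, q, -np.inf)
    return int(np.argmax(masked_q))


action_mask = np.ones(8, dtype=np.int8)
action_mask[0:4] = 0  # mask the first four actions, as in the line above

q_values = [9.0, 8.0, 7.0, 6.0, 1.0, 3.0, 2.0, 0.5]
best = masked_greedy_action(q_values, action_mask)
```

Although action 0 has the highest raw value, it is masked, so the best legal action (index 5) is selected.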
The default reward function is the change in cumulative vehicle delay:
That is, the reward is how much the total delay (sum of the waiting times of all approaching vehicles) changed in relation to the previous time-step.
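A minimal sketch of this kind of reward is given below. The class `DelayReward` is hypothetical, and the sign convention (previous delay minus current delay, so that a drop in delay yields a positive reward) is an assumption that should be checked against the implementation in TrafficSignal.

```python
class DelayReward:
    """Reward equal to the change in total delay between consecutive steps."""

    def __init__(self):
        self.last_delay = 0.0  # total delay observed at the previous step

    def __call__(self, waiting_times):
        # Total delay: sum of the waiting times of all approaching vehicles.
        delay = float(sum(waiting_times))
        # Assumed sign convention: positive when the delay decreased.
        reward = self.last_delay - delay
        self.last_delay = delay
        return reward


reward_fn = DelayReward()
first_reward = reward_fn([10.0, 5.0])   # delay rises from 0 to 15
second_reward = reward_fn([4.0, 3.0])   # delay falls from 15 to 7
```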
You can choose a different reward function (see the ones implemented in TrafficSignal) with the parameter reward_fn in the SumoEnvironment constructor.
It is also possible to implement your own reward function:
def my_reward_fn(traffic_signal):
    # Reward the agent with the average vehicle speed at this traffic signal.
    return traffic_signal.get_average_speed()

env = SumoEnvironment(..., reward_fn=my_reward_fn)