Reinforcement Learning Platform and Sample Solutions for Robot Open Autonomous Racing (ROAR) Simulation Race
Contributors: Yunhao Cao, Tianlun Zhang, Franco Huang, Aman Saraf, Grace Zhang.
This repository contains the implementation of the RL platform and an RL solution for the ROAR Simulation Race, built on the ROAR_PY platform. The solution is inherited from ROAR-RL-Racer, which ran on the previous ROAR platform; you can find the technical blog here.
The current solution is trained and evaluated on the Monza Map.
Click here for a longer video (5 mins)
Please follow the setup tutorial here to install the dependencies.
Please download the latest maps from the ROAR official website.
The observation space provided to the agent includes:
- All sensors attached to the vehicle instance:
  - Basically, the `RoarRLSimEnv` takes in an instance of `RoarPyActor`, and the observation space of the environment is a superset of that `RoarPyActor`'s `get_gym_observation_spec()`.
  - In ROAR's internal RL code, we added the following sensors to the actor:
    - local coordinate velocimeter
    - gyroscope (angular velocity sensor)
- Waypoint Information Observation
Instead of inputting an entire occupancy map into the network, we directly feed numerical information about waypoints near the vehicle as the observation provided to the agent. This is how it works:
- During initialization of the environment, we specify an array of relative distances (an array of floating point values) at which to trace waypoint information.
- Then, at each step, we perform the traces one by one and store the results under the `waypoints_information` key in the final observation dict, as illustrated in the sketch below.
A visualization of waypoint information is shown below, where the arrow represents the position and heading of the vehicle, the blue points represent the centered waypoints, and the red points represent the track boundaries.
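As a rough illustration of this mechanism, the per-step assembly of the waypoint observation could look like the sketch below. The names `TRACE_DISTANCES` and `track.trace_waypoint` are placeholders for illustration, not the actual ROAR_PY API.

```python
import numpy as np

# Illustrative relative distances (in meters) ahead of the vehicle at which
# waypoint information is traced on every step; these values are placeholders.
TRACE_DISTANCES = [2.0, 5.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0]


def build_waypoint_observation(track, vehicle_pose, trace_distances=TRACE_DISTANCES):
    """Trace the track at each relative distance and stack the results.

    Each trace is assumed to yield 4 numbers (e.g. relative position of the
    centered waypoint plus distances to the left/right boundaries), giving a
    (len(trace_distances), 4) array (9x4 with nine trace distances).
    """
    rows = []
    for d in trace_distances:
        # `track.trace_waypoint` is a hypothetical helper returning the
        # waypoint information `d` meters ahead of the current pose.
        rows.append(track.trace_waypoint(vehicle_pose, d))
    return {"waypoints_information": np.asarray(rows, dtype=np.float32)}
```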
The observation space active in our solution includes:
- Velocity Vector (3x1)
- IMU Vector (3x1)
- Waypoint Information (9x4)
For more details, please refer to the Observation Space documentation.
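Put together, the active observation space can be described with a Gymnasium `Dict` space, as in the minimal sketch below. The key names are assumptions; the actual keys come from the attached sensors' `get_gym_observation_spec()`.

```python
import numpy as np
from gymnasium import spaces

# Minimal sketch of the active observation space; key names are assumptions.
observation_space = spaces.Dict({
    "velocimeter": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),    # local velocity vector
    "gyroscope": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),      # angular velocity (IMU)
    "waypoints_information": spaces.Box(-np.inf, np.inf, shape=(9, 4), dtype=np.float32),  # traced waypoints
})
```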
The action space of every `RoarRLEnv` is identical to the return value of `RoarPyActor.get_action_spec()`.
The action space provided to the agent includes:
- throttle
- steering
- brake
- hand_brake
- reverse
For more details, please refer to the Action Space documentation.
The action space active in our solution includes:
- Throttle: `Box(-1.0, 1.0, (1,), float32)`
- Steering: `Box(-1.0, 1.0, (1,), float32)`
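Expressed as Gymnasium spaces, this corresponds to something like the sketch below; the exact key names returned by `RoarPyActor.get_action_spec()` may differ.

```python
import numpy as np
from gymnasium import spaces

# Sketch of the active action space: throttle and steering, each in [-1, 1].
# Key names are assumptions; the real spec comes from RoarPyActor.get_action_spec().
action_space = spaces.Dict({
    "throttle": spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32),
    "steering": spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32),
})
```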
The reward function includes:
- reward for traveled distance (to smooth out the Q function)
- penalty for collisions
- penalty for distance away from the centered waypoints
For the detailed reward calculation, please refer to the Reward Function documentation.
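A minimal sketch of how these terms could combine is below; the coefficients and helper arguments are placeholders, not the values used in training (see the Reward Function documentation for the actual formula).

```python
def compute_reward(distance_traveled, collided, dist_to_center,
                   w_progress=1.0, w_collision=100.0, w_center=0.1):
    """Illustrative reward: progress term minus collision and centering penalties.

    All weights are placeholder values, not the coefficients used in training.
    """
    reward = w_progress * distance_traveled   # reward for traveled distance
    if collided:
        reward -= w_collision                 # penalty for collisions
    reward -= w_center * dist_to_center       # penalty for deviating from centered waypoints
    return reward
```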
To run the training of our method, you need to:
- Modify the wandb setup and other hyperparameters in `training/train_online.py`.
- Move into the `training` folder and run the training script:
cd training
python train_online.py
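The wandb setup inside `training/train_online.py` typically boils down to a call like the one below; the project name and hyperparameter values here are placeholders, not the repository's defaults.

```python
import wandb

# Placeholder project name and hyperparameters; edit to match your own setup.
run = wandb.init(
    project="ROAR-RL",
    config={
        "total_timesteps": 1_000_000,
        "learning_rate": 3e-4,
        "batch_size": 256,
    },
)
```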
The models are stored under `training/models` by default. After you have a trained model, go to `training/eval_agent.py` to modify the model path, then run the script for evaluation:
python training/eval_agent.py
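If the checkpoint was trained with Stable-Baselines3 SAC (an assumption; adapt to whatever trainer `train_online.py` actually uses), the evaluation loop amounts to loading the model and rolling it out, roughly as below. `make_roar_env()` is a stand-in for however `eval_agent.py` constructs the `RoarRLSimEnv`.

```python
from stable_baselines3 import SAC

# Assumptions: the checkpoint was saved by a Stable-Baselines3 SAC trainer, and
# make_roar_env() stands in for however eval_agent.py builds the RoarRLSimEnv.
env = make_roar_env()
model = SAC.load("training/models/latest_model")  # edit to your checkpoint path

obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```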
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Soft Actor Critic for Discrete Action Settings
- Soft Actor-Critic Algorithms and Applications
- Proximal Policy Optimization Algorithms
- Playing Atari with Deep Reinforcement Learning
- The Bellman Error is a Poor Replacement for Value Error
- Correcting Robot Plans with Natural Language Feedback
- Variable Decision-Frequency Option Critic
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
- Champion-Level Drone Racing with Deep Reinforcement Learning