This platform runs on both Windows and Ubuntu, provided Python 3 is installed. Currently, there are no ROS-related packages. It only requires PyTorch, NumPy, and OpenCV.
- Python: 3.8.10
- numpy: 1.22.2
- opencv: 4.7.0
- pytorch: 1.13.1 + cu117 (a CPU-only build also works for our platform)
- matplotlib: 3.7.1
- Windows: Anaconda3 is recommended because it makes packages easy to install and uninstall. Extra packages (such as torch) can be installed after Anaconda3 is set up.
- Ubuntu: Ubuntu 20.04 is recommended because it ships with Python 3. If your Ubuntu version is older than 20.04, Anaconda3 is still required.
- The PyTorch build depends on your device: choose either a CPU-only build or a build for a specific CUDA version. We have tested our code with several versions of PyTorch, and they all work.
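To compare a local setup against the versions listed above, a small check like the following may help. It is only a sketch: the PyPI distribution names (`opencv-python`, `torch`) are assumptions about how the packages were installed, and `installed_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions listed above. The distribution names are assumptions
# (for example, OpenCV is assumed to be installed as "opencv-python").
TESTED_VERSIONS = {
    "numpy": "1.22.2",
    "opencv-python": "4.7.0",
    "torch": "1.13.1",
    "matplotlib": "3.7.1",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(TESTED_VERSIONS).items():
        print(f"{name}: installed={version}, tested={TESTED_VERSIONS[name]}")
```

A package reported as `None` is missing from the active environment. A differing version is not necessarily a problem, since several PyTorch versions were tested.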
Currently, this repository consists of five directories: algorithm, common, datasave, environment, and simulation.
Algorithm
Algorithm includes some commonly used reinforcement learning algorithms.
The following table lists RL algorithms in the corresponding directories.
Directory | Algorithm | Description |
---|---|---|
actor_critic | A2C, DDPG, SAC, TD3 | ---- |
policy_base | PPO, DPPO, DPPO2 | DPPO2 does not work |
value_base | DQN, DoubleDQN, DuelingDQN | ---- |
rl_base | ---- | Base class inherited by the other algorithms |
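The idea of a shared base class that the other algorithms inherit from can be sketched as follows. This is an illustration only: `RLBase`, `TabularQ`, and their method names are hypothetical and do not reproduce the classes in rl_base. A Q-table stands in for the neural networks so that the example needs no PyTorch.

```python
import random


class RLBase:
    """Hypothetical base class: holds shared settings and defines the interface."""

    def __init__(self, num_actions, gamma=0.99):
        self.num_actions = num_actions
        self.gamma = gamma  # discount factor shared by all algorithms

    def choose_action(self, state):
        raise NotImplementedError

    def learn(self, state, action, reward, next_state, done):
        raise NotImplementedError


class TabularQ(RLBase):
    """Toy value-based subclass that uses a Q-table instead of a network."""

    def __init__(self, num_actions, gamma=0.99, lr=0.1, epsilon=0.1):
        super().__init__(num_actions, gamma)
        self.lr = lr
        self.epsilon = epsilon
        self.q = {}  # maps state -> list of action values

    def _values(self, state):
        return self.q.setdefault(state, [0.0] * self.num_actions)

    def choose_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, else act greedily.
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)
        values = self._values(state)
        return values.index(max(values))

    def learn(self, state, action, reward, next_state, done):
        # One-step Q-learning update toward the bootstrapped target.
        target = reward
        if not done:
            target += self.gamma * max(self._values(next_state))
        values = self._values(state)
        values[action] += self.lr * (target - values[action])
```

Keeping `choose_action` and `learn` on the base class lets a simulation script swap one algorithm for another without changing the interaction loop.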
Common
Common contains common_func.py and common_cls.py, which provide basic functions and classes.
The following table lists the contents of the two py files.
File | Description |
---|---|
common_cls.py | ReplayBuffer, RolloutBuffer, OUNoise, NeuralNetworks, etc |
common_func.py | basic mathematical functions, geometry operations, etc |
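As an illustration of what a replay buffer such as the one listed for common_cls.py does, here is a minimal sketch. It is not the repository's `ReplayBuffer`: the class name `SimpleReplayBuffer` and its method names are hypothetical, and it stores plain Python tuples rather than tensors.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, regrouped field by field.
        batch = random.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)
```

Off-policy algorithms such as DQN, DDPG, and TD3 typically sample from a buffer like this so that consecutive, highly correlated transitions are not used in the same update.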
Datasave
Datasave saves networks trained by RL algorithms and some data files.
Environment
Environment contains some physical models, which are called 'environment' in RL.
The 'config' directory contains the *.xml files, which are the model description files of all environments.
The 'envs' directory contains the ODEs (ordinary differential equations) of the physical environments.
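To show how an ODE can be wrapped as an RL environment, the sketch below integrates a double integrator (x'' = u) with a fourth-order Runge-Kutta step. The class `DoubleIntegratorEnv`, its `reset`/`step` signatures, and the quadratic reward are assumptions made for illustration; they are not taken from the 'envs' directory.

```python
class DoubleIntegratorEnv:
    """Hypothetical environment: a point mass with dynamics x'' = u."""

    def __init__(self, dt=0.01):
        self.dt = dt
        self.state = (0.0, 0.0)  # (position, velocity)

    @staticmethod
    def ode(state, u):
        """Time derivative of the state: (x', v') = (v, u)."""
        _, v = state
        return (v, u)

    def rk4_step(self, state, u):
        """Advance the state by one step dt with classical RK4."""
        dt = self.dt

        def shifted(s, k, h):
            return tuple(si + h * ki for si, ki in zip(s, k))

        k1 = self.ode(state, u)
        k2 = self.ode(shifted(state, k1, dt / 2), u)
        k3 = self.ode(shifted(state, k2, dt / 2), u)
        k4 = self.ode(shifted(state, k3, dt), u)
        return tuple(
            s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )

    def reset(self, x=1.0, v=0.0):
        self.state = (x, v)
        return self.state

    def step(self, u):
        """Apply control u for one step; return the new state and reward."""
        self.state = self.rk4_step(self.state, u)
        x, v = self.state
        reward = -(x * x + 0.1 * v * v)  # assumed cost: stay near the origin
        return self.state, reward
```

An agent would call `reset()` once per episode and then `step(u)` repeatedly, using the returned state to pick the next control input.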
The following table lists all the current environments.
Environment | Directory | Description |
---|---|---|
CartPole | ./CartPole/ | continuous, position and angle |
CartPoleAngleOnly | ./CartPole/ | continuous, just angle |
CartPoleAngleOnlyDiscrete | ./CartPole/ | discrete, just angle |
FlightAttitudeSimulator | ./FlightAttitudeSimulator/ | discrete |
FlightAttitudeSimulator2StateContinuous | ./FlightAttitudeSimulator/ | continuous, the states are only theta and dtheta |
FlightAttitudeSimulatorContinuous | ./FlightAttitudeSimulator/ | continuous |
UAVHover | ./UAV/ | continuous, other files in ./UAV are not RL environments |
UGVBidirectional | ./UGV/ | continuous, the vehicle can move forward and backward |
UGVForward | ./UGV/ | continuous, the vehicle can only move forward |
UGVForwardDiscrete | ./UGV/ | discrete, the vehicle can only move forward |
UGVForwardObstacleContinuous | ./UGV/ | continuous, the vehicle needs to avoid obstacles |
UGVForwardObstacleDiscrete | ./UGV/ | discrete, the vehicle needs to avoid obstacles |
UGVForward_pid | ./UGV_PID/ | UGV forward with PID controller tuned by RL |
UGVBidirectional_pid | ./UGV_PID/ | UGV bidirectional with PID controller tuned by RL |
TwoLinkManipulator | ./RobotManipulators/ | continuous, full drive |
BallBalancer1D | ./RobotManipulators/ | continuous, 1D ball balanced by a manipulator |
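Several environments in the table come in both a continuous and a discrete variant. One common way to derive a discrete action set from a continuous range is uniform gridding, sketched below. The function names are hypothetical, and the discrete environments listed above may define their action sets differently.

```python
def discretize_action_space(low, high, num_actions):
    """Return num_actions evenly spaced values covering [low, high]."""
    if num_actions < 2:
        raise ValueError("need at least two discrete actions")
    step = (high - low) / (num_actions - 1)
    return [low + i * step for i in range(num_actions)]


def nearest_index(value, actions):
    """Map a continuous value to the index of the closest discrete action."""
    return min(range(len(actions)), key=lambda i: abs(actions[i] - value))


# Example: five actions spanning a normalized range of [-1, 1].
actions = discretize_action_space(-1.0, 1.0, 5)
print(actions)                      # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(nearest_index(0.4, actions))  # 3
```

A value-based agent such as DQN then outputs an index into `actions`, whereas an actor-critic agent such as DDPG or TD3 outputs the continuous value directly.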
Simulation
Simulation is where we run our simulation experiments,
that is, where we apply different algorithms to different environments.
Currently, we have the following well-trained controllers:
A DDPG controller for:
- FlightAttitudeSimulator
- UGVBidirectional (motion planner)
- UGVForward (motion planner)
- UGVForwardObstacleAvoidance (motion planner)
A DQN controller for:
- FlightAttitudeSimulator
- SecondOrderIntegration
- SecondOrderIntegration_Discrete
A Dueling DQN controller for:
- FlightAttitudeSimulator
A TD3 trajectory planner for:
- UGVForwardObstacleAvoidance
- CartPole
- CartPoleAngleOnly
- FlightAttitudeSimulator
- SecondOrderIntegration
- UGVForward_pid
A PPO controller for:
- CartPoleAngleOnly
- FlightAttitudeSimulator2State
- SecondOrderIntegration_Discrete
- UGVForward_pid
- UGVBidirectional_pid
- TwoLinkManipulator
A DPPO controller for:
- CartPoleAngleOnly
- CartPole
- FlightAttitudeSimulator2State
- SecondOrderIntegration
- UGVBidirectional_pid
- TwoLinkManipulator
- BallBalancer1D
All runnable scripts are in './simulation/'.
In 'DQN-4-Flight-Attitude-Simulator.py', set the following flags (set TRAIN to True if you want to train a new controller):
TRAIN = False
RETRAIN = False
TEST = not TRAIN
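The sketch below shows one plausible way flags like these select the stages a script runs. It is an assumption, not the script's actual logic: in particular, reading RETRAIN as "continue training from previously saved networks" is a guess, and `plan_run` is a hypothetical helper.

```python
TRAIN = False
RETRAIN = False  # assumed meaning: resume training from saved networks
TEST = not TRAIN


def plan_run(train, retrain, test):
    """Return the ordered list of stages that would run for these flags."""
    stages = []
    if train:
        if retrain:
            stages.append("load saved networks")
        stages.append("train")
        stages.append("save networks")
    if test:
        stages.append("load saved networks")
        stages.append("evaluate")
    return stages


print(plan_run(TRAIN, RETRAIN, TEST))  # ['load saved networks', 'evaluate']
```

With the default values, only the evaluation path runs, which matches using the already trained controller.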
In a command window:
cd simulation/DQN_based/
python3 DQN-4-Flight-Attitude-Simulator.py
The result should be similar to the following.
In 'DDPG-4-UGV-Forward-Obstacle.py', set the following flags (set TRAIN to True if you want to train a new motion planner):
TRAIN = False
RETRAIN = False
TEST = not TRAIN
In a command window:
cd simulation/PG_based/
python DDPG-4-UGV-Forward-Obstacle.py
The result should be similar to the following.
- Add A2C
- Add A3C
- Add PPO
- Add DPPO
- Add D4PG
- Train controllers for CartPole
- Add some PPO demos
- Add some DPPO demos
- Add some A3C demos
- Modify UGV (add acceleration loop)
- Add a UAV regulator
- Add a UAV tracker
- Add a 2nd-order integration system
- Add a dual-joint robotic arm
- Add a 2nd-order cartpole (optional)
- Debug DPPO2
- Debug DQN-based algorithms (multi-action agents)