ReinforcementLearning

This is a platform for some simple RL demonstrations. Team members will modify it step by step. Hold on! Let's go!


Installation

This platform runs on both Windows and Ubuntu, provided Python 3 is installed. Currently there are no ROS-related packages; it requires only PyTorch, NumPy, and OpenCV.

Python version

Python 3.8 or higher is recommended.
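
You can check which version your default interpreter reports with:

python3 -c "import sys; print(sys.version)"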

Dependencies

Prerequisite: Anaconda3, any version whose default Python is 3.8 or higher.

pip install opencv-python
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

The PyTorch build depends on the device you have: choose the CPU-only build or a specific CUDA version according to your GPU.
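
A quick way to confirm which build was installed and whether PyTorch can see your GPU (standard PyTorch calls, nothing repository-specific):

import torch

print(torch.__version__)          # e.g. 1.10.2+cu113 or 1.10.2+cpu
print(torch.cuda.is_available())  # True only if a matching CUDA driver is found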

ReinforcementLearning

Currently, this repository consists of five parts: algorithm, common, datasave, environment, and simulation.

Algorithm

Algorithm includes some commonly used reinforcement learning algorithms.
The following table lists RL algorithms in the corresponding directories.
Directory      Algorithm    Description
actor_critic   A2C          ----
               DDPG         ----
               SAC          ----
               TD3          ----
policy_base    PPO          ----
               DPPO         ----
               DPPO2        does not work
value_base     DQN          ----
               DoubleDQN    ----
               DuelingDQN   ----
rl_base        ----         Base classes inherited by the other algorithms
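
For orientation, a hypothetical sketch of that inheritance layout (the real base classes in rl_base and their method names may differ):

class RLBase:
    """Hypothetical shared base holding bookkeeping common to all agents."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma  # discount factor used by every algorithm

    def choose_action(self, state):
        raise NotImplementedError  # each algorithm supplies its own policy


class DQN(RLBase):
    """Hypothetical value-based agent built on the shared base."""

    def choose_action(self, state):
        return 0  # placeholder: a real DQN would return argmax_a Q(state, a)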

Common

Common includes common_func.py and common_cls.py, which contain some basic functions and classes.
The following table lists the contents of the two files.

File            Description
common_cls.py   ReplayBuffer, RolloutBuffer, OUNoise, NeuralNetworks, etc.
common_func.py  Basic mathematical functions, geometry operations, etc.
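
As a rough sketch of what a replay buffer in the spirit of the ReplayBuffer in common_cls.py could look like (the actual class here may use different names and signatures):

import numpy as np

class MiniReplayBuffer:
    """Illustrative fixed-size FIFO buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity, state_dim, action_dim):
        self.capacity = capacity
        self.ptr = 0    # next write position
        self.size = 0   # number of stored transitions
        self.s  = np.zeros((capacity, state_dim), dtype=np.float32)
        self.a  = np.zeros((capacity, action_dim), dtype=np.float32)
        self.r  = np.zeros((capacity, 1), dtype=np.float32)
        self.s_ = np.zeros((capacity, state_dim), dtype=np.float32)
        self.d  = np.zeros((capacity, 1), dtype=np.float32)

    def store(self, s, a, r, s_, done):
        i = self.ptr
        self.s[i], self.a[i], self.r[i], self.s_[i], self.d[i] = s, a, r, s_, done
        self.ptr = (self.ptr + 1) % self.capacity   # overwrite oldest when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return self.s[idx], self.a[idx], self.r[idx], self.s_[idx], self.d[idx]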

Datasave

Datasave stores the networks trained by the RL algorithms, together with some data files.

Environment

Environment contains some physical models, which are called 'environments' in RL.
The 'config' directory contains the *.xml model description files of all environments.
The 'envs' directory contains the ODEs of the physical environments.
The following table lists all the current environments.
Environment                               Directory                   Description
CartPole                                  ./CartPole/                 continuous, position and angle
CartPoleAngleOnly                         ./CartPole/                 continuous, angle only
CartPoleAngleOnlyDiscrete                 ./CartPole/                 discrete, angle only
FlightAttitudeSimulator                   ./FlightAttitudeSimulator/  discrete
FlightAttitudeSimulator2StateContinuous   ./FlightAttitudeSimulator/  continuous, states are only theta and dtheta
FlightAttitudeSimulatorContinuous         ./FlightAttitudeSimulator/  continuous
UAVHover                                  ./UAV/                      continuous; other files in ./UAV are not RL environments
UGVBidirectional                          ./UGV/                      continuous, the vehicle can move forward and backward
UGVForward                                ./UGV/                      continuous, the vehicle can only move forward
UGVForwardDiscrete                        ./UGV/                      discrete, the vehicle can only move forward
UGVForwardObstacleContinuous              ./UGV/                      continuous, the vehicle needs to avoid obstacles
UGVForwardObstacleDiscrete                ./UGV/                      discrete, the vehicle needs to avoid obstacles
UGVForward_pid                            ./UGV_PID/                  UGV forward with a PID controller tuned by RL
UGVBidirectional_pid                      ./UGV_PID/                  UGV bidirectional with a PID controller tuned by RL
TwoLinkManipulator                        ./RobotManipulators/        continuous, fully actuated
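
These environments are hand-written physical models rather than Gym wrappers. As a purely illustrative example of the usual reset/step convention (all names below are assumptions, not this repository's actual API), a minimal second-order integrator could look like:

import numpy as np

class ToyEnv:
    """Illustrative 1-D second-order integrator with a reset/step interface."""

    def __init__(self, dt=0.02):
        self.dt = dt
        self.state = np.zeros(2)  # [position, velocity]

    def reset(self):
        self.state = np.random.uniform(-1.0, 1.0, size=2)
        return self.state.copy()

    def step(self, action):
        pos, vel = self.state
        vel += float(action) * self.dt          # ODE: dv = a * dt
        pos += vel * self.dt                    # ODE: dx = v * dt
        self.state = np.array([pos, vel])
        reward = -(pos ** 2 + 0.1 * vel ** 2)   # quadratic cost as reward
        done = abs(pos) > 5.0                   # terminate when far from origin
        return self.state.copy(), reward, done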

Simulation

Simulation is where we implement our simulation experiments, that is, running different algorithms in different environments.
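
In that spirit, a generic loop that wires an agent to an environment might look like the following; RandomAgent is a stand-in for the real agents in ./algorithm/, and ToyEnv is the hypothetical sketch from the previous section.

import numpy as np

class RandomAgent:
    """Stand-in for a real agent from ./algorithm/ (illustrative only)."""

    def choose_action(self, state):
        return np.random.uniform(-1.0, 1.0)  # random continuous action

    def store_transition(self, s, a, r, s_, done):
        pass  # a real agent would push the transition into its replay buffer

    def learn(self):
        pass  # a real agent would run a gradient update here

env, agent = ToyEnv(), RandomAgent()  # ToyEnv: the sketch above
for episode in range(10):
    s, done, ret, steps = env.reset(), False, 0.0, 0
    while not done and steps < 200:
        a = agent.choose_action(s)
        s_, r, done = env.step(a)
        agent.store_transition(s, a, r, s_, done)
        agent.learn()
        s, ret, steps = s_, ret + r, steps + 1
    print(f"episode {episode}: return {ret:.2f}")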

Demos

Currently, we have the following well-trained controllers:

DDPG

A DDPG controller for:

  • FlightAttitudeSimulator
  • UGVBidirectional (motion planner)
  • UGVForward (motion planner)
  • UGVForwardObstacleAvoidance (motion planner)

DQN

A DQN controller for:

  • FlightAttitudeSimulator
  • SecondOrderIntegration
  • SecondOrderIntegration_Discrete

A Dueling DQN controller for:

  • FlightAttitudeSimulator

TD3

A TD3 trajectory planner for:

  • UGVForwardObstacleAvoidance
  • CartPole
  • CartPoleAngleOnly
  • FlightAttitudeSimulator
  • SecondOrderIntegration
  • UGVForward_pid

PPO

A PPO controller for:

  • CartPoleAngleOnly
  • FlightAttitudeSimulator2State
  • SecondOrderIntegration_Discrete
  • UGVForward_pid
  • UGVBidirectional_pid
  • TwoLinkManipulator

DPPO

A DPPO controller for:

  • CartPoleAngleOnly
  • CartPole
  • FlightAttitudeSimulator2State
  • SecondOrderIntegration
  • UGVBidirectional_pid
  • TwoLinkManipulator

Run the scripts

All runnable scripts are in './simulation/'.

A DQN controller for a flight attitude simulator.

In 'DQN-4-Flight-Attitude-Simulator.py', set the following flags (set TRAIN to True if you want to train a new controller):

 TRAIN = False     # True: train a new controller
 RETRAIN = False   # True: resume training from a saved network (assumed)
 TEST = not TRAIN  # evaluate the saved controller when not training

In a terminal:

cd simulation/DQN_based/
python3 DQN-4-Flight-Attitude-Simulator.py

The result should be similar to the following.

A DDPG motion planner with obstacle avoidance for a forward-only UGV.

In 'DDPG-4-UGV-Forward-Obstacle.py', set the following flags (set TRAIN to True if you want to train a new motion planner):

 TRAIN = False     # True: train a new motion planner
 RETRAIN = False   # True: resume training from a saved network (assumed)
 TEST = not TRAIN  # evaluate the saved planner when not training

In a terminal:

cd simulation/PG_based/
python3 DDPG-4-UGV-Forward-Obstacle.py

The result should be similar to the following.

A DPPO controller for the SecondOrderIntegration system.

The result should be similar to the following.

A PPO controller for the TwoLinkManipulator system.

The result should be similar to the following.

A DPPO controller for the CartPole system with both position and angle.

The result should be similar to the following.

TODO

Algorithms

  • Add A2C
  • Add A3C
  • Add PPO
  • Add DPPO
  • Add D4PG

Demo

  • Train controllers for CartPole
  • Add some PPO demos
  • Add some DPPO demos
  • Add some A3C demos

Environments

  • Modify UGV (add acceleration loop)
  • Add a UAV regulator
  • Add a UAV tracker
  • Add a 2nd-order integration system
  • Add a dual-joint robotic arm
  • Add a 2nd-order cartpole (optional)

Debug

  • Debug DPPO2
  • Debug DQN-based algorithms (multi-action agents)