/ReinforcementLearning

This is a platform for running simple RL demonstrations. Team members will extend it step by step. Hold on! Aoligei!

Primary language: Python

Installation

This platform runs on both Windows and Ubuntu, provided Python 3 is installed. Currently, it has no ROS-related dependencies. It requires only PyTorch, NumPy, and OpenCV.

Package version

  • Python: 3.8.10
  • numpy: 1.22.2
  • opencv: 4.7.0
  • pytorch: 1.13.1 + cu117 (cpu only also works for our platform)
  • matplotlib: 3.7.1

Installation

  • Windows: Anaconda3 is recommended because it makes installing and uninstalling packages convenient. Extra packages (torch, etc.) can be installed after Anaconda3 is set up.
  • Ubuntu: Ubuntu 20.04 is recommended because it ships with Python 3. However, Anaconda3 is still required if your Ubuntu version is older than 20.04.
  • The PyTorch build depends on the device: choose CPU-only or a specific CUDA version. We have tested our code with several versions of PyTorch, and they all work.
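
As a quick check after installation, the sketch below compares the installed packages against the versions listed above. It is only an illustration: the function name `installed_versions` is hypothetical, and the distribution names (`opencv-python`, `torch`) are the usual pip names, which is an assumption because this README names only the libraries. A package installed through conda under a different name may therefore be reported as missing even though it imports correctly.

```python
from importlib import metadata

# Versions listed in this README. The keys are the usual pip distribution
# names, which is an assumption (the README names only the libraries).
TESTED_VERSIONS = {
    "numpy": "1.22.2",
    "opencv-python": "4.7.0",
    "torch": "1.13.1",
    "matplotlib": "3.7.1",
}


def installed_versions(packages=None):
    """Return {distribution name: installed version, or None if not found}."""
    if packages is None:
        packages = TESTED_VERSIONS
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        tested = TESTED_VERSIONS[name]
        print(f"{name}: installed={version or 'not found'}, tested={tested}")
```

A version that differs from the tested one is not necessarily a problem, since the code has been tested with several PyTorch versions.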

ReinforcementLearning

Currently, this repository consists of five directories: algorithm, common, datasave, environment, and simulation.

Algorithm

Algorithm includes some commonly used reinforcement learning algorithms.
The following table lists RL algorithms in the corresponding directories.
| Directory | Algorithm | Description |
|---|---|---|
| actor_critic | A2C, DDPG, SAC, TD3 | ---- |
| policy_base | PPO, DPPO, DPPO2 | DPPO2 does not work |
| value_base | DQN, DoubleDQN, DuelingDQN | ---- |
| rl_base | ---- | Base class inherited by other algorithms |

Common

Common contains common_func.py and common_cls.py, which provide basic functions and classes.
The following table lists the contents of the two py files.
| File | Description |
|---|---|
| common_cls.py | ReplayBuffer, RolloutBuffer, OUNoise, NeuralNetworks, etc. |
| common_func.py | basic mathematical functions, geometry operations, etc. |
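
For readers new to these helpers, here is a minimal sketch of what a replay buffer generally does: it stores transitions up to a fixed capacity and returns uniformly sampled mini-batches. This is not the `ReplayBuffer` from common_cls.py. The class name `SimpleReplayBuffer` and its method names are hypothetical, and the repository's own interface may differ.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Illustrative fixed-capacity FIFO store with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw batch_size transitions without replacement, grouped by field."""
        batch = random.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)


# Usage: with capacity 3, only the three most recent transitions are kept.
buffer = SimpleReplayBuffer(capacity=3)
for t in range(5):
    buffer.store(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
states, actions, rewards, next_states, dones = buffer.sample(batch_size=2)
```

Off-policy algorithms such as DQN, DDPG, and TD3 typically sample from a buffer like this so that consecutive, highly correlated transitions are not used in the same update.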

Datasave

Datasave saves networks trained by RL algorithms and some data files.

Environment

Environment contains physical models, which are called 'environments' in RL.
The 'config' directory contains the *.xml files, which are the model description files of all environments.
The 'envs' directory contains the ODEs (ordinary differential equations) of the physical environments.
The following table lists all the current environments.
| Environment | Directory | Description |
|---|---|---|
| CartPole | ./CartPole/ | continuous, position and angle |
| CartPoleAngleOnly | ./CartPole/ | continuous, just angle |
| CartPoleAngleOnlyDiscrete | ./CartPole/ | discrete, just angle |
| FlightAttitudeSimulator | ./FlightAttitudeSimulator/ | discrete |
| FlightAttitudeSimulator2StateContinuous | ./FlightAttitudeSimulator/ | continuous, states are only theta and dtheta |
| FlightAttitudeSimulatorContinuous | ./FlightAttitudeSimulator/ | continuous |
| UAVHover | ./UAV/ | continuous, other files in ./UAV are not RL environments |
| UGVBidirectional | ./UGV/ | continuous, the vehicle can move forward and backward |
| UGVForward | ./UGV/ | continuous, the vehicle can only move forward |
| UGVForwardDiscrete | ./UGV/ | discrete, the vehicle can only move forward |
| UGVForwardObstacleContinuous | ./UGV/ | continuous, the vehicle needs to avoid obstacles |
| UGVForwardObstacleDiscrete | ./UGV/ | discrete, the vehicle needs to avoid obstacles |
| UGVForward_pid | ./UGV_PID/ | UGV forward with PID controller tuned by RL |
| UGVBidirectional_pid | ./UGV_PID/ | UGV bidirectional with PID controller tuned by RL |
| TwoLinkManipulator | ./RobotManipulators/ | continuous, full drive |
| BallBalancer1D | ./RobotManipulators/ | continuous, 1D ball balanced by a manipulator |
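
To illustrate how an environment can wrap the ODE of a physical model, the sketch below integrates a double integrator (x'' = u) with forward Euler steps. It is a hypothetical example and not one of the environments listed above. The class name `DoubleIntegratorEnv`, the `reset`/`step` interface, the step size, and the quadratic reward are all assumptions chosen for illustration.

```python
class DoubleIntegratorEnv:
    """Hypothetical 1-D environment: position x, velocity v, dynamics x'' = u."""

    def __init__(self, dt=0.01, u_max=1.0):
        self.dt = dt        # integration step in seconds
        self.u_max = u_max  # actuator saturation
        self.x = 0.0
        self.v = 0.0

    def reset(self, x0=1.0, v0=0.0):
        self.x, self.v = x0, v0
        return (self.x, self.v)

    def step(self, u):
        """Advance the ODE by one forward-Euler step; return (state, reward)."""
        u = max(-self.u_max, min(self.u_max, u))  # clip the action
        self.x += self.v * self.dt  # x' = v, using the velocity before the update
        self.v += u * self.dt       # v' = u
        reward = -(self.x ** 2 + 0.1 * self.v ** 2)  # illustrative quadratic cost
        return (self.x, self.v), reward


# Usage: one step from rest at x = 1 with a saturated action.
env = DoubleIntegratorEnv(dt=0.1)
env.reset(x0=1.0, v0=0.0)
state, reward = env.step(-5.0)  # the action is clipped to -1.0
```

A continuous environment would pass the real-valued action through as above, while a discrete variant would first map an action index to one of a fixed set of input values.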

Simulation

Simulation is where we run our simulation experiments, that is, where different algorithms are applied to different environments.

Demos

Currently, we have the following well-trained controllers:

DDPG

A DDPG controller for

  • FlightAttitudeSimulator
  • UGVBidirectional (motion planner)
  • UGVForward (motion planner)
  • UGVForwardObstacleAvoidance (motion planner)

DQN

A DQN controller for

  • FlightAttitudeSimulator
  • SecondOrderIntegration
  • SecondOrderIntegration_Discrete

A Dueling DQN controller for

  • FlightAttitudeSimulator

TD3

A TD3 trajectory planner for:

  • UGVForwardObstacleAvoidance
  • CartPole
  • CartPoleAngleOnly
  • FlightAttitudeSimulator
  • SecondOrderIntegration
  • UGVForward_pid

PPO

A PPO controller for:

  • CartPoleAngleOnly
  • FlightAttitudeSimulator2State
  • SecondOrderIntegration_Discrete
  • UGVForward_pid
  • UGVBidirectional_pid
  • TwoLinkManipulator

DPPO

A DPPO controller for:

  • CartPoleAngleOnly
  • CartPole
  • FlightAttitudeSimulator2State
  • SecondOrderIntegration
  • UGVBidirectional_pid
  • TwoLinkManipulator
  • BallBalancer1D

Run the scripts

All runnable scripts are in './simulation/'.

A DQN controller for a flight attitude simulator.

In 'DQN-4-Flight-Attitude-Simulator.py', set the following flags (set TRAIN to True if you want to train a new controller):

 TRAIN = False
 RETRAIN = False
 TEST = not TRAIN
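
The sketch below shows one way such switches can select what a script does. It is an assumption-laden illustration and not code from the repository: the helper `select_mode` is hypothetical, and the meaning given to RETRAIN (resume training from a saved network) is a guess, since this README only documents TRAIN.

```python
def select_mode(train, retrain):
    """Map the two switches to a mode name (assumed semantics)."""
    if not train:
        return "test"     # load a saved network and only evaluate it
    if retrain:
        return "retrain"  # assumed: resume training from a saved network
    return "train"        # train a new controller from scratch


TRAIN = False
RETRAIN = False
TEST = not TRAIN

print(select_mode(TRAIN, RETRAIN))  # prints "test"
```

In this sketch RETRAIN has no effect while TRAIN is False, so the default flags correspond to testing a saved controller.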

In a command window:

cd simulation/DQN_based/
python3 DQN-4-Flight-Attitude-Simulator.py

The result should be similar to the following.

A DDPG motion planner which can avoid obstacles for a forward-only UGV.

In 'DDPG-4-UGV-Forward-Obstacle.py', set the following flags (set TRAIN to True if you want to train a new motion planner):

 TRAIN = False
 RETRAIN = False
 TEST = not TRAIN

In a command window:

cd simulation/PG_based/
python DDPG-4-UGV-Forward-Obstacle.py

The result should be similar to the following.

A DPPO controller for SecondOrderIntegration system.

The result should be similar to the following.

A PPO controller for TwoLinkManipulator system

The result should be similar to the following.

A DPPO controller for CartPole system with both position and angle

The result should be similar to the following.

A DPPO controller for BallBalancer1D system

The result should be similar to the following.

TODO

Algorithms

  • Add A2C
  • Add A3C
  • Add PPO
  • Add DPPO
  • Add D4PG

Demo

  • Train controllers for CartPole
  • Add some PPO demos
  • Add some DPPO demos
  • Add some A3C demos

Environments

  • Modify UGV (add acceleration loop)
  • Add a UAV regulator
  • Add a UAV tracker
  • Add a 2nd-order integration system
  • Add a dual-joint robotic arm
  • Add a 2nd-order cartpole (optional)

Debug

  • Debug DPPO2
  • Debug DQN-based algorithms (multi-action agents)