drone-control-using-reinforcement-learning


Control a drone in gym-pybullet-drones using PPO.

Hover a quadcopter at a predefined position in the gym-pybullet-drones environment, using the PPO algorithm from PPO-PyTorch.

20/02/2023 Note about the PPO implementation

  • Recently, I figured out that the fluctuation of the drone around the hover position may come from the fixed action_std of this PPO implementation: it sets action_std_init = 0.6 and decays this value during training, but in inference mode there is no mechanism to reduce or remove this variance, so the control output keeps varying. Some other implementations, such as Soft Actor-Critic, use one more layer to learn the action std alongside the action mean (see the sketch below).
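A minimal PyTorch sketch of that idea, assuming an SAC-style std head learned next to the mean head; the class name, layer sizes, and clamp bounds below are hypothetical and are not the PPO-PyTorch or repo code:

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Hypothetical actor: learns the action std from the state
    instead of using a fixed, manually decayed action_std."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, action_dim)
        # The extra layer the note mentions: std is learned beside
        # the mean rather than fixed at action_std_init = 0.6.
        self.log_std_head = nn.Linear(hidden, action_dim)

    def forward(self, state, deterministic=False):
        h = self.body(state)
        mean = torch.tanh(self.mean_head(h))
        if deterministic:
            # Inference: return the mean action only, so leftover
            # exploration variance no longer jitters the hover.
            return mean
        log_std = self.log_std_head(h).clamp(-20.0, 2.0)
        return torch.distributions.Normal(mean, log_std.exp()).sample()
```

With a learned std, the noise shrinks as training converges, and the deterministic mean can be used at test time, which would remove the residual jitter described above.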

13/01/2023 Update: hovering with some constraints

  • Added some constraints to the naive reward; the drone looks more stable at the hover position (reference: paper). A sketch of this kind of reward shaping follows.
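For illustration only, a constrained hover reward might look like the sketch below. The function name, constraint terms, and all weights are assumptions for demonstration, not the actual values from this repo or the referenced paper:

```python
import numpy as np

def hover_reward(pos, rpy, ang_vel, target=np.array([0.0, 0.0, 1.0])):
    """Illustrative shaped reward: penalize distance to the setpoint
    plus tilt and angular-rate terms that discourage unstable attitudes."""
    pos_err = np.linalg.norm(target - pos)   # distance to hover point
    tilt = abs(rpy[0]) + abs(rpy[1])         # roll/pitch constraint
    spin = np.linalg.norm(ang_vel)           # angular-rate constraint
    return -pos_err - 0.1 * tilt - 0.05 * spin
```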

30/12/2022 Update: training results

28/12/2022 Initial commit

  • Changed the reward function and the termination computation

Hover at (0, 0, 1) position

(demo animation: hovering at (0, 0, 1))

Hover at (0, 1, 1) position

(demo animation: hovering at (0, 1, 1))

How to use

  • Follow the author's guide to install the gym-pybullet-drones environment
  • Train: `python train_hover.py`
  • Test the pretrained model: `python test_hover.py` (see the evaluation-loop sketch after this list)
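For orientation, an evaluation loop over the HoverAviary environment might look roughly like the sketch below. HoverAviary is a real gym-pybullet-drones class, but the exact API varies by version (this assumes a recent gymnasium-style release), and the random action is a placeholder: `test_hover.py` instead uses the trained PPO policy.

```python
from gym_pybullet_drones.envs.HoverAviary import HoverAviary

env = HoverAviary(gui=True)          # PyBullet window to watch the hover
obs, info = env.reset(seed=0)
for _ in range(1000):
    # Placeholder: swap for the trained policy's action, e.g.
    # action = agent.select_action(obs) in PPO-PyTorch.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset(seed=0)
env.close()
```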

References