Hovering a quadcopter at a predefined position using the gym-pybullet-drones environment and the PPO algorithm from PPO-PyTorch
- Recently, I figured out that the drone's fluctuation around the hover position may come from the fixed `action_std` of this PPO implementation: it sets `action_std_init = 0.6` and decays that value during training, but in inference mode there is no mechanism to reduce or remove this variance, so the control output keeps varying. Some implementations of Soft Actor-Critic instead use an extra layer that learns the action std alongside the action mean (see the sketch below).
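  The following is a minimal PyTorch sketch of that alternative: a policy with a learned log-std parameter instead of a fixed, manually decayed `action_std`, so the exploration noise can shrink during training and be dropped at inference by acting with the mean. The class name and hyperparameters are illustrative and not taken from either repository.

  ```python
  import torch
  import torch.nn as nn


  class GaussianPolicy(nn.Module):
      """Policy that learns the action std instead of fixing it."""

      def __init__(self, state_dim, action_dim, hidden_dim=64):
          super().__init__()
          self.backbone = nn.Sequential(
              nn.Linear(state_dim, hidden_dim), nn.Tanh(),
              nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
          )
          self.mean_head = nn.Linear(hidden_dim, action_dim)
          # Learned, state-independent log std; a second head fed by the
          # backbone (as in many SAC implementations) would also work.
          self.log_std = nn.Parameter(torch.full((action_dim,), -0.5))

      def forward(self, state, deterministic=False):
          h = self.backbone(state)
          mean = torch.tanh(self.mean_head(h))
          if deterministic:
              # Inference: use the mean directly, no exploration noise.
              return mean
          dist = torch.distributions.Normal(mean, self.log_std.exp())
          return dist.rsample()
  ```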
- Changed the reward function and the termination computation (a sketch of the idea follows below).
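  For illustration only, a hover reward and termination check could look like the sketch below; the target position, thresholds, and function names are hypothetical and not the exact logic used in this repo.

  ```python
  import numpy as np

  TARGET_POS = np.array([0.0, 0.0, 1.0])  # desired hover point (x, y, z) in metres (assumed)
  MAX_XY_DRIFT = 1.5                      # terminate if the drone drifts too far horizontally
  MAX_TILT = 0.6                          # terminate on excessive roll/pitch (radians)


  def compute_reward(pos):
      """Dense reward: the closer to the target position, the higher the reward."""
      return -float(np.linalg.norm(TARGET_POS - pos) ** 2)


  def compute_terminated(pos, rpy):
      """End the episode when the drone leaves the safe envelope or tips over."""
      out_of_bounds = np.linalg.norm(pos[:2]) > MAX_XY_DRIFT or pos[2] < 0.05
      flipped = abs(rpy[0]) > MAX_TILT or abs(rpy[1]) > MAX_TILT
      return bool(out_of_bounds or flipped)
  ```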
- Follow the authors' guide to install the gym-pybullet-drones environment.
- Training (a rough sketch of the training loop is shown below):
  `python train_hover.py`
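  A rough sketch of what such a training script might do, assuming the `PPO` agent interface from PPO-PyTorch (`select_action`, `buffer`, `update`, `decay_action_std`, `save`) and the `HoverAviary` environment from gym-pybullet-drones. The hyperparameters, checkpoint name, gymnasium-style `reset`/`step` return values, and the observation/action reshaping are assumptions, not the repo's exact code.

  ```python
  import numpy as np
  from gym_pybullet_drones.envs import HoverAviary  # import path depends on the installed version
  from PPO import PPO  # agent class from PPO-PyTorch

  env = HoverAviary(gui=False)
  state_dim = int(np.prod(env.observation_space.shape))
  action_dim = int(np.prod(env.action_space.shape))

  agent = PPO(state_dim, action_dim, lr_actor=3e-4, lr_critic=1e-3,
              gamma=0.99, K_epochs=80, eps_clip=0.2,
              has_continuous_action_space=True, action_std_init=0.6)

  for episode in range(2000):
      obs, _ = env.reset()  # gymnasium-style API: (obs, info)
      done = False
      while not done:
          action = agent.select_action(obs.flatten())
          obs, reward, terminated, truncated, _ = env.step(action.reshape(1, -1))
          done = terminated or truncated
          agent.buffer.rewards.append(reward)
          agent.buffer.is_terminals.append(done)
      agent.update()                      # PPO update once per episode (illustrative schedule)
      agent.decay_action_std(0.01, 0.10)  # shrink exploration noise over training

  agent.save("ppo_hover.pth")  # hypothetical checkpoint name
  env.close()
  ```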
- Test the pretrained model (the evaluation sketch below removes the exploration noise):
  `python test_hover.py`
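  A matching evaluation sketch under the same assumptions as the training sketch above; the key point, tied to the `action_std` note earlier, is shrinking the action std before rolling out the policy so the hover stops fluctuating.

  ```python
  import numpy as np
  from gym_pybullet_drones.envs import HoverAviary
  from PPO import PPO

  env = HoverAviary(gui=True)
  state_dim = int(np.prod(env.observation_space.shape))
  action_dim = int(np.prod(env.action_space.shape))

  # Constructor arguments must match the ones used for training.
  agent = PPO(state_dim, action_dim, lr_actor=3e-4, lr_critic=1e-3,
              gamma=0.99, K_epochs=80, eps_clip=0.2,
              has_continuous_action_space=True, action_std_init=0.6)
  agent.load("ppo_hover.pth")   # hypothetical checkpoint name
  agent.set_action_std(0.05)    # near-deterministic actions at inference

  obs, _ = env.reset()
  for _ in range(2000):
      action = agent.select_action(obs.flatten())
      obs, reward, terminated, truncated, _ = env.step(action.reshape(1, -1))
      if terminated or truncated:
          obs, _ = env.reset()
  env.close()
  ```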
- https://github.com/utiasDSL/gym-pybullet-drones/
- https://github.com/nikhilbarhate99/PPO-PyTorch
- Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
- https://web.stanford.edu/class/aa228/reports/2019/final62.pdf