Hovering a quadcopter at a predefined position using the gym-pybullet-drones environment and the PPO algorithm from PPO-PyTorch
- Recently, I figured out that the drone's fluctuation around the hover position may come from the fixed `action_std` of this PPO implementation: it sets `action_std_init = 0.6` and decays that value during training, but in inference mode there is no mechanism to reduce or remove this variance, so the control output keeps varying. Some implementations of Soft Actor-Critic instead use an extra layer that learns the action std alongside the action mean (see the sketch below).
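  The following is a minimal PyTorch sketch of that alternative: a policy with a learned log-std parameter instead of a fixed, manually decayed `action_std`, so the exploration noise can shrink during training and be dropped at inference by acting with the mean. The class name and hyperparameters are illustrative and not taken from either repository.

  ```python
  import torch
  import torch.nn as nn


  class GaussianPolicy(nn.Module):
      """Policy that learns the action std instead of fixing it."""

      def __init__(self, state_dim, action_dim, hidden_dim=64):
          super().__init__()
          self.backbone = nn.Sequential(
              nn.Linear(state_dim, hidden_dim), nn.Tanh(),
              nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
          )
          self.mean_head = nn.Linear(hidden_dim, action_dim)
          # Learned, state-independent log std; a second head fed by the
          # backbone (as in many SAC implementations) would also work.
          self.log_std = nn.Parameter(torch.full((action_dim,), -0.5))

      def forward(self, state, deterministic=False):
          h = self.backbone(state)
          mean = torch.tanh(self.mean_head(h))
          if deterministic:
              # Inference: use the mean directly, no exploration noise.
              return mean
          dist = torch.distributions.Normal(mean, self.log_std.exp())
          return dist.rsample()
  ```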
- Changed the reward function and the termination computation (a sketch of the idea follows below).
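  For illustration only, a hover reward and termination check could look like the sketch below; the target position, thresholds, and function names are hypothetical and not the exact logic used in this repo.

  ```python
  import numpy as np

  TARGET_POS = np.array([0.0, 0.0, 1.0])  # desired hover point (x, y, z) in metres (assumed)
  MAX_XY_DRIFT = 1.5                      # terminate if the drone drifts too far horizontally
  MAX_TILT = 0.6                          # terminate on excessive roll/pitch (radians)


  def compute_reward(pos):
      """Dense reward: the closer to the target position, the higher the reward."""
      return -float(np.linalg.norm(TARGET_POS - pos) ** 2)


  def compute_terminated(pos, rpy):
      """End the episode when the drone leaves the safe envelope or tips over."""
      out_of_bounds = np.linalg.norm(pos[:2]) > MAX_XY_DRIFT or pos[2] < 0.05
      flipped = abs(rpy[0]) > MAX_TILT or abs(rpy[1]) > MAX_TILT
      return bool(out_of_bounds or flipped)
  ```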
- Follow the authors' guide to install the gym-pybullet-drones environment.
- Training (a rough sketch of the training loop is shown below):
  `python train_hover.py`
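  A rough sketch of what such a training script might do, assuming the `PPO` agent interface from PPO-PyTorch (`select_action`, `buffer`, `update`, `decay_action_std`, `save`) and the `HoverAviary` environment from gym-pybullet-drones. The hyperparameters, checkpoint name, gymnasium-style `reset`/`step` return values, and the observation/action reshaping are assumptions, not the repo's exact code.

  ```python
  import numpy as np
  from gym_pybullet_drones.envs import HoverAviary  # import path depends on the installed version
  from PPO import PPO  # agent class from PPO-PyTorch

  env = HoverAviary(gui=False)
  state_dim = int(np.prod(env.observation_space.shape))
  action_dim = int(np.prod(env.action_space.shape))

  agent = PPO(state_dim, action_dim, lr_actor=3e-4, lr_critic=1e-3,
              gamma=0.99, K_epochs=80, eps_clip=0.2,
              has_continuous_action_space=True, action_std_init=0.6)

  for episode in range(2000):
      obs, _ = env.reset()  # gymnasium-style API: (obs, info)
      done = False
      while not done:
          action = agent.select_action(obs.flatten())
          obs, reward, terminated, truncated, _ = env.step(action.reshape(1, -1))
          done = terminated or truncated
          agent.buffer.rewards.append(reward)
          agent.buffer.is_terminals.append(done)
      agent.update()                      # PPO update once per episode (illustrative schedule)
      agent.decay_action_std(0.01, 0.10)  # shrink exploration noise over training

  agent.save("ppo_hover.pth")  # hypothetical checkpoint name
  env.close()
  ```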
- Test the pretrained model (the evaluation sketch below removes the exploration noise):
  `python test_hover.py`
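  A matching evaluation sketch under the same assumptions as the training sketch above; the key point, tied to the `action_std` note earlier, is shrinking the action std before rolling out the policy so the hover stops fluctuating.

  ```python
  import numpy as np
  from gym_pybullet_drones.envs import HoverAviary
  from PPO import PPO

  env = HoverAviary(gui=True)
  state_dim = int(np.prod(env.observation_space.shape))
  action_dim = int(np.prod(env.action_space.shape))

  # Constructor arguments must match the ones used for training.
  agent = PPO(state_dim, action_dim, lr_actor=3e-4, lr_critic=1e-3,
              gamma=0.99, K_epochs=80, eps_clip=0.2,
              has_continuous_action_space=True, action_std_init=0.6)
  agent.load("ppo_hover.pth")   # hypothetical checkpoint name
  agent.set_action_std(0.05)    # near-deterministic actions at inference

  obs, _ = env.reset()
  for _ in range(2000):
      action = agent.select_action(obs.flatten())
      obs, reward, terminated, truncated, _ = env.step(action.reshape(1, -1))
      if terminated or truncated:
          obs, _ = env.reset()
  env.close()
  ```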
- https://github.com/utiasDSL/gym-pybullet-drones/
- https://github.com/nikhilbarhate99/PPO-PyTorch
- Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
- https://web.stanford.edu/class/aa228/reports/2019/final62.pdf