This is my code for experimenting with the CrowdAI Prosthetics Challenge (https://www.crowdai.org/challenges/nips-2018-ai-for-prosthetics-challenge)
The reinforcement learning codebase is based upon Ilya Kostrikov's awesome work (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr)
As this is part of my learning process for continuous control with deep reinforcement learning, there are likely to be some issues.
All experiments were performed with PPO, or PPO with self-imitation learning (SIL), using 16 vectorized environments running in parallel. Keep in mind that the simulator is VERY slow, so expect to wait a long time (days) for decent results, even if you happen to have a kick-ass machine.
Added:
- support for the OpenSim Gym-like environments with Ilya's RL codebase
- custom 'MyProstheticsEnv' wrapper to allow easier experimentation with different observation projections, rewards, and other aspects
- frame skipping support in custom env
- beta distribution experiment for continuous control in the range [0, 1] (http://ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf)
- tweaks to logging/folders/checkpoints and model resume for easier experimentation and tracking of results
- an implementation of SIL (Self-Imitation Learning, https://arxiv.org/abs/1806.05635), one variant of off-policy replay combined with on-policy methods. It speeds up initial training but then starts to falter. I need to run further experiments with the loss weight and the decay of other SIL params.
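
To illustrate the frame skipping item in the list above, here is a minimal sketch of the general technique. It is not the code inside 'MyProstheticsEnv'; the names `FrameSkip` and `CountingEnv` are hypothetical, and it assumes the classic Gym-style `step()` that returns `(obs, reward, done, info)`. The same action is repeated for `skip` simulator steps and the rewards are summed, so the policy is queried less often against a slow simulator.

```python
class FrameSkip:
    """Repeat each action `skip` times and sum the rewards (hypothetical helper)."""

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be >= 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # never step past the end of an episode
        return obs, total_reward, done, info


class CountingEnv:
    """Toy stand-in for the simulator: reward 1.0 per step, ends after `horizon` steps."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}
```

Wrapping the real environment the same way would trade control resolution for wall-clock speed, so the skip count is worth treating as a hyperparameter.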
Set up your environment as per https://github.com/stanfordnmbl/osim-rl#getting-started
Unclipped -- trains much faster, but it is not clear what OpenSim is doing with the unclipped actions:
main.py --algo ppo --env-name osim.Prosthetics --lr 7e-4 --num-steps 1000 --use-gae --ppo-epoch 10
With clipped [0, 1] actions shifted so mean is at 0.5:
main.py --algo ppo --env-name osim.Prosthetics --lr 1e-3 --num-steps 1000 --use-gae --ppo-epoch 10 --clip-action --shift-action
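
As a rough illustration of what clipping and shifting mean here, the sketch below assumes the policy emits zero-centered Gaussian actions, adds 0.5 so the mean lands in the middle of the valid range, and then clips to [0, 1]. The function name `shift_and_clip` and the order of operations are assumptions for illustration, not necessarily what the code in this repo does.

```python
def shift_and_clip(raw_action, shift=0.5, low=0.0, high=1.0):
    """Shift zero-centered policy outputs, then clip them into [low, high].

    Hypothetical helper: `raw_action` is any iterable of floats, e.g. one
    sample from a Gaussian policy head with one value per muscle.
    """
    return [min(high, max(low, a + shift)) for a in raw_action]


# A zero output maps to the middle of the range; extremes saturate at the bounds.
print(shift_and_clip([0.0, -1.0, 0.7, 0.25]))
```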
With beta distribution [0, 1]:
main.py --algo ppo --env-name osim.Prosthetics --lr 1e-3 --num-steps 1000 --use-gae --ppo-epoch 10 --beta-dist
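
For intuition on the beta distribution option, here is a small standard-library sketch. A Beta(alpha, beta) sample always lies in [0, 1], so no clipping is needed. Mapping raw network outputs through softplus and adding 1 (which keeps the density unimodal) is an assumption taken from the general approach in the thesis linked above, not necessarily what this repo does, and all function names are hypothetical.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(raw_alpha, raw_beta):
    # Assumed parameterization: softplus(.) + 1 keeps alpha, beta >= 1.
    return softplus(raw_alpha) + 1.0, softplus(raw_beta) + 1.0


def sample_action(raw_alpha, raw_beta, rng=random):
    # Draw one action in [0, 1] from Beta(alpha, beta).
    alpha, beta = beta_params(raw_alpha, raw_beta)
    return rng.betavariate(alpha, beta)


def beta_mean(raw_alpha, raw_beta):
    # Mean of Beta(alpha, beta) is alpha / (alpha + beta).
    alpha, beta = beta_params(raw_alpha, raw_beta)
    return alpha / (alpha + beta)
```

With equal raw outputs the mean is 0.5, which matches the shifted Gaussian setup while keeping every sample inside the valid action range.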