shamanez/GAIL-with-WGAN-loss-for-the-Discriminator
Imitation learning with PPO, using a WGAN-GP loss for the discriminator. Heavily influenced by the GAIL-PPO repository at https://github.com/uidilr/gail_ppo_tf. The agent converges on its task after around 3384 iterations.
Python
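As a rough illustration of the WGAN-GP loss named in the description, here is a minimal, dependency-free sketch of a critic loss with a gradient penalty. It is not taken from this repository: the function names (`linear_critic`, `wgan_gp_critic_loss`), the toy linear critic, the penalty weight of 10, and the sign convention (expert samples scored higher than agent samples) are all assumptions made for the example. A real implementation would most likely use a neural-network critic and automatic differentiation.

```python
import math
import random


def linear_critic(weights, bias=0.0):
    """Hypothetical toy critic f(x) = w . x + b, returned with its input-gradient."""
    def f(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    def grad_f(x):
        # For a linear critic the gradient with respect to x is simply w.
        return list(weights)

    return f, grad_f


def wgan_gp_critic_loss(f, grad_f, expert_batch, agent_batch, gp_weight=10.0, rng=None):
    """Critic loss: E[f(agent)] - E[f(expert)] + gp_weight * E[(||grad f(x_hat)|| - 1)^2]."""
    rng = rng or random.Random(0)
    n = min(len(expert_batch), len(agent_batch))

    # Wasserstein term (assumed convention: expert plays the role of "real" data).
    expert_mean = sum(f(x) for x in expert_batch) / len(expert_batch)
    agent_mean = sum(f(x) for x in agent_batch) / len(agent_batch)
    wasserstein = agent_mean - expert_mean

    # Gradient penalty, evaluated at random interpolates between paired samples.
    penalty = 0.0
    for e, a in zip(expert_batch[:n], agent_batch[:n]):
        eps = rng.random()
        x_hat = [eps * ei + (1.0 - eps) * ai for ei, ai in zip(e, a)]
        grad_norm = math.sqrt(sum(g * g for g in grad_f(x_hat)))
        penalty += (grad_norm - 1.0) ** 2
    penalty /= n

    return wasserstein + gp_weight * penalty


if __name__ == "__main__":
    # Made-up state-action vectors, purely for demonstration.
    expert = [[1.0, 0.0], [1.0, 0.0]]
    agent = [[0.0, 0.0], [0.0, 0.0]]
    f, grad_f = linear_critic([3.0, 4.0])
    print(wgan_gp_critic_loss(f, grad_f, expert, agent))
```

The penalty term pushes the norm of the critic's input-gradient toward 1, which is how WGAN-GP softly enforces the Lipschitz constraint instead of clipping weights.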
Issues
- 0
tf.exp(critic_A) reward scheme in gail
#1 opened by YMBetta