Implementing GAIL
Practice project for stepping into DL research projects
Notes of GAIL (pending)
General Understanding
- Intro / contribution
- GAIL (Generative Adversarial Imitation Learning) proposed a new general framework (part 3) for solving a sort of Imitation learning problem that only offers a bundle of expert trajectories;
- It also instantiate an algorithm using a discriminator (part 4, 5) that act like GAN.
-
Experiment
-
baseline:
- [Behavior Cloning], [Feature Expectation Matching (FEM)], [Game-Theoretic Apprenticeship Learning (GTAL)]
-
physic-based control tasks:
[Cartpole, Acrobot, Mountain Car, HalfCheetah, Hopper, Walker, Ant, Humanoid, Reacher]
-
result
- out performs basslines
- reach good results
-
-
Future Reading
⋯
Related Works
- [IRL]
- [GAN]
- [TRPO]
- [Andrew's Proj]
Math
⋯
Implementation
Prerequisites
- TRPO
- [tutorial(Chinese)]
- used in GAIL when:
- generating expert behavior
- updating GAIL's policy
- providing baseline of experiments
- GAN
Key Info
- TRPO: [tutorial]
- $\mathop{\min}\limits_{\pi_\theta} \mathop{\max}\limits_{D_\omega} V(D_\omega, \pi_\theta) = \mathbb{E}\pi[log\pi()D(s,a)] + \mathbb{E}_{\pi_E}[log(1-D(s,a))] - \lambda H(\pi)$
-
* discriminator
$D$ : use **Adam** to to step on$\omega$ to increase object. * policy set$\Pi$ : use **TRPO** to step on$\theta$ to decrease object.
Log (tech records)
-
Setup Envs
- install envs
- spinningup, Mujoco,Gym, rendering packages, ⋯
- PyCharm remote interpreter
- X11, Jupyter, ⋯
- cross platform development settings
- ⋯
- install envs
-
Practice Implementing Skills
-
requirements list:
-
"actor-critic"? (memory)
-
using Gym
- setup & "SOP" of Gym
env
- setup & "SOP" of Gym
-
learn PyTorch
- optimizer, gradient bind/detach
- nn utils
- "coding fromat"
-
general coding format
- code structure (parts, key functions, ⋯) for RL
- common data structure to use
- ⋯
-
-
practice project: DQN (Cartpole)
- DQN: [tutorial(Chinese)]
$L(\omega) = \mathbb{E}[(r + \gamma\max Q(s', a', \omega) - Q(s, a, \omega))^2]$
-
-
TRPO (Cartpole):
- basic understand for implementation:
- implementation plan
- ⋯
-
GAIL
- ⋯
Techs Learn/Review
- TRPO
- KL-Divergence
- surrogate advantages
- Convex Optimization
- Huber loss
- review: Kernels
- review: Lagrangian duality
- review: Hessian
- Linux
- review: what happens when
sudo apt update
andsudo apt upgrade
?
- review: what happens when
How to use
Appendix: Cross-Platform(win-2-ubuntu) Development
The idea is to write code to run on Linux server to utilize its powerful device and develop and debug it on Windows working pc to leverage its convenience (PyCharm on Windows rather vim in bash)
This cross-platform development has majorly two requirements: sharing files & sharing running environments. In order to share files/code, you can use GitHub to do sync between Windows pc and Linux server, or mount Windows folder to Linux machine, or use PyCharm to open remote projects via SFTP/SSH. The latter two way is more convenient since it allows you to sync small changes without archive a new commit to GitHub server. For sharing environments, I just use PyCharm to run remote interpreters.
Another way of running code from a remote machine is using Jupyter: run Jupyter service on server and edit/test it on local (Windows) machine. BTW, newly released PyCharm also supports Jupyter, which means you can debug and set watch variables in Jupyter environments.
There is a tricky issue to run Gym remotely: how to render graphics generated by Linux machine on Windows machine. I am also exploring this. Currently, I find using Jupyter can render images/videos smoothly, using X11 to render graphics generated by WSL on Windows machine also works sometimes. Running remote(on Linux server) code that renders graphics with Gym using PyCharm will cause pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"
error, which is pretty annoying.
My current developing tools/envs are listed below for reference:
- Environments:
- develop on:
Windows_10
- local debug:
WSL (Ubuntu_18.06)
- deploy:
Ubuntu_16 server
- develop on:
- Version Control: Git
- share project progress and update code
- cross platform developing:
PyCharm
SFTP protocol support- use remote interpreter to develop and test
- sync code
- Develop & Test (context keeper):
Jupyter Notebook
- Graphics Rendering: