Implementing GAIL

Practice project for stepping into DL research projects

Notes of GAIL (pending)

General Understanding

Intro / contribution
GAIL (Generative Adversarial Imitation Learning) proposed a new general framework (part 3) for solving a sort of Imitation learning problem that only offers a bundle of expert trajectories;
- It also instantiate an algorithm using a discriminator (part 4, 5) that act like GAN.

Experiment
- baseline:
  - [Behavior Cloning], [Feature Expectation Matching (FEM)], [Game-Theoretic Apprenticeship Learning (GTAL)]
- physic-based control tasks:
  
  [Cartpole, Acrobot, Mountain Car, HalfCheetah, Hopper, Walker, Ant, Humanoid, Reacher]
- result
  - out performs basslines
  - reach good results
Future Reading

⋯

Related Works

[IRL]
[GAN]
[TRPO]
[Andrew's Proj]

Math

⋯

Implementation

Prerequisites

TRPO
- [tutorial(Chinese)]
- used in GAIL when:
  - generating expert behavior
  - updating GAIL's policy
  - providing baseline of experiments
GAN

Key Info

TRPO: [tutorial]
$\mathop{\min}\limits_{\pi_\theta} \mathop{\max}\limits_{D_\omega} V(D_\omega, \pi_\theta) = \mathbb{E}\pi[log\pi()D(s,a)] + \mathbb{E}_{\pi_E}[log(1-D(s,a))] - \lambda H(\pi)$
* discriminator $D$: use **Adam** to to step on $\omega$ to increase object. * policy set $\Pi$: use **TRPO** to step on $\theta$ to decrease object.

Log (tech records)

Setup Envs
- install envs
  - spinningup, Mujoco,Gym, rendering packages, ⋯
  - PyCharm remote interpreter
  - X11, Jupyter, ⋯
- cross platform development settings
  - ⋯
Practice Implementing Skills
- requirements list:
  - "actor-critic"? (memory)
  - using Gym
    - setup & "SOP" of Gym env
  - learn PyTorch
    - optimizer, gradient bind/detach
    - nn utils
    - "coding fromat"
  - general coding format
    - code structure (parts, key functions, ⋯) for RL
    - common data structure to use
    - ⋯
- practice project: DQN (Cartpole)
  - DQN: [tutorial(Chinese)]
  - $L(\omega) = \mathbb{E}[(r + \gamma\max Q(s', a', \omega) - Q(s, a, \omega))^2]$
TRPO (Cartpole):
- basic understand for implementation:
  - $\mathop{\text{maximize}}\limits_{\theta} E_{s\pi_{\theta_{old}}, a_{\theta_{old}}}[\frac{\pi_\theta(a | s)}{\pi_{\theta_{old}}(a | s)} A_{\theta_{old}}]$ s.t. $E_{s\pi_{\theta_{old}}} [D_{KL}(\pi_{old}(\cdot | s) || \pi_\theta(\cdot | s))] \le \delta$
- implementation plan
  - ⋯
GAIL
- ⋯

Techs Learn/Review

TRPO
- KL-Divergence
- surrogate advantages
- Convex Optimization
- Huber loss
- review: Kernels
- review: Lagrangian duality
- review: Hessian
Linux
- review: what happens when sudo apt update and sudo apt upgrade?

How to use

Appendix: Cross-Platform(win-2-ubuntu) Development

The idea is to write code to run on Linux server to utilize its powerful device and develop and debug it on Windows working pc to leverage its convenience (PyCharm on Windows rather vim in bash)

This cross-platform development has majorly two requirements: sharing files & sharing running environments. In order to share files/code, you can use GitHub to do sync between Windows pc and Linux server, or mount Windows folder to Linux machine, or use PyCharm to open remote projects via SFTP/SSH. The latter two way is more convenient since it allows you to sync small changes without archive a new commit to GitHub server. For sharing environments, I just use PyCharm to run remote interpreters.

Another way of running code from a remote machine is using Jupyter: run Jupyter service on server and edit/test it on local (Windows) machine. BTW, newly released PyCharm also supports Jupyter, which means you can debug and set watch variables in Jupyter environments.

There is a tricky issue to run Gym remotely: how to render graphics generated by Linux machine on Windows machine. I am also exploring this. Currently, I find using Jupyter can render images/videos smoothly, using X11 to render graphics generated by WSL on Windows machine also works sometimes. Running remote(on Linux server) code that renders graphics with Gym using PyCharm will cause pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None" error, which is pretty annoying.

My current developing tools/envs are listed below for reference:

Environments:
- develop on: Windows_10
- local debug: WSL (Ubuntu_18.06)
- deploy: Ubuntu_16 server
Version Control: Git
- share project progress and update code
cross platform developing: PyCharm SFTP protocol support
- use remote interpreter to develop and test
- sync code
Develop & Test (context keeper): Jupyter Notebook
Graphics Rendering:
- X11: add remote display on WSL/Linux and render graphics on windows: [tutorial], [issue]
- Jupyter: use matplotlib.pyplot

ziangqin-stu/play_pg