MADDPG in Ray/RLlib

This implementation of MADDPG is recommended for research purposes only. If you want agents that actually learn something, use parameter sharing instead.
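As a rough illustration of what parameter sharing means in RLlib terms: the multi-agent config assigns agents to policies through a `policy_mapping_fn`, and sharing parameters amounts to mapping every agent to the same policy. The sketch below is plain Python and does not import Ray. The agent IDs, policy names, and the helper `policies_in_use` are hypothetical, and the exact signature of `policy_mapping_fn` varies across RLlib versions, so the functions accept extra arguments.

```python
# Hypothetical agent IDs; real IDs depend on the environment wrapper.
AGENT_IDS = ["agent_0", "agent_1", "agent_2"]


def shared_policy_mapping(agent_id, *args, **kwargs):
    """Parameter sharing: every agent is mapped to one policy."""
    return "shared_policy"


def independent_policy_mapping(agent_id, *args, **kwargs):
    """No sharing: each agent gets its own policy (hypothetical names)."""
    return f"policy_{agent_id}"


def policies_in_use(mapping_fn, agent_ids):
    """Return the sorted set of policy IDs that a mapping function produces."""
    return sorted({mapping_fn(agent_id) for agent_id in agent_ids})


if __name__ == "__main__":
    print(policies_in_use(shared_policy_mapping, AGENT_IDS))
    print(policies_in_use(independent_policy_mapping, AGENT_IDS))
```

With the shared mapping, all agents train a single set of weights. With the independent mapping, each agent has its own.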

- This was forked from wsjeon's original repo because the original was no longer maintained.

The code in OpenAI/MADDPG was refactored for RLlib, and test results are given in ./plots.
- It was tested on 7 scenarios of the OpenAI Multi-Agent Particle Environment (MPE).
  - simple, simple_adversary, simple_crypto, simple_push, simple_speaker_listener, simple_spread, simple_tag
    - RLlib MADDPG performs similarly to OpenAI MADDPG on every scenario except simple_crypto.
- Hyperparameters follow the original settings in OpenAI/MADDPG.
Empirically, removing lz4 makes training run much faster. I guess this is because MPE observations are small, so compressing them costs more than it saves.
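
One way to check the compression guess on your own machine is to time serialization round trips of a small observation with and without compression. The sketch below does not reproduce RLlib's code path. It uses `zlib` from the standard library as a stand-in for lz4, and the 10-element observation and the helper names are assumptions made for illustration.

```python
import pickle
import time
import zlib


def pack(obs, compress):
    """Serialize an observation, optionally compressing the bytes."""
    raw = pickle.dumps(obs)
    # zlib is a stand-in here; it is NOT the lz4 codec the note refers to.
    return zlib.compress(raw) if compress else raw


def unpack(blob, compress):
    """Invert pack(): decompress if needed, then deserialize."""
    raw = zlib.decompress(blob) if compress else blob
    return pickle.loads(raw)


def time_roundtrips(obs, compress, repeats=10_000):
    """Return seconds spent on `repeats` pack/unpack round trips."""
    start = time.perf_counter()
    for _ in range(repeats):
        unpack(pack(obs, compress), compress)
    return time.perf_counter() - start


# Hypothetical small observation, roughly the scale of an MPE observation.
small_obs = [0.1 * i for i in range(10)]

if __name__ == "__main__":
    print("plain     :", time_roundtrips(small_obs, compress=False))
    print("compressed:", time_roundtrips(small_obs, compress=True))
    print("bytes plain / compressed:",
          len(pack(small_obs, False)), "/", len(pack(small_obs, True)))
```

Timings depend on the machine and the codec, so treat the numbers as a sanity check rather than a benchmark of RLlib itself.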