/entity-ppo-demo

Primary LanguagePythonMIT LicenseMIT

Entity-oriented PPO Demo

This is a demo of "entity-oriented" deep reinforcement learning (name is inspired by "object-oriented programming").

Feeling frustrated about not being able to describe your game's observation and action space using gym.spaces.Discrete or gym.spaces.Discrete for RL libraries? Entity-oriented RL comes to the rescue - it gives a much more intuitive observation and action space API to game practitioners. Specifically, Entity-oriented RL allows game practitioners to describe the entities in the game using JSON-like syntax such as follows:

return Observation(
entities={
"Mine": (
self.mines,
[("Mine", i) for i in range(len(self.mines))],
),
"Robot": (
self.robots,
[("Robot", i) for i in range(len(self.robots))],
),
"Orbital Cannon": (
[(self.orbital_cannon_cooldown,)],
[("Orbital Cannon", 0)],
)
if self.orbital_cannon
else None,
},
actions={
"Move": CategoricalActionMask(
# Allow all robots to move
actor_types=["Robot"],
mask=[self.valid_moves(x, y) for x, y in self.robots],
),
"Fire Orbital Cannon": SelectEntityActionMask(
# Only the Orbital Cannon can fire, but not if cooldown > 0
actor_types=["Orbital Cannon"] if self.orbital_cannon_cooldown == 0 else [],
# Both mines and robots can be fired at
actee_types=["Mine", "Robot"],
),
},

The mine_sweeper_demo.py contains an end-to-end example. Please follow the steps below to get started.

Colab

To help get started, we have prepared a colab notebook here - https://colab.research.google.com/drive/1WM23R9TA-C6rmDTFAy4hbnB87C1hcmM2?usp=sharing#scrollTo=HGzNOrnSiaHF

Local Setup

poetry install
poetry run pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
poetry run pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html

Run an experiment

poetry run python mine_sweeper_demo.py env.id=MineSweeper total_timesteps=1000000
xvfb-run -a poetry run python microrts_demo.py env.id=GymMicrorts rollout.num_envs=16 total_timesteps=1000000 rollout.steps=256 eval.capture_videos=True eval.interval=300000 eval.steps=2000 eval.num_envs=1 eval.processes=1

The schema and documentation of configuration flags (such as total_timesteps) can be found here

Visualize metrics

tensorboard --logdir runs

Demo

Experiment tracking

poetry run python mine_sweeper_demo.py total_timesteps=1000000 track=true
xvfb-run -a poetry run python microrts_demo.py env.id=GymMicrorts rollout.num_envs=16 total_timesteps=1000000 rollout.steps=256 eval.capture_videos=True eval.interval=300000 eval.steps=2000 eval.num_envs=1 eval.processes=1 track=true

It is possible to track the experiments to weights and biases. To do so, please set track=true in the command line arguments. See the following tracked experiments:

More info

To learn more about how to formulate observation space and action space with entity-gym, see https://entity-gym.readthedocs.io/en/latest/quick-start-guide.html.

For more advanced training examples with more complex games, see https://github.com/entity-neural-network/enn-zoo