vwxyzjn/entity-ppo-demo

Entity-oriented PPO Demo

This is a demo of "entity-oriented" deep reinforcement learning (the name is inspired by "object-oriented programming").

Frustrated that you cannot describe your game's observation and action spaces using gym.spaces.Discrete or similar fixed-shape spaces in RL libraries? Entity-oriented RL comes to the rescue: it gives game practitioners a much more intuitive observation and action space API. Specifically, it lets them describe the entities in the game using JSON-like syntax, as follows:

entity-ppo-demo/mine_sweeper_demo.py, lines 87 to 116 in 25d7f6e:

    return Observation(
        entities={
            "Mine": (
                self.mines,
                [("Mine", i) for i in range(len(self.mines))],
            ),
            "Robot": (
                self.robots,
                [("Robot", i) for i in range(len(self.robots))],
            ),
            "Orbital Cannon": (
                [(self.orbital_cannon_cooldown,)],
                [("Orbital Cannon", 0)],
            )
            if self.orbital_cannon
            else None,
        },
        actions={
            "Move": CategoricalActionMask(
                # Allow all robots to move
                actor_types=["Robot"],
                mask=[self.valid_moves(x, y) for x, y in self.robots],
            ),
            "Fire Orbital Cannon": SelectEntityActionMask(
                # Only the Orbital Cannon can fire, but not if cooldown > 0
                actor_types=["Orbital Cannon"] if self.orbital_cannon_cooldown == 0 else [],
                # Both mines and robots can be fired at
                actee_types=["Mine", "Robot"],
            ),
        },
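To make the shape of this data concrete, here is a minimal, self-contained sketch that mirrors the excerpt using plain Python dataclasses. Everything in it is an assumption made for illustration: SketchObservation, observe, the MOVES order, and the grid-bounds rule in valid_moves are hypothetical stand-ins, not the entity-gym classes or the demo's actual logic. It only shows the idea that each entity type carries a variable-length list of feature rows plus ids, and that actions are masked per actor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for illustration; these are NOT the entity-gym types.
Features = Tuple[int, ...]
EntityID = Tuple[str, int]
EntityBatch = Tuple[List[Features], List[EntityID]]

# Assumed move order: up, down, left, right.
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]


@dataclass
class SketchObservation:
    # entity type -> (feature rows, ids); None means no entity of that type
    entities: Dict[str, Optional[EntityBatch]]
    # one row per robot, one boolean per entry in MOVES
    move_mask: List[List[bool]]
    # entity types allowed to fire, and entity types that can be targeted
    cannon_actors: List[str]
    cannon_targets: List[str]


def valid_moves(x: int, y: int, width: int, height: int) -> List[bool]:
    """Assumed rule: a move is valid if it stays inside the grid."""
    return [0 <= x + dx < width and 0 <= y + dy < height for dx, dy in MOVES]


def observe(
    mines: List[Features],
    robots: List[Features],
    cannon_cooldown: Optional[int],
    width: int = 6,
    height: int = 6,
) -> SketchObservation:
    """Build an observation; cannon_cooldown=None means there is no cannon."""
    cannon = None
    if cannon_cooldown is not None:
        cannon = ([(cannon_cooldown,)], [("Orbital Cannon", 0)])
    return SketchObservation(
        entities={
            "Mine": (mines, [("Mine", i) for i in range(len(mines))]),
            "Robot": (robots, [("Robot", i) for i in range(len(robots))]),
            "Orbital Cannon": cannon,
        },
        move_mask=[valid_moves(x, y, width, height) for x, y in robots],
        # The cannon may act only when its cooldown has reached zero.
        cannon_actors=["Orbital Cannon"] if cannon_cooldown == 0 else [],
        cannon_targets=["Mine", "Robot"],
    )


# The number of mines and robots can differ from step to step.
obs = observe(mines=[(2, 3), (4, 1), (5, 5)], robots=[(0, 0), (3, 3)], cannon_cooldown=0)
print(obs.entities["Mine"][1])
print(obs.move_mask)
```

The point of the design is that nothing here has a fixed size: adding a fourth mine or removing a robot changes only the length of the lists, which is hard to express with a single fixed-shape space.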

mine_sweeper_demo.py contains an end-to-end example. Follow the steps below to get started.

Colab

To help you get started, we have prepared a Colab notebook: https://colab.research.google.com/drive/1WM23R9TA-C6rmDTFAy4hbnB87C1hcmM2?usp=sharing#scrollTo=HGzNOrnSiaHF

Local Setup

poetry install
poetry run pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
poetry run pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
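
After running the commands above, a quick sanity check can confirm that both packages are importable in the Poetry environment. The sketch below is an optional helper, not part of the repository: missing_packages is a hypothetical name, and it relies only on the standard library's importlib.util.find_spec. The import names torch and torch_scatter correspond to the two pip packages installed above.

```python
import importlib.util
from typing import List


def missing_packages(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["torch", "torch_scatter"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("torch and torch_scatter are importable")
```

Saved to a file of your choosing, it can be run with poetry run python so that it checks the same environment the demo scripts use.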

Run an experiment

poetry run python mine_sweeper_demo.py env.id=MineSweeper total_timesteps=1000000
xvfb-run -a poetry run python microrts_demo.py env.id=GymMicrorts rollout.num_envs=16 total_timesteps=1000000 rollout.steps=256 eval.capture_videos=True eval.interval=300000 eval.steps=2000 eval.num_envs=1 eval.processes=1

The schema and documentation of configuration flags (such as total_timesteps) can be found here
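
The commands above pass settings as dotted key=value pairs such as rollout.num_envs=16. As a rough mental model only, the sketch below shows how such pairs could map onto a nested configuration. It is not the parser the demo uses; parse_overrides and coerce are hypothetical helpers, and the type-coercion rules are an assumption.

```python
from typing import Any, Dict, List


def coerce(raw: str) -> Any:
    """Assumed coercion: booleans first, then int, then float, else string."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_overrides(args: List[str]) -> Dict[str, Any]:
    """Turn ["a.b=1", "c=x"] into {"a": {"b": 1}, "c": "x"}."""
    config: Dict[str, Any] = {}
    for arg in args:
        key, _, raw = arg.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for parent in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(parent, {})
        node[leaf] = coerce(raw)
    return config


overrides = parse_overrides(
    ["env.id=GymMicrorts", "rollout.num_envs=16", "eval.capture_videos=True"]
)
print(overrides)
```

Read this way, rollout.num_envs and rollout.steps belong to the same rollout section, while total_timesteps is a top-level setting.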

Visualize metrics

tensorboard --logdir runs

Experiment tracking

Experiments can be tracked with Weights & Biases. To do so, add track=true to the command-line arguments:

poetry run python mine_sweeper_demo.py total_timesteps=1000000 track=true
xvfb-run -a poetry run python microrts_demo.py env.id=GymMicrorts rollout.num_envs=16 total_timesteps=1000000 rollout.steps=256 eval.capture_videos=True eval.interval=300000 eval.steps=2000 eval.num_envs=1 eval.processes=1 track=true

More info

To learn more about how to formulate observation and action spaces with entity-gym, see https://entity-gym.readthedocs.io/en/latest/quick-start-guide.html.

For more advanced training examples with more complex games, see https://github.com/entity-neural-network/enn-zoo.
