The multi-agent suite is a collection of experiments that investigate the capabilities of learning in continuous games (also called differentiable or smooth games).
Experiments are defined in the `experiments` folder. Each folder inside corresponds to one experiment and contains:
- A file that defines an environment, with a level of configurability (a Python file with the same name as the experiment).
- A sequence of keyword arguments for this environment, defined in the `SETTINGS` variable in the experiment's `sweep.py` file (see the sketch below).
- A file `analysis.py` that defines the plotting and analysis tools (WIP).

Logging is done inside an environment, which allows data to be output in the correct format regardless of the agent or algorithm structure.
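As a rough illustration, a `sweep.py` might define something like the following; the keyword arguments shown are hypothetical placeholders, but each entry in `SETTINGS` corresponds to one configuration (indexed by the `config_num` described below):

```python
# Hypothetical sweep.py sketch: SETTINGS is a sequence of keyword-argument
# dicts for the environment, one per configuration. The kwargs below are
# illustrative, not the real settings.
SETTINGS = (
    dict(num_players=2, max_steps=100),  # config 0
    dict(num_players=2, max_steps=500),  # config 1
)
```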
The main way to use the package is through the command line or Jupyter notebooks.
We recommend installing into a virtual environment. If the code below does not work, try substituting `python` for `python3`:
cd /path/to/masuite/
python3 -m venv masuite
source masuite/bin/activate
If you are simply going to run masuite experiments or use the already-implemented tools, install using:
pip install .
Alternatively, if you are planning on editing/developing masuite source code, it is recommended to install in development mode using:
python3 setup.py develop
To confirm the installation was successful, run:
pytest --disable-warnings
Do not worry about any warnings encountered; they come from the gym dependency.
(Note: this is a WIP feature that only tests that the package is installed and some basic tests can be run. More verbose testing still needs to be written).
Agents are responsible for learning an action policy and choosing actions given a set of observations. Each agent instance must be passed the following parameters on creation:
- `env_dim` (list(int)): defines the shape of the observations the agent will be passed when choosing an action.
- `act_dim` (list(int)): defines the shape of the actions the agent will return.
In addition, an optional `lr` parameter can be passed specifying the learning rate for the agent when updating its action policy.
Each agent must also have two public functions:
- `select_action(self, obs)`: chooses what actions to take based on the current action policy and the passed observations.
- `update(self, grad)`: makes a single action policy update based on `grad` and the learning rate.
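Putting this together, a minimal agent might look like the sketch below. The constructor parameters and method names follow the interface above; the trivial Gaussian "policy" is purely illustrative:

```python
import numpy as np

class SketchAgent:
    """Minimal sketch of the agent interface described above (illustrative)."""

    def __init__(self, env_dim, act_dim, lr=0.01):
        self.env_dim = env_dim           # shape of incoming observations
        self.act_dim = act_dim           # shape of returned actions
        self.lr = lr                     # learning rate used by update()
        self.params = np.zeros(act_dim)  # placeholder policy parameters

    def select_action(self, obs):
        # Choose an action given the observations; here the observation is
        # ignored and the action is just the parameters plus Gaussian noise,
        # a trivial stand-in for a real policy.
        return self.params + np.random.standard_normal(self.act_dim)

    def update(self, grad):
        # Single policy update: step along the supplied gradient.
        self.params -= self.lr * np.asarray(grad)
```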
Masuite has three main modules that, when combined, create an experiment:
- Environments: the environment agents interact with.
- Agents: responsible for choosing actions to pass to the environment based on some action policy.
- Algos: responsible for computing updates for agents' action policies and passing these updates to the agent(s).

Additionally, experiments are generally initialized with logging to track progress and checkpoints.
Experiments are usually run from the command line. When this is the case, the relevant algorithm's `run.py` file is called with various (optional) command-line args. The main argument you should be aware of is the `masuite-id`. This specifies the name of the experiment to run as well as which configuration to use. The general form is `experiment_name/config_num`. The configurations (and their corresponding `config_num`) can be found in `experiments/experiment_name/sweep.py`.
To run the single-player simple pg cartpole experiment, run:
python3 masuite/algos/pytorch/simplepg/run.py --masuite-id=cartpole_simplepg/0
Additional optional command-line args can be found in `run.py`. If `python3` doesn't work, try `python`.
To run the two-player simple pg smallsoccer experiment, run:
python3 masuite/experiments/smallsoccer_gridsearch.py
After training agents and extracting their checkpoint information, you can compare the quality of agents in a tournament game.
To run a tournament, first activate the masuite virtual environment, then enter the command below in the terminal.
python3 masuite/analysis/tournament_demo.py masuite/analysis/soccer_checkpoints out.pkl
- "Python3" is to open the python file
- "masuite/analysis/tournament_demo.py" is the file that has the tournament information in it.
- "masuite/analysis/soccer_checkpoints" is the folder where the checkpoints of the agents are in
- "out.pkl" is the output pickle file name that can be opened in python and get the info from the tournament.
Environments can be loaded by their id:
import masuite
env = masuite.load_from_id('env_id')
Example run loop for 2 agents (no state):
def sample(env, agent1, agent2):
    x = agent1.act()
    y = agent2.act()
    fx, fy = env.step((x, y))
    return fx, fy
Example run loop for 2 agents in an environment (with state):
def sample(env, agent1, agent2):
    rets = []
    for _ in range(env.num_episodes):
        obs = env.reset()  # resets environment to initial state
        done = False
        while not done:
            act1 = agent1.act(obs)
            act2 = agent2.act(obs)
            obs, rews, done, info = env.step((act1, act2))
            rets.append(rews)
    return rets
Algorithms can be loaded by name:
import masuite
alg = masuite.load_alg('simgrad')
Alternatively, an algorithm can be loaded with results recording enabled:
import masuite
alg = masuite.load_alg_and_record('stackgrad/reg', results_dir='/path/to/results')
For simultaneous gradient descent:
def train(x, y):
    fx, fy = sample(x, y)
    gx = alg.grad(fx, x)
    gy = alg.grad(fy, y)
    info = alg.step((gx, gy))
    return info
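As a concrete illustration of simultaneous gradient play (plain Python, not the masuite API), consider a two-player quadratic game where each player descends the gradient of its own cost with respect to its own variable:

```python
# Two-player quadratic game (illustrative):
#   f1(x, y) = 0.5*x**2 + a*x*y   (player 1's cost in x)
#   f2(x, y) = 0.5*y**2 + b*x*y   (player 2's cost in y)
a, b = 0.5, -0.5
x, y, lr = 1.0, -1.0, 0.1
for _ in range(200):
    gx = x + a * y  # d f1 / d x
    gy = y + b * x  # d f2 / d y
    x, y = x - lr * gx, y - lr * gy  # simultaneous updates
print(x, y)  # both players converge toward the Nash equilibrium (0, 0)
```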
For Stackelberg gradient descent:
def train(x, y):
    fx, fy = sample(x, y)
    gy = alg.grad(fy, y)
    gx = alg.grad(fx, x, gy)
    info = alg.step((gx, gy))
    return info
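In the Stackelberg version, the leader's gradient accounts for the follower's response, which is why the leader's gradient `gx` is computed using the follower's gradient `gy` above. For the same illustrative quadratic game, the follower's best response is y*(x) = -b*x, so the leader descends the total derivative of f1(x, y*(x)):

```python
# Same quadratic game as above, but now x is the leader and y the follower.
a, b = 0.5, -0.5
x, y, lr = 1.0, -1.0, 0.1
for _ in range(200):
    gy = y + b * x              # follower: d f2 / d y
    gx = x + a * y - a * b * x  # leader: d f1/dx + (d f1/dy) * dy*/dx, dy*/dx = -b
    x, y = x - lr * gx, y - lr * gy
print(x, y)  # converges to the Stackelberg equilibrium at (0, 0)
```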
We include implementations of several common agents in the `baselines` directory.
Environments model cost/reward functions and state transitions:
- Matrix game (numpy, jax)
- Bimatrix game (numpy, jax)
- Quadratic game (numpy, jax, pytorch)
- Linear dynamics quadratic game (numpy, jax, pytorch)
- Multi-agent optimal control (cvxpy, julia)
Agents hold parameters and decision-making logic. They can have limited or full knowledge of the system:
- Actor: policy gradient
- Implicit actor: Stackelberg gradient
- Critic: Stackelberg actor-critic
Algorithms combine agents by transferring the necessary information between them (when they need to coordinate):
- Simultaneous (using optimizers)
- Stackelberg (using optimizers)
- Symplectic gradient adjustment
Experiments combine environments, agents, and algorithms together:
- Run Stackelberg update in quadratic games.
- Run policy gradient in linear quadratic games.
- Run policy gradient in Markov games.
- Run Stackelberg policy gradient in linear quadratic games.
An experiment is assembled as follows:
- Create `Environment` instance
- Create `Logger` instance
- Create `Agent` instance(s)
- Create `Algo` instance
- Pass to `experiment.run()`
`experiment.run()` then executes a loop of the following form (pseudocode):

for epoch in range(num_epochs):
    obs <- env.reset()
    while True:
        acts <- agent.select_action(obs)
        obs, rews, done <- env.step(acts)
        batch_info <- alg.update(obs, acts, rews, done)
        if batch is over:
            break
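A rough Python rendering of this loop is sketched below, assuming a gym-style environment and the agent/algorithm interfaces described earlier. How the end of a batch is signalled is an assumption here; the real logic lives in the algorithm:

```python
def run(env, agent, alg, num_epochs):
    # Illustrative only; the real experiment.run() also handles logging
    # and checkpointing.
    for epoch in range(num_epochs):
        obs = env.reset()
        while True:
            acts = agent.select_action(obs)
            obs, rews, done = env.step(acts)
            batch_info = alg.update(obs, acts, rews, done)
            if batch_info is not None:  # assumption: update() returns info once per batch
                break
```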