This repository contains the code for the Stackelberg learning project on collaborative assembly task planning.
- Python 3.10 or higher
- PyTorch 2.0.1 or higher
- Create a virtual environment with Python 3.11 and activate it:
$ python3.11 -m venv <your-virtual-env-name>
$ source /path-to-venv/bin/activate
- Use `pip` to install the required packages:
(your-venv)$ pip install -e .
To use the plotting functions, install the optional `visual` extras instead:
(your-venv)$ pip install -e ".[visual]"
- Go to the `experiments/` directory and run the training scripts, e.g.,
(your-venv)$ python sg_train.py
Note: run `sg_perturb.py` and `plot_things.py` only after all training runs have completed.
- `sg_task/`: algorithm implementations.
  - `data/`: environment settings and learning hyperparameters.
  - `sg/`: Stackelberg learning functions.
  - `other/`: Nash and independent learning functions.
  - `env.py`: environment implementations.
  - `perturbation.py`: perturbation tests with Stackelberg learning models.
  - `utils.py`: miscellaneous utilities.
- `data/`: data directory for saving generated data and learned models.
- `experiments/`: Python scripts for running the experiments.
- `tests/`: test scripts.
- `scripts/`: bash scripts to run the code.
- `nash`: Nash Q-learning algorithm.
- `ind`: independent learning algorithm.
- `maddpg`: Multi-Agent Deep Deterministic Policy Gradient; see maddpg.
All hyperparameters and environment configurations are in the `sg_task/data/` directory. New custom tasks can be added by following the structure of Tasks 1-8.
We use the `multiprocessing` package to run the training of a given task over different experiments in parallel (see the code in the `experiments/` directory). You can set the training seeds and devices manually for each experiment.
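The parallel setup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `experiments/`. The names `make_configs`, `run_experiment`, and `run_all` are hypothetical, and `run_experiment` is only a stand-in for a real training run. The sketch pairs each seed with a device round-robin and hands one configuration to each worker process.

```python
import multiprocessing as mp
from itertools import cycle


def make_configs(task_id, seeds, devices):
    """Pair each seed with a device, cycling through the devices round-robin.

    Returns one (task_id, seed, device) tuple per experiment.
    """
    return [(task_id, seed, device) for seed, device in zip(seeds, cycle(devices))]


def run_experiment(task_id, seed, device):
    """Hypothetical placeholder for one training run.

    A real version would build the environment for `task_id`, seed the
    random number generators with `seed`, and train on `device`.
    """
    return f"task {task_id} | seed {seed} | device {device}: done"


def run_all(configs, processes=None):
    """Run every configuration in a worker process and collect the results.

    Uses the "spawn" start method because CUDA cannot be re-initialized
    in forked subprocesses.
    """
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.starmap(run_experiment, configs)
```

For example, `make_configs(1, [0, 1, 2], ["cuda:0", "cuda:1"])` gives `[(1, 0, "cuda:0"), (1, 1, "cuda:1"), (1, 2, "cuda:0")]`. Call `run_all(...)` on that list from inside an `if __name__ == "__main__":` guard, because spawned workers re-import the main module.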
We use `bash` to run the training of different tasks in parallel.
To enable command-line options, uncomment the following statement in every training script in the `experiments/` directory:
task_id = int(sys.argv[1]) # uncomment this to use bash script.
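- Action: use a list, `Al = Af = [-1, 0, 1, ..., n-1]`, where -1 means "do nothing" and n is the board width.
- Buffer: use a 2D NumPy array whose i-th row is `D[i, :] = [s, al, af, rl, rf, s_new]` (state, leader action, follower action, leader reward, follower reward, next state).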
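The action and buffer conventions above can be illustrated with a short sketch. It assumes each state `s` is a single scalar (e.g. a state index), so a transition fits into one six-column row. The helper names are hypothetical and not taken from the repository.

```python
import numpy as np

# Column layout of one buffer row, as in D[i, :] = [s, al, af, rl, rf, s_new]
BUFFER_COLUMNS = ("s", "al", "af", "rl", "rf", "s_new")


def make_actions(n):
    """Action list for a board of width n; -1 means "do nothing"."""
    return list(range(-1, n))


def empty_buffer():
    """Empty buffer with one column per field (assumes scalar states)."""
    return np.empty((0, len(BUFFER_COLUMNS)), dtype=float)


def add_transition(buffer, s, al, af, rl, rf, s_new):
    """Return a new buffer with the transition appended as the last row."""
    row = np.array([[s, al, af, rl, rf, s_new]], dtype=float)
    return np.vstack([buffer, row])


Al = Af = make_actions(4)  # [-1, 0, 1, 2, 3]
D = add_transition(empty_buffer(), s=0, al=2, af=-1, rl=1.0, rf=0.5, s_new=3)
```

`np.vstack` copies the whole array on every call, which keeps the sketch simple. A long-running training loop would more likely preallocate the array and overwrite rows in place.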
- Q-function in Stackelberg learning: dimensions |Al| x |Af|. The output vector is ordered as [(al_0, af_0), (al_0, af_1), ..., (al_0, af_m), ..., (al_m, af_m)], i.e., row-major with the leader action as the outer index.
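The ordering above places the pair (al_i, af_j) at flat position `i * |Af| + j`, so the output vector reshapes directly into an |Al| x |Af| matrix. The sketch below shows that mapping. It also shows, as an illustration only, the textbook pure-strategy Stackelberg rule: the follower best-responds to each leader action, and the leader picks the action whose induced outcome it values most. The repository's own selection rule (e.g. its tie-breaking) may differ, and all function names are hypothetical.

```python
import numpy as np


def action_to_index(a):
    """Map an action value in [-1, 0, ..., n-1] to its list position."""
    return a + 1


def joint_index(i, j, num_af):
    """Flat position of (al_i, af_j) in the row-major output vector."""
    return i * num_af + j


def q_vector_to_matrix(q_vec, num_al, num_af):
    """Reshape a flat Q-output vector into an |Al| x |Af| matrix."""
    return np.asarray(q_vec).reshape(num_al, num_af)


def stackelberg_actions(q_leader, q_follower):
    """Pure-strategy Stackelberg pair (i, j) from two |Al| x |Af| Q-matrices.

    The follower best-responds to each leader action; the leader then picks
    the action with the highest value under that response. np.argmax breaks
    ties in favor of the lowest index.
    """
    follower_br = np.argmax(q_follower, axis=1)  # best af index for each al
    leader_vals = q_leader[np.arange(q_leader.shape[0]), follower_br]
    i = int(np.argmax(leader_vals))
    return i, int(follower_br[i])
```

For example, with `q_leader = [[1, 5], [3, 2]]` and `q_follower = [[2, 0], [0, 1]]`, the follower would answer leader action 0 with follower action 0 and leader action 1 with follower action 1. The leader therefore gets 1 or 2 and chooses (1, 1), even though its highest single entry, 5, sits at (0, 1).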