Official PyTorch implementation of Continuous Imitation Learning from Observation (CILO), from "Explorative Imitation Learning: A Path Signature Approach for Continuous Environments" (ECAI 2024).
Python: 3.9.15
Conda: 23.1.0
There is a script in ./dependencies/install.sh, which will create a conda environment and install all dependencies needed to run this repository.
To run CILO, you first need to create random transition samples and expert samples.
To create random samples for one environment:
python create_random_mujoco.py --env_name <ENV> --data_path <PATH>
For example:
python create_random_mujoco.py --env_name Ant-v3 --data_path ./dataset/ant/random_ant
To create random samples for all environments:
bash ./scripts/create_randoms.sh
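For intuition, random transition samples are just (state, action, next state) triples collected by acting uniformly at random. The sketch below is a hypothetical illustration of that idea, not the actual code in create_random_mujoco.py: the ToyEnv class stands in for a Gym-style MuJoCo environment so the example is self-contained.

```python
# Hypothetical sketch of random transition collection (NOT the repository's
# create_random_mujoco.py). ToyEnv is a stand-in for a Gym-style MuJoCo env.
import numpy as np

class ToyEnv:
    """Minimal stand-in for a Gym-style environment (hypothetical)."""
    def __init__(self, obs_dim=4, act_dim=2):
        self.obs_dim, self.act_dim = obs_dim, act_dim
    def reset(self):
        return np.zeros(self.obs_dim)
    def step(self, action):
        next_obs = np.random.randn(self.obs_dim)
        done = np.random.rand() < 0.05  # random episode termination
        return next_obs, 0.0, done, {}
    def sample_action(self):
        return np.random.uniform(-1.0, 1.0, self.act_dim)

def collect_random_transitions(env, n_transitions):
    """Roll a uniformly random policy and record (s, a, s') triples."""
    states, actions, next_states = [], [], []
    obs = env.reset()
    while len(states) < n_transitions:
        action = env.sample_action()
        next_obs, _, done, _ = env.step(action)
        states.append(obs)
        actions.append(action)
        next_states.append(next_obs)
        obs = env.reset() if done else next_obs
    return np.stack(states), np.stack(actions), np.stack(next_states)

s, a, s_next = collect_random_transitions(ToyEnv(), 100)
```

In the repository, the resulting arrays are written to the location given by --data_path.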
To create expert samples for one environment:
python create_dataset_mujoco.py -t <THREADS> -e <EPISODES> -g <ENV> --mode <play|collate|all>
For example:
python create_dataset_mujoco.py -t 4 -e 10 -g ant --mode all
To create expert samples for all environments:
bash ./scripts/create_experts.sh
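The --mode flag suggests a two-phase pipeline: a "play" phase where each thread rolls out episodes and writes its own file, and a "collate" phase that merges the per-thread files into one dataset ("all" does both). The sketch below illustrates that pattern only; the file names, keys, and episode lengths are assumptions, not the repository's actual format.

```python
# Hypothetical illustration of a play/collate split (NOT the repository's
# create_dataset_mujoco.py). Each worker writes its own episode file, then
# collate concatenates every file into a single dataset.
import glob
import numpy as np

def play(thread_id, episodes, steps=10, obs_dim=4, act_dim=2):
    """Stand-in 'play' phase: record fake expert episodes for one thread."""
    states = np.random.randn(episodes * steps, obs_dim)
    actions = np.random.randn(episodes * steps, act_dim)
    np.savez(f"expert_thread_{thread_id}.npz", states=states, actions=actions)

def collate(pattern="expert_thread_*.npz"):
    """Stand-in 'collate' phase: merge every thread's episodes."""
    files = sorted(glob.glob(pattern))
    states = np.concatenate([np.load(f)["states"] for f in files])
    actions = np.concatenate([np.load(f)["actions"] for f in files])
    return states, actions

for tid in range(4):          # mimics -t 4
    play(tid, episodes=10)    # mimics -e 10
states, actions = collate()
```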
If you want to use the same datasets from the paper, they are all publicly available via IL-Datasets. All datasets are listed on HuggingFace and can be downloaded using the BaselineDataset class from IL-Datasets. To use a dataset:
from imitation_datasets.dataset import BaselineDataset
dataset = BaselineDataset("NathanGavenski/Ant-v2", source="huggingface")
To train CILO, run the following command:
clear && python train_cilo.py \
--gpu <GPU> \
--encoder vector \
--env_name <ENV> \
--run_name <RUN NAME> \
--data_path <RANDOM> \
--expert_path <EXPERT> \
--alpha <ALPHA> \
--domain vector \
--choice explore \
\
--lr <Dynamics LR> \
--lr_decay_rate <LR DECAY> \
--batch_size <BATCH SIZE> \
--idm_epochs <EPOCHS> \
\
--policy_lr <Policy LR> \
--policy_lr_decay_rate <LR DECAY> \
--policy_batch_size <BATCH SIZE> \
\
--verbose
where <GPU> should be -1 if no GPU is available, <RANDOM> is the path to the random samples, <EXPERT> is the path to the expert samples, and <RUN NAME> is the name you want for your experiment in TensorBoard.
For simplicity, we provide a script for each environment with all hyperparameters used during training. To use them:
bash ./scripts/cilo/cilo_ant.sh -1 experiment1
where the first argument is the GPU number and the second is the experiment name.
@incollection{gavenski2024explorative,
title={Explorative Imitation Learning: A Path Signature Approach for Continuous Environments},
author={Gavenski, Nathan and Monteiro, Juarez and Meneguzzi, Felipe and Luck, Michael and Rodrigues, Odinaldo},
booktitle={ECAI 2024},
  pages={1551--1558},
year={2024},
publisher={IOS Press}
}