Imitation learning with the INTERACTION dataset via the InteractionSimulator gym environments.
Code for "SHAIL: Safety-Aware Hierarchical Adversarial Imitation Learning for Autonomous Driving in Urban Environments". If you find this repository useful, please cite the paper:
```bibtex
@article{jamgochian2022shail,
  author = {Arec Jamgochian and Etienne Buehrle and Johannes Fischer and Mykel J. Kochenderfer},
  title = {{SHAIL}: Safety-Aware Hierarchical Adversarial Imitation Learning for Autonomous Driving in Urban Environments},
  journal = {arXiv:2204.01922 [cs]},
  year = {2022}
}
```
Clone the InteractionSimulator with the `shail` tag and pip install the module:

```bash
git clone --branch shail https://github.com/sisl/InteractionSimulator.git
cd InteractionSimulator
pip install -e .
cd ..
export PYTHONPATH=$(pwd):$PYTHONPATH
```
Install additional requirements:

```bash
pip install -r requirements.txt
```
Copy the INTERACTION dataset files. The INTERACTION dataset contains two folders which should be copied into a folder called `./InteractionSimulator/datasets`:

- the contents of `recorded_trackfiles` should be copied to `./InteractionSimulator/datasets/trackfiles`
- the contents of `maps` should be copied to `./InteractionSimulator/datasets/maps`
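For reference, a minimal Python sketch of this copy step (the `INTERACTION_ROOT` path is a hypothetical placeholder for wherever your dataset download lives; adjust it to your setup):

```python
import shutil
from pathlib import Path

# Hypothetical location of your INTERACTION dataset download -- adjust as needed.
INTERACTION_ROOT = Path("~/Downloads/INTERACTION-Dataset").expanduser()
DEST = Path("InteractionSimulator/datasets")

# Copy the contents of `recorded_trackfiles` and `maps` into the
# simulator's expected `trackfiles` and `maps` folders.
for src_name, dst_name in [("recorded_trackfiles", "trackfiles"), ("maps", "maps")]:
    src, dst = INTERACTION_ROOT / src_name, DEST / dst_name
    dst.mkdir(parents=True, exist_ok=True)
    for item in src.iterdir():
        if item.is_dir():
            shutil.copytree(item, dst / item.name, dirs_exist_ok=True)
        else:
            shutil.copy2(item, dst)
```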
Once the repository has been set up, you need to generate two separate sets of expert demos for tracks 0-4. The first command generates the true joint and individual states and actions needed for evaluation, saving them in `expert_data/`. The second command generates trajectory rollouts according to individual agent observations, which are later used as expert data for the learning models.
```bash
python -m src.expert --locs='[DR_USA_Roundabout_FT]' --tracks='[0,1,2,3,4]'
python -m intersimple-expert-rollout-setobs2 --tracks='[0,1,2,3,4]'
```
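If you prefer to script these steps (e.g., to sweep other track ranges), a minimal Python wrapper around the same two documented commands might look like this (run from the repository root):

```python
import subprocess
import sys

# Generate both expert demonstration sets for tracks 0-4 by invoking
# the two documented commands in sequence.
tracks = "[0,1,2,3,4]"
for module, extra_args in [
    ("src.expert", ["--locs=[DR_USA_Roundabout_FT]"]),
    ("intersimple-expert-rollout-setobs2", []),
]:
    subprocess.run(
        [sys.executable, "-m", module, *extra_args, f"--tracks={tracks}"],
        check=True,  # stop early if a generation step fails
    )
```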
To tune models, we use `ray[tune]` grid searches. You can see the commands we used to train in the top half of `train_models.sh`, as well as the hyperparameters we search over in `bc-experiment.py`, `gail-experiment.py`, and `shail-experiment.py`. After training the models, configurations get saved in `best_configs/` (the best SHAIL config gets copied to a HAIL config, with the appropriate environment parameters changed for ablation). However, upon manual inspection of the training runs, we note better performance at earlier epochs than with the automatically set configs, so we adjust the `best_configs` manually.
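For a rough picture of what those experiment scripts do, here is a minimal Ray Tune grid-search sketch (the trainable and search space below are illustrative stand-ins, not the actual hyperparameters; it assumes the Ray Tune 1.x `tune.run` API used at the time):

```python
from ray import tune

def train_fn(config):
    # Placeholder trainable: the real experiment scripts train BC/GAIL/SHAIL
    # policies here and report a validation metric back to Tune.
    score = -(config["lr"] - 3e-4) ** 2 - 0.001 * config["batch_size"]  # dummy objective
    tune.report(score=score)

analysis = tune.run(
    train_fn,
    config={
        # Illustrative search space; the actual grids live in the *-experiment.py files.
        "lr": tune.grid_search([1e-4, 3e-4, 1e-3]),
        "batch_size": tune.grid_search([32, 64]),
    },
)
print(analysis.get_best_config(metric="score", mode="max"))
```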
After the `best_configs/` are set, we rerun each configuration with multiple seeds. The commands to do so are in the bottom half of `train_models.sh`. This saves different learned policy files to `test_policies/`.
To evaluate the learned policies, we rerun each model in its particular setting, evaluate all our metrics, and average over the different trained model seeds. The commands to do so are in `evaluate_models.sh`.
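Averaging a metric over seeds then amounts to a simple aggregation; a minimal sketch (the metric name and per-seed values here are hypothetical placeholders, not results from the paper):

```python
import numpy as np

# Hypothetical per-seed values for one metric; the actual metrics and their
# on-disk layout come from the evaluation runs in evaluate_models.sh.
metric_per_seed = {0: 0.10, 1: 0.12, 2: 0.09}

values = np.array(list(metric_per_seed.values()))
print(f"mean {values.mean():.3f} +/- {values.std(ddof=1):.3f} over {len(values)} seeds")
```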
Repository structure:

```
InteractionImitation
|- TODO
```
The intended data structures, in pseudocode:

```
Demo: List[Trajectory]
Trajectory: List[Tuple[Observation, Action]]  # single expert
Observation: Dict[
    'own_state': [x, y, v, psi, psidot],
    'relative_states': List[[xr, yr, vr, psir, psidotr]],
    'own_path': List[[xr, yr]],  # fixed length, constant dt
    'map': Map,  # relative
]
Action: Range[0, 1]
Policy: Union[
    Callable[[Observation], Action],
    Callable[[Observation, Action], probability],
]
Discriminator: Callable[[Observation, Action], value]
Map: Dict[...]
```
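One way to read this sketch as concrete Python type hints (a non-authoritative translation; the choice of numpy arrays for the state vectors is an assumption):

```python
from typing import Callable, Dict, List, Tuple, TypedDict, Union

import numpy as np

Map = Dict[str, object]  # relative map representation; contents left open above

class Observation(TypedDict):
    own_state: np.ndarray              # [x, y, v, psi, psidot]
    relative_states: List[np.ndarray]  # each entry [xr, yr, vr, psir, psidotr]
    own_path: List[np.ndarray]         # fixed length, constant dt; entries [xr, yr]
    map: Map                           # relative map

Action = float  # normalized to the range [0, 1]

Trajectory = List[Tuple[Observation, Action]]  # single expert rollout
Demo = List[Trajectory]

# A policy either maps an observation to an action, or scores an
# (observation, action) pair with a probability.
Policy = Union[
    Callable[[Observation], Action],
    Callable[[Observation, Action], float],
]

# A discriminator assigns a scalar value to an (observation, action) pair.
Discriminator = Callable[[Observation, Action], float]
```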