DiSProD is an online planner developed for environments with probabilistic transitions and continuous action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. While most of our experiments use continuous state spaces, DiSProD can work with hybrid state spaces as well. We also provide an implementation of a version of DiSProD that works with binary action spaces, but it has not been tested extensively.
The code has been tested on Python 3.8.10 using Ubuntu 20.04.

```bash
git clone git@github.com:pecey/DiSProD.git
cd DiSProD
pip install -r requirements.txt
```
We assume that an environment variable called `DISPROD_PATH` is set and points to this folder (for example, `export DISPROD_PATH=$(pwd)` from the repository root). Please set it before running any of the scripts.
To run the planner on an environment:
```bash
python run_gym.py --alg=<alg_name> --env=<env_code>
```
`alg_name` can be one of `disprod`, `cem` or `mppi`, while `env_code` can be one of the codes listed below. As the shooting algorithms are not configured for discrete environments, running `cem` or `mppi` for (Discrete) CartPole (`cp`) or (Discrete) Mountain Car (`mc`) will raise an exception.
Environment | `env_code` |
---|---|
CartPole | `cp` |
Mountain Car | `mc` |
Continuous CartPole | `ccp` |
Continuous Mountain Car | `cmc` |
Pendulum | `p` |
Continuous Dubins Car | `cdc` |
Continuous Mountain Car - Sparse Rewards | `cmc_sp` |
Continuous Mountain Car - High Dimension | `cmc_hd` |
Continuous CartPole - Hybrid | `ccp_h` |
Simple Env | `se` |
The other configurations available via the CLI are:

- `--seed`: Specify the base seed for the PRNG.
- `--depth`: Planning horizon for the planner.
- `--alpha`: Controls the noise level in the environment. For no noise, use `--alpha=0`.
- `--reward_sparsity`: Controls the sparsity of rewards in the planner. The higher the value, the sparser the reward.
- `--n_episodes`: Number of episodes to run the planner for.
- `--map_name`: Specify the map to use for the Dubins Car environments. Valid values are listed below.
- `--n_samples`: Specify the population size for CEM/MPPI.
- `--step_size`: Specify the step size for action mean updates in DiSProD.
- `--step_size_var`: Specify the step size for action variance updates in DiSProD.
- `--n_restarts`: Specify the number of restarts in DiSProD.
- `--taylor_expansion_mode`: Control which mode of DiSProD to use. Valid values are `complete`, `state_var` and `no_var`.
- `--n_actions`: Specify the number of actions for Continuous Mountain Car - High Dimension.
- `--render`: Control whether to render the Gym interactions.
All the map configurations are defined in `env/assets/dubins.json`. At present, the available maps are `no-ob-[1-5]`, `ob-[1-11]`, `u` and `cave-mini`.
The scripts for running the experiments are in `slurm_scripts` and are geared towards clusters managed by SLURM. The folder has bash scripts for all the experiments, which in turn submit the respective jobs to SLURM. The scripts take the name of the environment as an argument. Valid values are `cartpole`, `pendulum`, `mountain_car`, `dubins_car` and `simple_env`.
For example, to trigger the experiments that vary the noise levels on the Mountain Car environment:

```bash
./trigger_exp_noise.sh mountain_car
```
The equivalent `python` command for running the same experiment is:

```bash
PYTHONPATH=. python run_gym.py --env=cmc --alg=${alg} --n_episodes=48 --alpha=${alpha_val} --render=False
```
where `alg` is one of `disprod`, `cem` or `mppi`, and `alpha_val` controls the amount of noise.
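If SLURM is not available, the same sweep can also be launched directly from Python. The snippet below is only a minimal sketch: the `alpha` values and the algorithm list are illustrative choices, while the CLI flags are the ones documented above.

```python
# Illustrative sweep over algorithms and noise levels for Continuous Mountain Car.
# The alpha values below are arbitrary examples, not the settings from our experiments.
import os
import subprocess

run_env = {**os.environ, "PYTHONPATH": "."}

for alg in ["disprod", "cem", "mppi"]:
    for alpha_val in [0.0, 0.002, 0.005]:
        subprocess.run(
            ["python", "run_gym.py", "--env=cmc", f"--alg={alg}",
             "--n_episodes=48", f"--alpha={alpha_val}", "--render=False"],
            env=run_env,
            check=True,
        )
```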
- `config` contains domain-specific configuration used by the planners. `config/default.yaml` contains the default config, which can be overridden by using the same key in the domain-specific file.
- `env` contains the environment files for each of the domains. `env/transition_fns.py` and `env/reward_fns.py` contain the transition functions and the reward functions for each of the domains. These functions are provided to the planner via the config files (a sketch of this idea follows the list).
- `planners` contains the planning algorithms. `continuous_disprod` is for continuous action spaces and can handle both continuous and binary state variables. `discrete_disprod` is for binary action spaces. `planners/baseline` contains the code for CEM and MPPI.
- `utils` contains helper methods that are used in various places.
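The exact mechanism by which these dotted paths are turned into callables lives in the repository code and may differ from the following, but conceptually it amounts to a dynamic import. This is a hypothetical sketch, not the actual implementation:

```python
# Hypothetical sketch: resolving a dotted path from a config file, e.g.
# "env.reward_fns.<some_fn>", into a Python callable. The real loading code in
# this repository may be structured differently.
from importlib import import_module

def load_fn(dotted_path: str):
    module_path, fn_name = dotted_path.rsplit(".", 1)
    return getattr(import_module(module_path), fn_name)

# Usage (illustrative): reward_fn = load_fn(config["reward_fn"])
```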
Using DiSProD on a new environment is straightforward.

- Have a JAX-compatible transition function with all noises represented as inputs (see the sketch after this list).
- Have a JAX-compatible reward function.
- Since we register the environments ourselves with Gym, a copy of the environment file would also be required. This step can be skipped, but doing so would require manually commenting out portions of the code.
- Add a configuration file in `config` with at least the following keys:
  - `env_name`: The name to register with Gym.
  - `env_entry_point`: Path to the simulator file.
  - `max_episode_steps`: Maximum number of episode steps permissible.
  - `reward_fn`: Path to the JAX-compatible reward function.
  - `transition_fn`: Path to the JAX-compatible transition function.
  - `discrete`: True if the action space is discrete. This is used to call the relevant version of DiSProD while planning.
- Add an entry in `ENV_MAPPING` in `run_gym.py` mapping the environment code-name that you want to the name of the config file.
- Add the environment code name as a valid choice for the `--env` argument for Argparse in `run_gym.py`.
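As a concrete starting point, the following is a minimal sketch of a JAX-compatible transition and reward function for a hypothetical 1-D point-mass domain. The function names and the `(state, action, noise)` signature are assumptions made for illustration; check `env/transition_fns.py` and `env/reward_fns.py` for the exact signatures used by the bundled domains.

```python
# Minimal sketch of JAX-compatible dynamics for a hypothetical 1-D point mass.
# The (state, action, noise) signature is an assumption for illustration only;
# match the signatures used in env/transition_fns.py and env/reward_fns.py.
import jax.numpy as jnp

DT = 0.1  # integration step (illustrative)

def point_mass_transition_fn(state, action, noise):
    # state = [position, velocity]; the noise is an explicit input so that the
    # planner can propagate its distribution through the dynamics.
    position, velocity = state[0], state[1]
    new_velocity = velocity + DT * action[0] + noise[0]
    new_position = position + DT * new_velocity
    return jnp.array([new_position, new_velocity])

def point_mass_reward_fn(state, action):
    # Negative distance to the origin, with a small action penalty.
    return -jnp.abs(state[0]) - 0.01 * jnp.square(action[0])
```

Keeping the noise as an explicit input, rather than sampling it inside the transition function, is what allows the planner to propagate approximate distributions through the dynamics.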
Instructions for experiments with TurtleBot can be found in `ros1-turtlebot/README.md`.
As the main loop to run experiments uses multiprocessing, if the inner loop fails, the outer loop still tries to execute and errors out with a dimensionality issue for the reward matrix. The error message looks something like:
```
Traceback (most recent call last):
  File "run_gym.py", line 171, in <module>
    main(args)
  File "run_gym.py", line 122, in main
    rewards = rewards_matrix[:, 1]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
```
This message can be ignored as it is not the true source of the error. The actual error can be found above this message in the stack trace.
If you find this work useful, please cite:

```bibtex
@misc{chatterjee2023disprod,
  title={DiSProD: Differentiable Symbolic Propagation of Distributions for Planning},
  author={Palash Chatterjee and Ashutosh Chapagain and Weizhe Chen and Roni Khardon},
  year={2023},
  eprint={2302.01491},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
```