
PDDM

[Project Page] [Paper]

Deep Dynamics Models for Learning Dexterous Manipulation
Anusha Nagabandi, Kurt Konolige, Sergey Levine, Vikash Kumar.

Please note that this is research code, and as such, is still under construction. This code implements the model-based RL algorithm presented in PDDM. Please contact Anusha Nagabandi for questions or concerns.

Contents of this README:

A. Getting started
B. Quick Overview
C. Train and visualize some tests
D. Run experiments

A. Getting started

3) Set up this repo:

# create and activate new virtual environment
python3 -m venv ~/venv/pddm
source ~/venv/pddm/bin/activate

# install required packages
pip install -r requirements.txt
# install this package
pip install -e .

Note: setting up the original conda environment on Ubuntu 22.04 failed with a protobuf version error, so this repo uses a Python venv instead.
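To verify the install (assuming the package is importable as pddm after the pip install -e . step):

python -c "import pddm; print(pddm.__file__)"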

B. Quick Overview

The overall procedure implemented in this code is an iterative process of learning a dynamics model and then running an MPC controller that uses that model to perform action selection. The code starts by initializing a dataset of randomly collected rollouts (i.e., collected with a random policy), and then iteratively (a) trains a model on the dataset and (b) collects rollouts (using MPC with that model) and aggregates them into the dataset.

One cycle of (model training + rollout collection) constitutes a single iteration in this code. In other words, the rollouts from iter 0 are the result of planning under a model that was trained on randomly collected data, and the model saved at iter 3 is one that has been trained 4 times (on random data at iter 0, and on on-policy data at iterations 1, 2, and 3).
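For concreteness, the loop looks roughly like the following Python sketch (the function and class names here are illustrative, not the actual names used in this repo):

# minimal sketch of the model-based RL loop described above
dataset = collect_rollouts(policy=random_policy, num_rollouts=N)  # iter 0 data
for it in range(num_iters):
    model = train_dynamics_model(dataset)       # (a) train on all data so far
    mpc_policy = MPCController(model)           # plan actions under the model
    new_rollouts = collect_rollouts(policy=mpc_policy, num_rollouts=N)  # (b)
    dataset += new_rollouts                     # aggregate into the dataset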

To see the available parameters, look at the files in the config folder, as well as the list of parameters in convert_to_parser_args.py.
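To list the example configs shipped with the repo (the relative path assumes the same working directory as the commands below):

ls ../config/*.txt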

C. Train and visualize some tests

Cheetah:

python train.py --config ../config/short_cheetah_test.txt --output_dir ../output
python visualize_iteration.py --job_path ../output/short_cheetah_test --iter_num 0

Ant:

python train.py --config ../config/short_ant_test.txt --output_dir ../output --use_gpu
MJPL python visualize_iteration.py --job_path ../output/short_ant_test --iter_num 0
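Note: the MJPL prefix on the visualization commands is assumed to be a shell alias from the original repo's Mujoco setup that preloads GL libraries for on-screen rendering. If your shell does not define it, a definition along the following lines can be used (the library paths are system-specific and shown only for illustration):

alias MJPL='LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so'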

Dclaw turn valve:
Note that this short test will not quite solve the task, but the resulting behavior should be reasonable.

python train.py --config ../config/short_dclaw_turn_test.txt --output_dir ../output --use_gpu
MJPL python visualize_iteration.py --job_path ../output/short_dclaw_turn_test --iter_num 0

Dclaw turn valve (ground-truth dynamics):
Note that this will work well but takes a while to run, because it uses ground-truth Mujoco dynamics for planning. It should take approximately 6 minutes on a standard laptop without a GPU.

python train.py --config ../config/test_dclaw_turn_gt.txt --output_dir ../output --use_gpu
MJPL python visualize_iteration.py --job_path ../output/dclaw_turn_gt --iter_num 0

Shadowhand in-hand cube rotation:
Note that this will work well but takes a while to run, because it uses ground-truth Mujoco dynamics for planning. It should take approximately 6 minutes on a standard laptop without a GPU.

python train.py --config ../config/test_cube_gt.txt --output_dir ../output --use_gpu
MJPL python visualize_iteration.py --job_path ../output/cube_gt --iter_num 0

Shadowhand Baoding balls:
Note that this will work well but takes a while to run, because it uses ground-truth Mujoco dynamics for planning. It should take approximately 20 minutes on a standard laptop without a GPU.

python train.py --config ../config/test_baoding_gt.txt --output_dir ../output --use_gpu
MJPL python visualize_iteration.py --job_path ../output/baoding_gt --iter_num 0

D. Run experiments

Train:

python train.py --config ../config/dclaw_turn.txt --output_dir ../output --use_gpu
python train.py --config ../config/baoding.txt --output_dir ../output --use_gpu
python train.py --config ../config/cube.txt --output_dir ../output --use_gpu

Evaluate a pre-trained model:

python eval_iteration.py --job_path ../output/dclaw_turn --iter_num 0 --num_eval_rollouts 1 --eval_run_length 40
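The same script can evaluate any saved iteration; for example, to evaluate the model from a later iteration with more rollouts (the flag values here are illustrative):

python eval_iteration.py --job_path ../output/dclaw_turn --iter_num 3 --num_eval_rollouts 5 --eval_run_length 40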

Visualize:

MJPL python visualize_iteration.py --job_path ../output/dclaw_turn --eval
MJPL python visualize_iteration.py --job_path ../output/dclaw_turn --iter_num 0

Compare runs:

Plot rewards (or scores) of multiple runs on the same plot. Note that custom labels are optional:

python compare_results.py -j ../output/runA ../output/runB -l 'mycustomlabel runA' -l 'mycustomlabel runB' --plot_rew
python compare_results.py -j ../output/runA ../output/runB -l 'mycustomlabel runA' -l 'mycustomlabel runB'