
Implements off-policy models from:

This project can also interface with the robot & environment provided here, which is shown running in the image below.

Still a slight WIP; more details and instructions to come.


Note: Linux is required because the project depends on the Ray package for parallel policy execution.

conda create -n deepq python=3.6
pip install numpy ray gym pybullet psutil

And then follow the command from here to install the appropriate version of PyTorch 1.0.

To Run:

This project implements only the offline grasping approach: experience is first collected offline and then indexed as a fixed experience replay buffer that does not change during training.
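
As a rough illustration of what a replay buffer that never changes could look like, here is a minimal sketch. The class name `FixedReplayBuffer`, the transition layout `(state, action, reward, next_state, done)`, and the toy data are all assumptions made for this example, not the project's actual code.

```python
import random


class FixedReplayBuffer:
    """Read-only buffer over transitions that were collected ahead of time."""

    def __init__(self, transitions, seed=0):
        # Store the data as a tuple so nothing can be appended during training.
        self._data = tuple(transitions)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._data)

    def sample(self, batch_size):
        # Uniformly sample indices with replacement, then look up the transitions.
        indices = [self._rng.randrange(len(self._data)) for _ in range(batch_size)]
        return [self._data[i] for i in indices]


# Hypothetical toy transitions: (state, action, reward, next_state, done).
transitions = [((i,), 0, float(i % 2), (i + 1,), i == 9) for i in range(10)]
buffer = FixedReplayBuffer(transitions)
batch = buffer.sample(4)
```

Because the buffer is filled once and only ever read, training can be repeated over the same data with different models.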

Note: The command line can be used to specify a number of additional arguments. See for details.

Collect Experience

First collect experience using a biased downward policy. The following command spawns N remote servers, each running a different environment instance, which collect experience in parallel.

python --remotes=1 --outdir=data100K
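
The idea behind this step can be sketched as follows. This is only an illustration under stated assumptions: the action format `(dx, dy, dz)`, the `down_bias` parameter, and the function names are hypothetical, and a standard-library thread pool stands in for Ray's remote workers so that the sketch has no dependencies.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def biased_downward_action(rng, down_bias=0.5):
    # Hypothetical action format: (dx, dy, dz), each component in [-1, 1].
    dx = rng.uniform(-1.0, 1.0)
    dy = rng.uniform(-1.0, 1.0)
    # Shift the vertical sample downward, then clip it back into range.
    dz = max(-1.0, min(1.0, rng.uniform(-1.0, 1.0) - down_bias))
    return (dx, dy, dz)


def collect(worker_id, num_steps):
    # Each worker gets its own seed so the workers explore differently.
    rng = random.Random(worker_id)
    return [biased_downward_action(rng) for _ in range(num_steps)]


def collect_parallel(num_workers, num_steps):
    # Stand-in for Ray: run one collector per worker and merge the results.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_worker = list(pool.map(lambda w: collect(w, num_steps), range(num_workers)))
    return [action for worker in per_worker for action in worker]


actions = collect_parallel(num_workers=2, num_steps=500)
mean_dz = sum(a[2] for a in actions) / len(actions)
```

On average the sampled vertical component is negative, so the gripper tends to move toward the objects more often than a uniform random policy would.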

Train a Model

Once data has been collected, you can begin training off-policy DQL models by selecting one from the list:

python --remotes=1 --data-dir=data100K --model=[dqn, ddqn, ddpg, supervised, mcre, cmcre]
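
To illustrate how two of the listed options differ, the sketch below computes the standard DQN and Double DQN bootstrap targets for a single transition. It assumes a small discrete action set and plain Python lists of Q-values for readability. The function names and numbers are hypothetical, and the project's own models may be structured differently.

```python
def dqn_target(reward, done, next_q_target, gamma=0.9):
    # DQN: the target network both selects and evaluates the next action.
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, done, next_q_online, next_q_target, gamma=0.9):
    # Double DQN: the online network selects the next action,
    # and the target network evaluates it.
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state, one entry per action.
next_q_online = [1.0, 3.0, 2.0]
next_q_target = [2.0, 0.5, 4.0]

y_dqn = dqn_target(1.0, False, next_q_target)
y_ddqn = ddqn_target(1.0, False, next_q_online, next_q_target)
```

Here the DQN target bootstraps from the largest target-network value, while the Double DQN target uses the target-network value of the action preferred by the online network, which is the mechanism Double DQN uses to reduce overestimation.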

If running a visdom server, you can replace with to watch task execution.


Thanks to Eric Jang for model discussion, and the Ray team for helping to debug.