
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

Primary language: Python. License: MIT.

Wenlong Huang¹, Chen Wang¹, Ruohan Zhang¹, Yunzhu Li¹,², Jiajun Wu¹, Li Fei-Fei¹

¹Stanford University, ²University of Illinois Urbana-Champaign

This is the official demo code for VoxPoser, a method that uses large language models and vision-language models to zero-shot synthesize trajectories for manipulation tasks.

In this repo, we provide the implementation of VoxPoser in RLBench as its task diversity best resembles our real-world setup. Note that VoxPoser is a zero-shot method that does not require any training data. Therefore, the main purpose of this repo is to provide a demo implementation rather than an evaluation benchmark.

If you find this work useful in your research, please cite using the following BibTeX:

@article{huang2023voxposer,
      title={VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models},
      author={Huang, Wenlong and Wang, Chen and Zhang, Ruohan and Li, Yunzhu and Wu, Jiajun and Fei-Fei, Li},
      journal={arXiv preprint arXiv:2307.05973},
      year={2023}
    }

Setup Instructions

Note that this codebase is best run with a display. For running in headless mode, refer to the instructions in RLBench.

  • Create a conda environment:
conda create -n voxposer-env python=3.9
conda activate voxposer-env
  • Install PyRep and RLBench by following their respective installation instructions (note: install both inside the conda environment created above).

  • Install other dependencies:

pip install -r requirements.txt
  • Obtain an OpenAI API key, and put it inside the first cell of the demo notebook.

Running Demo

Demo code is at src/playground.ipynb. Instructions can be found in the notebook.

Code Structure

Core to VoxPoser:

  • playground.ipynb: Playground for VoxPoser.
  • LMP.py: Implementation of Language Model Programs (LMPs), which recursively generate code to decompose instructions and compose value maps for each sub-task.
  • interfaces.py: Interface that provides the APIs language models (i.e., LMPs) need to operate in voxel space and to invoke the motion planner.
  • planners.py: Implementation of a greedy planner that plans a trajectory (represented as a series of waypoints) for an entity/movable given a value map.
  • controllers.py: Given a waypoint for an entity/movable, the controller applies (a series of) robot actions to achieve the waypoint.
  • dynamics_models.py: Environment dynamics model for the case where the entity/movable is an object or object part. This is used in controllers.py to perform model predictive control (MPC).
  • prompts/rlbench: Prompts used by the different Language Model Programs (LMPs) in VoxPoser.
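
To make the roles of value-map composition and greedy planning more concrete, here is a minimal sketch in Python. It is not the code in planners.py or interfaces.py: the function names (distance_map, gaussian_bump, compose_cost, greedy_plan), the cost formulation, and the 26-neighbour descent rule are all assumptions chosen for illustration. It composes a "move toward the target" map with an "avoid this region" map into a single voxel cost map, then follows the lowest-cost neighbouring voxel until no neighbour improves.

```python
from itertools import product

import numpy as np


def distance_map(shape, target):
    """Cost that grows with Euclidean distance (in voxels) from `target`."""
    grid = np.indices(shape)  # shape (3, X, Y, Z)
    diff = grid - np.asarray(target).reshape(3, 1, 1, 1)
    return np.linalg.norm(diff, axis=0)


def gaussian_bump(shape, center, sigma=1.0):
    """Penalty that peaks at `center` and decays with distance (hypothetical avoidance map)."""
    sq_dist = distance_map(shape, center) ** 2
    return np.exp(-sq_dist / (2.0 * sigma**2))


def compose_cost(target_cost, avoidance, avoid_weight=10.0):
    """Combine two maps into one cost map with a weighted sum (an assumed composition rule)."""
    return target_cost + avoid_weight * avoidance


def greedy_plan(cost_map, start, max_steps=200):
    """Return a list of voxel waypoints obtained by greedy descent on `cost_map`."""
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    pos = tuple(int(c) for c in start)
    path = [pos]
    for _ in range(max_steps):
        best, best_cost = pos, cost_map[pos]
        for off in offsets:
            nxt = tuple(p + d for p, d in zip(pos, off))
            in_bounds = all(0 <= c < s for c, s in zip(nxt, cost_map.shape))
            if in_bounds and cost_map[nxt] < best_cost:
                best, best_cost = nxt, cost_map[nxt]
        if best == pos:  # no neighbour is cheaper: stop (possibly a local minimum)
            break
        pos = best
        path.append(pos)
    return path
```

For example, greedy_plan(compose_cost(distance_map(shape, target), gaussian_bump(shape, obstacle)), start) returns waypoints in voxel coordinates. A purely greedy descent like this can stop in a local minimum when the avoidance penalty blocks the direct route, which is one reason a real planner may add smoothing or other post-processing.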

Environment and utilities:

  • envs:
    • rlbench_env.py: Wrapper of RLBench env to expose useful functions for VoxPoser.
    • task_object_names.json: Mapping of object names exposed to VoxPoser and their corresponding scene object names for each individual task.
  • configs/rlbench_config.yaml: Config file for all the modules involved in the RLBench environment.
  • arguments.py: Argument parser for the config file.
  • LLM_cache.py: Caching of language model outputs that writes to disk to save cost and time.
  • utils.py: Utility functions.
  • visualizers.py: A Plotly-based visualizer for value maps and planned trajectories.
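
The idea behind caching language model outputs on disk can be sketched as follows. This is an illustrative example only, not the contents of LLM_cache.py: the class name DiskCache, the get_or_compute method, and the one-JSON-file-per-request layout are assumptions. Each request is hashed into a stable key, so repeating the same request reads the stored result instead of calling the model again.

```python
import hashlib
import json
from pathlib import Path


class DiskCache:
    """Hypothetical on-disk cache keyed by a hash of the request parameters."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, request):
        # sort_keys makes the key independent of dict insertion order
        payload = json.dumps(request, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        return self.cache_dir / f"{key}.json"

    def get_or_compute(self, request, compute):
        """Return the cached result for `request`, calling `compute` only on a miss."""
        path = self._path_for(request)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = compute(request)  # e.g. a function that queries a language model
        path.write_text(json.dumps(result), encoding="utf-8")
        return result
```

Here `request` is any JSON-serializable dict (for instance the prompt and sampling parameters) and `compute` is whatever callable actually queries the model. Including every parameter that affects the output in `request` keeps distinct queries from colliding on one cache entry.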

Acknowledgments

  • Environment is based on RLBench.
  • Implementation of Language Model Programs (LMPs) is based on Code as Policies.
  • Some code snippets are from Where2Act.
  • Additional acknowledgment to GitHub Copilot and GPT-4 for collaboratively writing a significant portion of the code in this codebase.