🚀 Exciting News! We have added support for ALFRED and the Tidy Task! This update lets you run HELPER and HELPER-X on these additional benchmarks. See the `alfred` and `tidy_task` branches for more information. DialFRED coming soon!
This repo contains code and data for running HELPER and HELPER-X. This branch is for running HELPER and HELPER-X on TEACh.
Please see the `alfred` branch for instructions on how to run HELPER on ALFRED.
Please see the `tidy_task` branch for instructions on how to run HELPER on the Tidy Task (coming soon).
Please see the `dialfred` branch for instructions on how to run HELPER on DialFRED (coming soon).
(1) Start by cloning the repository:
```bash
git clone https://github.com/Gabesarch/HELPER.git
cd HELPER
```
(1a) (optional) If you are using conda, create an environment:
```bash
conda create -n helper python=3.8
```
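Then activate the environment before installing any dependencies:
```bash
conda activate helper
```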
(2) Install PyTorch with the CUDA version you have. We have tested with PyTorch 1.10 and CUDA 11.1:
```bash
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
```
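Optionally, you can confirm that PyTorch was installed with working CUDA support before continuing:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```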
(3) Install additional requirements:
```bash
pip install setuptools==59.8.0 numpy==1.23.1 # needed for scikit-image
pip install -r requirements.txt
```
(4) Install Detectron2 (needed for the SOLQ detector) with the correct PyTorch and CUDA version:
```bash
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
(5) Install teach:
```bash
pip install -e teach
```
(6) Build SOLQ deformable attention:
```bash
cd ./SOLQ/models/ops && sh make.sh && cd ../../..
```
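If the build succeeds, the compiled extension should be importable. As an optional sanity check (this assumes the standard Deformable-DETR ops layout that SOLQ follows, where the compiled module is named `MultiScaleDeformableAttention`; adjust if your build differs):
```bash
python -c "import torch, MultiScaleDeformableAttention; print('deformable attention ops OK')"
```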
(7) Clone the ZoeDepth repo and check out the tested commit:
```bash
git clone https://github.com/isl-org/ZoeDepth.git
cd ZoeDepth
git checkout edb6daf45458569e24f50250ef1ed08c015f17a7
```
- Download the TEACh dataset following the instructions in the TEACh repo:
```bash
teach_download
```
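By default, `teach_download` fetches the dataset to its standard download location; if you want it somewhere specific, point it at a directory of your choice and pass that same path to `--teach_data_dir` in the run command below. The `-d` flag and the path here are assumptions; confirm against `teach_download --help` for your installed version:
```bash
teach_download -d ./data/teach_dataset
```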
To run our model with estimated depth and segmentation, download the SOLQ and ZoeDepth checkpoints (a combined download sketch follows the two items below):
- Download the SOLQ checkpoint here. Place it in the `./checkpoints` folder (or anywhere you want, and specify the path with `--solq_checkpoint`). Alternatively, you can download the file with gdown (`pip install gdown`):
```bash
cd checkpoints
gdown 1hTCtTuygPCJnhAkGeVPzWGHiY3PHNE2j
```
- Download the ZoeDepth checkpoint here. Place it in the `./checkpoints` folder (or anywhere you want, and specify the path with `--zoedepth_checkpoint`). Also make sure you have cloned the ZoeDepth repo (`git clone https://github.com/isl-org/ZoeDepth.git`). Alternatively, you can download the file with gdown (`pip install gdown`):
```bash
cd checkpoints
gdown 1gMe8_5PzaNKWLT5OP-9KKEYhbNxRjk9F
```
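If you prefer a single step, here is a minimal sketch that fetches both checkpoints into `./checkpoints`; the output filenames are an assumption chosen to match the checkpoint paths used in the run command below:
```bash
mkdir -p checkpoints && cd checkpoints
gdown 1hTCtTuygPCJnhAkGeVPzWGHiY3PHNE2j -O SOLQ-model-00023000.pth
gdown 1gMe8_5PzaNKWLT5OP-9KKEYhbNxRjk9F -O ZOEDEPTH-model-00015000.pth
cd ..
```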
- (if required) Start an X server if one is not already running on your machine. First, open a screen on the desired node, and run the following to start an X server on that node:
```bash
python startx.py 0
```
Specify the server port number with the argument `--server_port` (default 0).
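For example, on a compute node you might launch the X server in a detached screen session and pass the matching port to the agent; the session name and display number here are illustrative:
```bash
screen -dmS xserver python startx.py 1
# then run the agent with --server_port 1
```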
- Set OpenAI keys. If using Azure, set the Azure keys:
```bash
export AZURE_OPENAI_KEY={KEY}
export AZURE_OPENAI_ENDPOINT={ENDPOINT}
```
If not using Azure: Important! If using the OpenAI API, append `--use_openai` to the arguments, and set your OpenAI key:
```bash
export OPENAI_API_KEY={KEY}
```
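You can quickly confirm the variables are visible to your shell before launching:
```bash
echo $AZURE_OPENAI_KEY
echo $AZURE_OPENAI_ENDPOINT
# or, if using --use_openai:
echo $OPENAI_API_KEY
```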
- Run agent. To run the agent with all modules and estimated perception on TfD validation unseen, run the following:
```bash
python main.py \
--mode teach_eval_tfd \
--split valid_unseen \
--gpt_embedding_dir ./data/gpt_embeddings \
--teach_data_dir PATH_TO_TEACH_DATASET \
--server_port X_SERVER_PORT_HERE \
--episode_in_try_except \
--use_llm_search \
--use_constraint_check \
--run_error_correction_llm \
--zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
--solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
--set_name HELPER_teach_tfd_validunseen
```
Change the split to `--split valid_seen` to evaluate the validation seen set.
All metrics will be saved to `./output/metrics/{set_name}`. Metrics and videos will also automatically be logged to wandb.
To create movies of the agent, append `--create_movie` to the arguments. By default this creates a movie for every episode, rendered to `./output/movies`. To change how often episodes are logged, alter `--log_every` (e.g., `--log_every 10` to render a video every 10 episodes). To remove the map visualization, append `--remove_map_vis` to the arguments; this can speed up episodes, since rendering the map visual slows them down.
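For example, to render a movie every 10 episodes without the map visualization, append the following to the run command above:
```bash
--create_movie \
--log_every 10 \
--remove_map_vis
```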
The following arguments can be removed or changed to run the ablations (an example command follows this list):
- Remove memory-augmented prompting: add the argument `--ablate_example_retrieval`.
- Remove LLM search (Locator), leaving only random search: remove `--use_llm_search`.
- Remove the constraint check (Inspector): remove `--use_constraint_check`.
- Remove error correction (Rectifier): remove `--run_error_correction_llm`.
- Change the OpenAI model type: change the `--openai_model` argument (e.g., `--openai_model gpt-3.5-turbo`).
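For example, here is a sketch of a run that ablates both memory-augmented prompting and the Locator: it starts from the base command above, drops `--use_llm_search`, and adds `--ablate_example_retrieval` (the `--set_name` is illustrative):
```bash
python main.py \
--mode teach_eval_tfd \
--split valid_unseen \
--gpt_embedding_dir ./data/gpt_embeddings \
--teach_data_dir PATH_TO_TEACH_DATASET \
--server_port X_SERVER_PORT_HERE \
--episode_in_try_except \
--use_constraint_check \
--run_error_correction_llm \
--ablate_example_retrieval \
--zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
--solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
--set_name HELPER_teach_tfd_validunseen_ablation
```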
The following arguments can be added to run with ground truth (an example follows this list):
- GT depth: `--use_gt_depth`. Recommended to also add `--increased_explore` with estimated segmentation for best performance.
- GT segmentation: `--use_gt_seg`.
- GT action success: `--use_gt_success_checker`.
- GT error feedback: `--use_GT_error_feedback`.
- GT constraint check using controller metadata: `--use_GT_constraint_checks`.
- Increase max API fails: `--max_api_fails {MAX_FAILS}`.
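For example, to evaluate with ground-truth depth and segmentation, append the following to the base command above:
```bash
--use_gt_depth \
--use_gt_seg
```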
To run with user feedback, add `--use_progress_check`. Two additional metric files (for feedback queries 1 and 2) will be saved to `./output/metrics/{set_name}`.
See the `teach_edh` branch for how to run the TEACh EDH evaluation.
This project utilizes several repositories and tools. Below are the references to these repositories:
- TEACh: TEACh (Task-driven Embodied Agents that Chat) is a dataset and benchmark for training and evaluating embodied agents in interactive environments.
- ALFRED: ALFRED (Action Learning From Realistic Environments and Directives) is a benchmark for learning from natural language instructions in simulated environments.
- TIDEE: TIDEE is an embodied agent that tidies up rooms using visuo-semantic commonsense priors, and is the source of the Tidy Task benchmark.
- SOLQ: SOLQ (Segmenting Objects by Learning Queries) is a method for object detection and segmentation.
- ZoeDepth: ZoeDepth is a repository for depth estimation models.
Please refer to these repositories for more detailed information and instructions on their usage.
If you find our work useful, please cite us:
```bibtex
@inproceedings{sarch2023helper,
  title = "Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models",
  author = "Sarch, Gabriel and Wu, Yue and Tarr, Michael and Fragkiadaki, Katerina",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
  year = "2023"
}

@inproceedings{sarch2024helperx,
  title = "HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models",
  author = "Sarch, Gabriel and Somani, Sahil and Kapoor, Raghav and Tarr, Michael J and Fragkiadaki, Katerina",
  booktitle = "ICLR 2024 LLMAgents Workshop",
  year = "2024"
}
```