ViLaIn

[Website]  [Paper]  [Code]

An official implementation of Vision-Language Interpreter (ViLaIn). See our paper for more details.

Requirements
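
The commands below assume local checkouts of GroundingDINO (used for object detection, passed via --grounding_dino_dir) and the Fast Downward planner (passed via --downward_dir); see those projects for their installation instructions.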

The ProDG dataset

The data directory contains PDDL files, observations, and instructions for the three domains, which we refer to as the ProDG dataset in the paper. It also contains annotated bounding boxes in annotated_bboxes. The directory structure is as follows:

data
 └─domains
    ├─domain.pddl                   (A PDDL domain file)
    ├─problems                      (PDDL problem files)
    │  └─problem*.pddl
    ├─observations                  (Observations for the initial state)
    │  └─problem*.jpg
    ├─instructions                  (Linguistic instructions)
    │  └─problem*.txt
    └─annotated_bboxes              (Annotated bounding boxes)
       └─problem*.json
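
Instructions and their reference PDDL problems are paired by filename, so a single example is easy to pull up. A minimal sketch; the problem name problem1 is hypothetical, so substitute a file that actually exists:

from pathlib import Path

domain_dir = Path("data/cooking")  # one of the three domain directories
problem_id = "problem1"            # hypothetical problem name

# Linguistic instruction and its reference PDDL problem share the same stem.
instruction = (domain_dir / "instructions" / f"{problem_id}.txt").read_text()
reference = (domain_dir / "problems" / f"{problem_id}.pddl").read_text()

print(instruction)
print(reference)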

Results

results/reported_results contains the generated PDDL problems and found plans reported in the paper. For each domain, there are three subdirectories:

  • plain: results generated without corrective reprompting
  • refine_once: results after applying corrective reprompting once to the problems in plain
  • refine_twice: results after applying corrective reprompting again to the problems in refine_once

Getting Started

Detecting Objects and Generating Captions

To detect objects with bounding boxes and generate captions, run:

export domain=cooking
export grounding_dino_dir=./GroundingDINO
export result_dir=./results/temp/${domain}

python scripts/main.py \
    --data_dir "./data/${domain}" \
    --result_dir ${result_dir} \
    --grounding_dino_dir ${grounding_dino_dir} \
    --predict_bboxes

This step should be done prior to PDDL problem generation.
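
To eyeball detections against the ground truth, the annotated boxes can be drawn over an observation with Pillow. A minimal sketch; the {name: [x0, y0, x1, y1]} layout assumed for the JSON is a guess, so adapt it to the actual files:

import json
from pathlib import Path

from PIL import Image, ImageDraw  # pip install Pillow

domain_dir = Path("data/cooking")
problem_id = "problem1"  # hypothetical problem name

image = Image.open(domain_dir / "observations" / f"{problem_id}.jpg").convert("RGB")
boxes = json.loads((domain_dir / "annotated_bboxes" / f"{problem_id}.json").read_text())

draw = ImageDraw.Draw(image)
for name, box in boxes.items():  # assumed schema: {"object name": [x0, y0, x1, y1]}
    draw.rectangle(box, outline="red", width=3)
    draw.text((box[0], box[1]), name, fill="red")

image.save("annotated_preview.jpg")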

Generating PDDL Problems and Finding Plans

To generate PDDL problems from the predicted bounding boxes and captions, and to find plans for them, run:

export domain=cooking
export downward_dir=./downward
export result_dir=./results/temp/${domain}
export num_repeat=2
export num_examples=3

python scripts/main.py \
    --downward_dir ${downward_dir} \
    --data_dir "./data/${domain}" \
    --result_dir "${result_dir}" \
    --num_repeat ${num_repeat} \
    --num_examples ${num_examples} \
    --gen_step "plain" \
    --generate_problem \
    --find_plan
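
Fast Downward writes plans as plain text, one ground action per line, followed by a cost comment (the sas_plan format), which makes them easy to post-process. A minimal sketch; the plan file location under result_dir is hypothetical:

from pathlib import Path

def load_plan(path):
    """Parse a Fast Downward-style plan file into [action, arg, ...] lists."""
    actions = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and the cost comment
            continue
        actions.append(line.strip("()").split())
    return actions

# Hypothetical path; main.py decides where plans are written under result_dir.
for step in load_plan("results/temp/cooking/plain/problem1.plan"):
    print(step)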

Evaluating Generated PDDL Problems and Found Plans

To evaluate the generated PDDL problems and validate the found plans, run:

export domain=cooking
export downward_dir=./downward
export result_dir=./results/temp/${domain}
export num_repeat=2

python scripts/evaluate.py \
    --downward_dir ${downward_dir} \
    --data_dir "./data/${domain}" \
    --result_dir "${result_dir}" \
    --num_repeat ${num_repeat} \
    --gen_step "plain"
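
For an independent check, found plans can also be run through the VAL plan validator. A sketch assuming VAL's validate binary is installed and on PATH; the problem and plan paths below are hypothetical:

import subprocess

# Hypothetical output paths; point these at a generated problem and its plan.
result = subprocess.run(
    ["validate",
     "data/cooking/domain.pddl",
     "results/temp/cooking/plain/problem1.pddl",
     "results/temp/cooking/plain/problem1.plan"],
    capture_output=True,
    text=True,
)
print(result.stdout)  # VAL reports whether the plan executes and reaches the goal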

Refining Generated PDDL Problems

To refine the generated PDDL problems by corrective reprompting, run:

export domain=cooking
export downward_dir=./downward
export result_dir=./results/temp/${domain}
export num_repeat=2

python scripts/main.py \
    --downward_dir ${downward_dir} \
    --data_dir "./data/${domain}" \
    --result_dir "${result_dir}" \
    --num_repeat ${num_repeat} \
    --gen_step "refine_once" \
    --prev_gen_step "plain" \
    --refine_problem \
    --use_cot \
    --find_plan
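
Conceptually, corrective reprompting is a short loop: try to solve the generated problem, and if the planner or parser rejects it, feed the error message back to the language model along with the previous output. A minimal sketch of the idea with hypothetical generate() and try_plan() helpers (the repository's actual prompts and interfaces differ):

def reprompt(task, generate, try_plan, max_refinements=2):
    """Hypothetical corrective-reprompting loop.

    generate(task, feedback) -> candidate PDDL problem string.
    try_plan(problem)        -> (plan, error); exactly one of the two is None.
    """
    problem = generate(task, feedback=None)       # "plain" generation
    for _ in range(max_refinements):
        plan, error = try_plan(problem)
        if plan is not None:
            return problem, plan                  # solvable as-is
        problem = generate(task, feedback=error)  # corrective reprompting
    plan, _ = try_plan(problem)                   # last candidate
    return problem, plan

With max_refinements=2, the successive candidates correspond to the plain, refine_once, and refine_twice stages above.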

Citation

@misc{shirai2023visionlanguage,
      title={Vision-Language Interpreter for Robot Task Planning}, 
      author={Keisuke Shirai and Cristian C. Beltran-Hernandez and Masashi Hamaya and Atsushi Hashimoto and Shohei Tanaka and Kento Kawaharazuka and Kazutoshi Tanaka and Yoshitaka Ushiku and Shinsuke Mori},
      year={2023},
      eprint={2311.00967},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}