Proc2PDDL is a dataset of paired open-domain procedural texts and PDDL representations.
The cleaned and processed Proc2PDDL dataset can be found in `/pddl_data`. The raw dataset produced by the annotators, which you will likely not need, is in `/pddl_data_raw`.
In `/pddl_data`, there are 27 folders named by domain ID, each containing:
- a procedural article from wikiHow (`wikihow-*.txt`) containing a number of steps
- an annotated domain file (`domain.pddl`)
- the header of the above domain file (`domain_header.pddl`), including types and predicates but excluding the names of actions
- the names and descriptions of the actions of the above domain file (`actions.txt`)
- an annotated mapping between each action and a step from the procedural article (`action_step_map.txt`), only used in certain settings
- a folder of annotated problem files (`problem-*.pddl`)
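For illustration, one domain folder might be laid out as follows (a sketch; the domain ID and the name of the problem folder are placeholders, not actual names from the dataset):

```
pddl_data/
└── <domain_id>/               # one of 27 domain folders
    ├── wikihow-<article>.txt  # procedural article
    ├── domain.pddl            # fully annotated domain file
    ├── domain_header.pddl     # types and predicates only
    ├── actions.txt            # action names and descriptions
    ├── action_step_map.txt    # action-to-step mapping
    └── <problem folder>/      # annotated problem files
        └── problem-*.pddl
```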
While the above dataset can enable many possible tasks, we exemplify the task of action modeling, where:
- the input is the domain file header (types, predicates, and names of actions)
- the output is the definitions of actions (parameters, preconditions, and effects)
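To make the input/output split concrete, here is a minimal, hypothetical PDDL sketch (not taken from the dataset; the domain, types, predicates, and action are invented for illustration):

```pddl
; Input: domain header with types and predicates (the action names, e.g.
; boil_water, are also given as input), but no action bodies.
(define (domain make_tea)
  (:requirements :strips :typing)
  (:types agent item location)
  (:predicates
    (at ?a - agent ?l - location)
    (has ?a - agent ?i - item)
    (boiled ?i - item)))

; Output: the definition of each named action, i.e. its parameters,
; preconditions, and effects.
(:action boil_water
  :parameters (?a - agent ?w - item ?l - location)
  :precondition (and (at ?a ?l) (has ?a ?w))
  :effect (and (boiled ?w)))
```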
Code for running model prediction with the OpenAI API is in `/predict_scripts`.
Then run:
python predict_actions.py --model MODEL --prompt PROMPT [--cot]
where:
- `MODEL` is either `gpt3.5` (`gpt-3.5-turbo-16k`) or `gpt4` (`gpt-4-32k`)
- `PROMPT` defaults to `whole`, meaning all of the text from the procedural article is used; it can also be set to `pair`, which includes only the relevant paired steps and the actions, or `no_text`, which includes no text at all as an ablation
- `--cot` can be specified to use chain-of-thought prompting
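For example, a hypothetical run using GPT-4 on the full article text with chain-of-thought prompting (only the documented flags are used) would be:

python predict_actions.py --model gpt4 --prompt whole --cot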
The output is written to `pddl_evaluation/pred/MODEL_PROMPT_[COT]/DOMAIN_ID.txt`, each file containing the definitions of actions (parameters, preconditions, and effects).
Code for evaluating model prediction is in `/evaluate_scripts`.
Then run:
python evaluate.py --model MODEL --prompt PROMPT [--cot]
The arguments are the same as before. This attempts to solve each problem with the generated domain file. In the process, two files are created:
- In `pddl_evaluation/plan/`, the predicted plan, if any, is stored
- In `pddl_evaluation/pred/`, the action objects are stored as pickles
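For example, to evaluate the hypothetical GPT-4 run from above:

python evaluate.py --model gpt4 --prompt whole --cot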
Next, running
python compare_plan.py --model MODEL --prompt PROMPT
prints to stdout the number of predicted plans that exactly match the gold plan.
python compare_actions.py --model MODEL --prompt PROMPT
prints to stdout the accuracy of actions, parameters, preconditions, and effects.
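For example, to score the hypothetical GPT-4 run from above:

python compare_plan.py --model gpt4 --prompt whole
python compare_actions.py --model gpt4 --prompt whole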
If you use our work, please cite (TODO)